From sviswanathan at openjdk.org Wed May 1 00:17:54 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 1 May 2024 00:17:54 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v9] In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 23:54:19 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > fixes: pp bits in crc32, REX2 branch in ldmxcsr > It looks to me that the source and dest are reversed in the following instruction in call to simd_prefix_and_encode, perhaps that should be a separate PR: // Do we have this wrong src and dst reversed in simd_prefix_and_encode? void Assembler::pextrw(Register dst, XMMRegister src, int imm8) { assert(VM_Version::supports_sse2(), ""); InstructionAttr attributes(AVX_128bit, /* rex_w */ false, /* legacy_mode */ _legacy_mode_bw, /* no_mask_reg */ true, /* uses_vl */ false); int encode = simd_prefix_and_encode(as_XMMRegister(dst->encoding()), xnoreg, src, VEX_SIMD_66, VEX_OPCODE_0F, &attributes); emit_int24((unsigned char)0xC5, (0xC0 | encode), imm8); } Once that PR is fixed, is_src_gpr should be set to true for this one as well.
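For reference, the legacy form PEXTRW r32, xmm, imm8 is encoded as 66 0F C5 /r ib, with the GPR destination in ModRM.reg and the XMM source in ModRM.rm. That matches the snippet above, which passes dst first and ORs the result into 0xC0. The sketch below uses hypothetical helpers (not HotSpot code) to show how a register-direct ModRM byte is formed. Only the low three bits of each register encoding fit in ModRM; the higher bits travel in a REX/REX2/VEX/EVEX prefix.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of HotSpot): builds a register-direct ModRM byte.
// mod = 0b11 (register operand), reg = bits 5..3, rm = bits 2..0.
// Only the low three bits of each encoding fit here; bit 3 (and, for APX
// extended GPRs, bit 4) must be supplied by a REX/REX2/VEX/EVEX prefix.
uint8_t modrm_reg_direct(int reg_enc, int rm_enc) {
  assert(reg_enc >= 0 && reg_enc < 32);
  assert(rm_enc >= 0 && rm_enc < 32);
  return static_cast<uint8_t>(0xC0 | ((reg_enc & 7) << 3) | (rm_enc & 7));
}

// PEXTRW r32, xmm, imm8 (66 0F C5 /r ib): the GPR destination goes in
// ModRM.reg and the XMM source goes in ModRM.rm.
uint8_t pextrw_modrm(int dst_gpr_enc, int src_xmm_enc) {
  return modrm_reg_direct(dst_gpr_enc, src_xmm_enc);
}
```

For example, pextrw eax, xmm1 (eax = 0, xmm1 = 1) yields ModRM 0xC1, while pextrw ecx, xmm0 yields 0xC8.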
Verified that the pextrw has the operands reversed per the SDM, so please ignore this comment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2087754604 From bkilambi at openjdk.org Wed May 1 08:51:58 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 1 May 2024 08:51:58 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v4] In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 10:15:02 GMT, Emanuel Peter wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Add comments, revert to requires_strict_order and other minor changes > > You probably want to change the name of the PR again: > `Add "is_associative" flag for floating-point add-reduction` -> `8320725: AArch64: C2: Add "requires_strict_order" flag for floating-point add-reduction` Hi @eme64 @theRealAph I have uploaded the latest patch addressing all review comments. Can I please ask for more reviews. Thank you .. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18034#issuecomment-2088168402 From epeter at openjdk.org Wed May 1 08:57:54 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 1 May 2024 08:57:54 GMT Subject: RFR: 8327381: Refactor type-improving transformations in BoolNode::Ideal to BoolNode::Value [v4] In-Reply-To: References: Message-ID: <1u7cg97KwHlBDapxCJCpEwzASwBgx1c2gINa_bHDG0w=.e8bb0c22-b490-406d-89d2-93027ab71277@github.com> On Tue, 30 Apr 2024 21:20:19 GMT, Martin Balao wrote: >> `(x & m) u< m + 1` is false for `m = -1`, right? >> >> Edit: Yep, filed [JDK-8328315](https://bugs.openjdk.org/projects/JDK/issues/JDK-8328315). > >> `(x & m) u< m + 1` is false for `m = -1`, right? >> > > This bug should be handled separately. I'll do that. @martinuy [JDK-8328315](https://bugs.openjdk.org/browse/JDK-8328315) @chhagedorn is already working on that. 
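The `m = -1` edge case is easy to check with wrap-around arithmetic. Because `x & m` can only clear bits of `m`, `x & m` is always unsigned-less-than-or-equal to `m`, so `(x & m) u< m + 1` normally holds. For `m = -1` (all ones), however, `m + 1` wraps to 0, and no value is unsigned-less-than 0. Below is a minimal C++ sketch with 32-bit unsigned values; the function name is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Evaluates (x & m) u< (m + 1) using 32-bit unsigned (wrap-around) arithmetic.
bool masked_lt_mask_plus_one(uint32_t x, uint32_t m) {
  // m + 1 wraps to 0 when m == 0xFFFFFFFF (i.e. -1 as a signed int).
  uint32_t bound = static_cast<uint32_t>(m + 1u);
  return (x & m) < bound;
}
```

For any `m` other than all ones the result is true for every `x`. For `m = 0xFFFFFFFF` it is false for every `x`, so the fold must not assume the comparison always holds.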
------------- PR Comment: https://git.openjdk.org/jdk/pull/18198#issuecomment-2088173851 From sgibbons at openjdk.org Wed May 1 14:05:59 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 1 May 2024 14:05:59 GMT Subject: RFR: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 Message-ID: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> Added a strcmp for unsafe_setmemory in process_call_arguments() so the assert would not trigger. I believe this is the correct fix as I do not think the arguments for setMemory need special handling like arraycopy. I would like suggestions on how to generate a testcase to catch this type of error in mainline. ------------- Commit messages: - Add unsafe_setmemory comparison for process_call_arguments() Changes: https://git.openjdk.org/jdk/pull/19032/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19032&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331033 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19032.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19032/head:pull/19032 PR: https://git.openjdk.org/jdk/pull/19032 From dnsimon at openjdk.org Wed May 1 15:08:00 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 1 May 2024 15:08:00 GMT Subject: RFR: 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found Message-ID: This PR adds the missing nmethod entry barriers to JVMCI hand assembled tests. It also closes the escape hatch in jvmciCodeInstaller.cpp that allowed JVMCI code to be installed without nmethod entry barriers. 
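The fix recognizes the new stub by its name with a `strcmp`, presumably in the same way other known leaf-call stubs are recognized. The sketch below models that kind of name-based check in isolation. It is hypothetical and is not the code in `process_call_arguments()`. The allowlist contains only the one name taken from this PR, and the rest are elided.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical allowlist of leaf-call stub names that escape analysis accepts.
// Only "unsafe_setmemory" comes from this PR; other entries are elided here.
static const char* const kKnownLeafCalls[] = {
    "unsafe_setmemory",
};

// Returns true if the call name matches an entry in the allowlist.
bool is_known_leaf_call(const char* name) {
  assert(name != nullptr);
  for (const char* known : kKnownLeafCalls) {
    if (std::strcmp(name, known) == 0) {
      return true;
    }
  }
  return false;
}
```

With a check like this, a stub name missing from the list falls through to the "unexpected CallLeaf" assert, which is the failure reported in the bug title.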
------------- Commit messages: - emit nmethod entry barriers in JVMCI assembler tests Changes: https://git.openjdk.org/jdk/pull/19035/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19035&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8329982 Stats: 302 lines in 6 files changed: 297 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19035.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19035/head:pull/19035 PR: https://git.openjdk.org/jdk/pull/19035 From duke at openjdk.org Wed May 1 17:46:29 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Wed, 1 May 2024 17:46:29 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v10] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: fix stmxcrs REX2 branch, add asserts to SHA instructions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/01241d48..54d2226f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=08-09 Stats: 8 lines in 1 file changed: 7 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From duke at openjdk.org Wed May 1 17:46:29 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Wed, 1 May 2024 17:46:29 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v9] In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 20:22:10 GMT, Sandhya Viswanathan wrote: > SHA instructions (sha1rnds4, sha1nexte, sha1msg1, sha1msg2, sha256rnds2, sha256msg1, sha256msg2) needs to be encoded using EVEX encoding when egprs are in use. Thank you, I missed these. The APX 3.0 spec says xmm register use is limited to 0-15 for SHA instructions. Coincidentally, the new version 4.0 APX spec. also removes support for EVEX promotion of SHA instructions. Given these specs, I don't think any encoding changes are needed. I've added an assert to these 7 instructions to check that only registers < 16 are used. 
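The REX2 encoding in the PR description can be made concrete. An operand needs the new encodings only when its register number is in 16..31, meaning bit 4 is set. The sketch below builds the REX2 payload byte that follows the 0xD5 prefix, using the APX specification's bit layout (M0 R4 X4 B4 W R3 X3 B3, from bit 7 down to bit 0). The helper names are hypothetical, and this is not HotSpot's implementation. M0 selects opcode map 1 (the 0x0F map), which corresponds to the `is_map1` arguments seen in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if a GPR encoding is one of the APX extended
// registers r16..r31, which need REX2 or extended EVEX to be encoded.
bool is_egpr(int enc) {
  assert(enc >= 0 && enc < 32);
  return enc >= 16;
}

// Hypothetical helper: builds the REX2 payload byte that follows the 0xD5
// prefix byte. Bit layout, high to low: M0 R4 X4 B4 W R3 X3 B3.
//   reg   - register in ModRM.reg
//   index - SIB index register (0 if unused)
//   base  - ModRM.rm / SIB base register (0 if unused)
uint8_t rex2_payload(bool map1, bool w, int reg, int index, int base) {
  assert(reg >= 0 && reg < 32 && index >= 0 && index < 32 && base >= 0 && base < 32);
  uint8_t p = 0;
  if (map1)         p |= 0x80;  // M0: opcode map 1 (replaces the 0x0F escape byte)
  if (reg & 0x10)   p |= 0x40;  // R4
  if (index & 0x10) p |= 0x20;  // X4
  if (base & 0x10)  p |= 0x10;  // B4
  if (w)            p |= 0x08;  // W: 64-bit operand size, as in REX.W
  if (reg & 0x08)   p |= 0x04;  // R3 (same role as REX.R)
  if (index & 0x08) p |= 0x02;  // X3 (same role as REX.X)
  if (base & 0x08)  p |= 0x01;  // B3 (same role as REX.B)
  return p;
}
```

For example, a 64-bit operation with r16 in ModRM.reg and rax as the base gives `rex2_payload(false, true, 16, 0, 0) == 0x48` (R4 and W), emitted as D5 48 before the opcode.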
------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2088826920 From duke at openjdk.org Wed May 1 17:46:30 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Wed, 1 May 2024 17:46:30 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v7] In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 22:06:11 GMT, Sandhya Viswanathan wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> fix 4 more src_is_gpr = true cases, add asserts to check for UseAPX > > src/hotspot/cpu/x86/assembler_x86.cpp line 2632: > >> 2630: prefix(src, true /* is_map1 */); >> 2631: emit_int8((unsigned char)0xAE); >> 2632: emit_operand(as_Register(2), src, 0); > > Even when UseAVX > 0, if the src address uses higher bank registers, ldmxcsr/stmxcsr should be encoded using the REX2 i.e. the else path. Thank you, fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1586556997 From never at openjdk.org Wed May 1 17:46:57 2024 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 1 May 2024 17:46:57 GMT Subject: RFR: 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found In-Reply-To: References: Message-ID: On Wed, 1 May 2024 15:03:08 GMT, Doug Simon wrote: > This PR adds the missing nmethod entry barriers to JVMCI hand assembled tests. > It also closes the escape hatch in jvmciCodeInstaller.cpp that allowed JVMCI code to be installed without nmethod entry barriers. src/hotspot/share/jvmci/jvmciCodeInstaller.cpp line 777: > 775: // configurations which generate assembly without being a full compiler. So for now we enforce > 776: // that JIT compiled methods must have an nmethod barrier. > 777: bool install_default = JVMCIENV->get_HotSpotNmethod_isDefault(installed_code) != 0; This line is no longer needed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19035#discussion_r1586558212 From never at openjdk.org Wed May 1 17:52:55 2024 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 1 May 2024 17:52:55 GMT Subject: RFR: 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found In-Reply-To: References: Message-ID: On Wed, 1 May 2024 15:03:08 GMT, Doug Simon wrote: > This PR adds the missing nmethod entry barriers to JVMCI hand assembled tests. > It also closes the escape hatch in jvmciCodeInstaller.cpp that allowed JVMCI code to be installed without nmethod entry barriers. In the long term I'm not sure it's worth trying to maintain these assembler tests. The barrier verification code is very weak and on aarch64 it's slightly complicated so we're barely checking that it really matches. I guess this is good enough until we get further problems. I think you can simplify some other logic that deals with the optionality of the barrier. Start with removing JVMCINMethodData::has_entry_barrier and maybe update some of the comments to reflect that it's always emitted. And places that check _nmethod_entry_patch_offset for -1 can be removed or weakened. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19035#issuecomment-2088836198 From imyers at openjdk.org Wed May 1 17:59:02 2024 From: imyers at openjdk.org (Ian Myers) Date: Wed, 1 May 2024 17:59:02 GMT Subject: RFR: 8324756: Remove dependency verification from vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java Message-ID: This change removes dependency verification by passing -XX:-VerifyDependencies in the test. 
`vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java` takes 20min to run on linux-x86_64-server-fastdebug: time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java CONF=linux-x86_64-server-fastdebug make test **1412.82s user 15.27s system 115% cpu 20:41.19 total** Passing -XX:-VerifyDependencies flag speeds up the run time to 1min: time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java TEST_VM_OPTS="-XX:-VerifyDependencies" CONF=linux-x86_64-server-fastdebug make test **287.27s user 16.19s system 496% cpu 1:01.10 total** Adding -XX:-VerifyDependencies to the test file accomplishes the same run time of 1min: time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java CONF=linux-x86_64-server-fastdebug make test **272.33s user 14.56s system 464% cpu 1:01.75 total** ------------- Commit messages: - Remove dependency verification from vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java Changes: https://git.openjdk.org/jdk/pull/19040/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19040&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8324756 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19040.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19040/head:pull/19040 PR: https://git.openjdk.org/jdk/pull/19040 From kvn at openjdk.org Wed May 1 18:29:01 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 1 May 2024 18:29:01 GMT Subject: RFR: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 In-Reply-To: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> References: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> Message-ID: On Wed, 1 May 2024 14:01:38 GMT, Scott Gibbons wrote: > Added a 
strcmp for unsafe_setmemory in process_call_arguments() so the assert would not trigger. > > I believe this is the correct fix as I do not think the arguments for setMemory need special handling like arraycopy. > > I would like suggestions on how to generate a testcase to catch this type of error in mainline. `Unsafe.setMemory()` has a `checkPrimitivePointer()` call, which checks that the input is a primitive array or some address (a `raw` address in EA terms). This check is done before the intrinsic is called, which means your fix is correct. It is similar to other intrinsics which operate on primitive arrays. The test could use a locally allocated, non-escaping array which is passed to `Unsafe.setMemory()` to be initialized to some value. ------------- PR Review: https://git.openjdk.org/jdk/pull/19032#pullrequestreview-2034177223 From kvn at openjdk.org Wed May 1 18:37:53 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 1 May 2024 18:37:53 GMT Subject: RFR: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 In-Reply-To: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> References: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> Message-ID: On Wed, 1 May 2024 14:01:38 GMT, Scott Gibbons wrote: > Added a strcmp for unsafe_setmemory in process_call_arguments() so the assert would not trigger. > > I believe this is the correct fix as I do not think the arguments for setMemory need special handling like arraycopy. > > I would like suggestions on how to generate a testcase to catch this type of error in mainline. Look at the EA tests in `compiler/escapeAnalysis/` which use arraycopy().
Something like `TestMissingAntiDependency.java` or `TestSelfArrayCopy.java` ------------- PR Comment: https://git.openjdk.org/jdk/pull/19032#issuecomment-2088896952 From duke at openjdk.org Wed May 1 19:34:08 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Wed, 1 May 2024 19:34:08 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v11] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: remove is_map1 comment for addb, andb, movb, orb, testb, xchgb, xorb ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/54d2226f..c65fda0c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=09-10 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From dlong at openjdk.org Wed May 1 20:52:52 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 1 May 2024 20:52:52 GMT Subject: RFR: 8331253: 16 bits is not enough for nmethod::_skipped_instructions_size field In-Reply-To: References: Message-ID: On Wed, 1 May 2024 03:31:41 GMT, Vladimir Kozlov wrote: > In [JDK-8329433](https://bugs.openjdk.org/browse/JDK-8329433) I changed `nmethod::_skipped_instructions_size` field type to `uint16_t` assuming that it only counts NOP instructions and GC barriers. I did not take into account that Generational ZGC also includes barrier stubs in this size (original ZGC missed that). It is correct to include them because these stubs are generated in the instructions section and not in the stubs section: > > > Statistics for 1330 bytecoded nmethods for C2: > ... > ZGC: > main code = 3237080 (75.567032%) > stubs code = 810577 (25.040375%) > skipped insts = 44432 (1.372595%) > > GenZGC: > main code = 4034704 (78.238518%) > stubs code = 1356703 (33.625839%) > skipped insts = 1074611 (26.634197%) > > > Note that GenZGC has bigger code because it has store barriers. It generates a separate stub for each barrier, no sharing.
> > After looking at how `_skipped_instructions_size` is used (only in one place, when calculating the inlining size of compiled code) I decided to replace it with `int _inline_insts_size;`. It is calculated the same way as before. > > And instead of including instruction stubs in `_skipped_instructions_size` I recorded the size of the instructions in the code section before the stubs are generated. This gives a more accurate size for the main instructions and removes the need for `InlineSkippedInstructionsCounter` in GC barrier stubs. > > I also fixed code in C2 which estimates the size of the code and stubs sections. > > Tested tier1-4,tier8,stress,xcomp Nice improvement. ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19029#pullrequestreview-2034435239 From dnsimon at openjdk.org Wed May 1 20:57:14 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 1 May 2024 20:57:14 GMT Subject: RFR: 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found [v2] In-Reply-To: References: Message-ID: > This PR adds the missing nmethod entry barriers to JVMCI hand assembled tests. > It also closes the escape hatch in jvmciCodeInstaller.cpp that allowed JVMCI code to be installed without nmethod entry barriers.
Doug Simon has updated the pull request incrementally with two additional commits since the last revision: - remove vestiges of optional JVMCI nmethod support for entry barriers - fixed failing tests and removed tests that install no longer valid code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19035/files - new: https://git.openjdk.org/jdk/pull/19035/files/62b3ad29..be4bf630 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19035&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19035&range=00-01 Stats: 426 lines in 12 files changed: 109 ins; 308 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/19035.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19035/head:pull/19035 PR: https://git.openjdk.org/jdk/pull/19035 From sviswanathan at openjdk.org Wed May 1 21:30:56 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 1 May 2024 21:30:56 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v9] In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 23:54:19 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
> > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > fixes: pp bits in crc32, REX2 branch in ldmxcsr src/hotspot/cpu/x86/assembler_x86.cpp line 2621: > 2619: > 2620: void Assembler::ldmxcsr( Address src) { > 2621: if (UseAVX > 0 && !needs_rex2(src.base(), src.index()) ) { When UseAPX is true, it is good to always use the SSE flavor of ldmxcsr/stmxcsr. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1585485391 From sviswanathan at openjdk.org Wed May 1 21:30:58 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 1 May 2024 21:30:58 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v8] In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 21:55:31 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
> > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > add egpr support for popcntq(R,A), cvttsd2siq(R,A), popq(R) src/hotspot/cpu/x86/assembler_x86.cpp line 14001: > 13999: emit_int8((unsigned char)0xF3); > 14000: prefixq(src, dst, true /* is_map1 */); > 14001: emit_int8((unsigned char)0xB8); Just a nit, this could be: emit_prefix_and_int8(get_prefixq(src, dst, true /* is_map1 */), (unsigned char) 0xB8); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1586826001 From sviswanathan at openjdk.org Wed May 1 21:30:55 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 1 May 2024 21:30:55 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v11] In-Reply-To: References: Message-ID: On Wed, 1 May 2024 19:34:08 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > remove is_map1 comment for addb, andb, movb, orb, testb, xchgb, xorb Last bit of comments, rest all looks good to me. Thanks a lot for your patience through my review. 
------------- PR Review: https://git.openjdk.org/jdk/pull/18476#pullrequestreview-2032456092 From dnsimon at openjdk.org Wed May 1 21:01:53 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 1 May 2024 21:01:53 GMT Subject: RFR: 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found In-Reply-To: References: Message-ID: <9kDhf7fvKqxk5uw6xw4CAf1D_nl0fIRztONkbmMf1Q0=.1e9dd45c-3c6f-4d2a-9c21-4ae908e3285e@github.com> On Wed, 1 May 2024 17:49:53 GMT, Tom Rodriguez wrote: > In the long term I'm not sure it's worth trying to maintain these assembler tests. The barrier verification code is very weak and on aarch64 it's slightly complicated so we're barely checking that it really matches. I guess this is good enough until we get further problems. I agree. I had to push more changes now to remove tests that expect to be able to install 0 length code (which obviously fail the nmethod barrier verification). These tests provided stop gap coverage in the early days of JVMCI but now test functionality where breakage will clearly show up in higher layers (such as Graal). What's more, expanding the assembler support in JVMCI is redundant with the fully fledged assembler in Graal. > I think you can simplify some other logic that deals with the optionality of the barrier. Start with removing JVMCINMethodData::has_entry_barrier and maybe update some of the comments to reflect that it's always emitted. And places that check _nmethod_entry_patch_offset for -1 can be removed or weakened. I've done that now. Please let me know if you can spot any other vestiges of the optional support. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19035#issuecomment-2089131172 From kvn at openjdk.org Wed May 1 21:57:54 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 1 May 2024 21:57:54 GMT Subject: RFR: 8331253: 16 bits is not enough for nmethod::_skipped_instructions_size field In-Reply-To: References: Message-ID: <50kxjmEhsEt-y8L942zglgAMRw-F3IuqnDPVfCMc2Ns=.5ceea6e6-3929-4989-afe4-97fd3b0c74c9@github.com> On Wed, 1 May 2024 03:31:41 GMT, Vladimir Kozlov wrote: > In [JDK-8329433](https://bugs.openjdk.org/browse/JDK-8329433) I changed `nmethod::_skipped_instructions_size` field type to `uint16_t` assuming that it only counts NOP instructions and GC barriers. I did not take into account that Generational ZGC also includes barrier stubs in this size (original ZGC missed that). It is correct to include them because these stubs are generated in the instructions section and not in the stubs section: > > > Statistics for 1330 bytecoded nmethods for C2: > ... > ZGC: > main code = 3237080 (75.567032%) > stubs code = 810577 (25.040375%) > skipped insts = 44432 (1.372595%) > > GenZGC: > main code = 4034704 (78.238518%) > stubs code = 1356703 (33.625839%) > skipped insts = 1074611 (26.634197%) > > > Note that GenZGC has bigger code because it has store barriers. It generates a separate stub for each barrier, no sharing. > > After looking at how `_skipped_instructions_size` is used (only in one place, when calculating the inlining size of compiled code) I decided to replace it with `int _inline_insts_size;`. It is calculated the same way as before. > > And instead of including instruction stubs in `_skipped_instructions_size` I recorded the size of the instructions in the code section before the stubs are generated. This gives a more accurate size for the main instructions and removes the need for `InlineSkippedInstructionsCounter` in GC barrier stubs. > > I also fixed code in C2 which estimates the size of the code and stubs sections. > > Tested tier1-4,tier8,stress,xcomp Thank you, Dean.
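A `uint16_t` field holds values only up to 65,535. A larger byte count is silently reduced modulo 2^16 when it is narrowed into such a field. The statistics above are summed over 1330 nmethods, so they do not show per-method values directly. The hypothetical sketch below (not HotSpot code) shows what happens to any per-nmethod size that crosses the limit.

```cpp
#include <cassert>
#include <cstdint>

// Narrowing a 32-bit byte count into a 16-bit field keeps only the low 16 bits,
// i.e. the value is reduced modulo 65536 (well defined for unsigned targets).
uint16_t store_in_u16(uint32_t bytes) {
  return static_cast<uint16_t>(bytes);
}

// True if the count survives the round trip through a 16-bit field unchanged.
bool fits_in_u16(uint32_t bytes) {
  return store_in_u16(bytes) == bytes;
}
```

For example, 70000 bytes would be recorded as 4464. An `int` field, such as the new `_inline_insts_size`, holds that value exactly.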
------------- PR Comment: https://git.openjdk.org/jdk/pull/19029#issuecomment-2089196917 From duke at openjdk.org Thu May 2 00:05:20 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Thu, 2 May 2024 00:05:20 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v12] In-Reply-To: References: Message-ID: <1g7DGTS-7SUhuXFL8NniTGAQSgskv-CdrwtOGHymZqk=.f2ea7538-1ef4-4f94-af4d-972d64e7f699@github.com> > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: simplification and fix asserts in ldmxcsr, stmxcsr, and emit_prefix_and_int8 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/c65fda0c..46eb6b42 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=10-11 Stats: 5 lines in 1 file changed: 1 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From duke at openjdk.org Thu May 2 00:05:20 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Thu, 2 May 2024 00:05:20 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v9] In-Reply-To: References: Message-ID: On Wed, 1 May 2024 00:15:28 GMT, Sandhya Viswanathan wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> fixes: pp bits in crc32, REX2 branch in ldmxcsr > >> It looks to me that the source and dest are reversed in the following instruction in call to simd_prefix_and_encode, perhaps that should be a separate PR: // Do we have this wrong src and dst reversed in simd_prefix_and_encode? void Assembler::pextrw(Register dst, XMMRegister src, int imm8) { assert(VM_Version::supports_sse2(), ""); InstructionAttr attributes(AVX_128bit, /* rex_w */ false, /* legacy_mode */ _legacy_mode_bw, /* no_mask_reg */ true, /* uses_vl */ false); int encode = simd_prefix_and_encode(as_XMMRegister(dst->encoding()), xnoreg, src, VEX_SIMD_66, VEX_OPCODE_0F, &attributes); emit_int24((unsigned char)0xC5, (0xC0 | encode), imm8); } Once that PR is fixed, is_src_gpr should be set to true for this one as well.
> > Verified that the pextrw has the operands reversed per the SDM, so please ignore this comment. @sviswa7 Thank you for your review comments. Very helpful! > src/hotspot/cpu/x86/assembler_x86.cpp line 2621: > >> 2619: >> 2620: void Assembler::ldmxcsr( Address src) { >> 2621: if (UseAVX > 0 && !needs_rex2(src.base(), src.index()) ) { > > When UseAPX is true, it is good to always use the SSE flavor of ldmxcsr/stmxcsr. Thanks, modified assert in these two functions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2089312785 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1586933566 From duke at openjdk.org Thu May 2 00:05:21 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Thu, 2 May 2024 00:05:21 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v4] In-Reply-To: References: <_u8OUYZTsDfl7lzwoee3zewukw-yuFsn1_37Fn7iY5o=.2824d10d-30dd-4314-bae7-0beac0d79e2d@github.com> Message-ID: On Mon, 29 Apr 2024 21:52:14 GMT, Steve Dohrmann wrote: >> src/hotspot/cpu/x86/assembler_x86.cpp line 13260: >> >>> 13258: } else { >>> 13259: emit_int24((prefix & 0xFF00) >> 8, prefix & 0x00FF, b1); >>> 13260: } >> >> We need a check for UseAPX > 0 here. > > @sviswa7, sorry can you clarify what check is needed here. Thanks. Thanks, I understand now. Have added an assert to require UseAPX if prefix is WREX2. 
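The assert described in this message can be sketched as follows: a REX2 prefix must never be emitted unless `UseAPX` is set. The function name mirrors the one discussed in this thread. The globals, the byte vector, and the prefix convention are hypothetical stand-ins, not HotSpot's `Assembler`. In this sketch a prefix value above 0xFF carries the 0xD5 REX2 byte in its high byte, and 0 means "no prefix".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for HotSpot's UseAPX flag and code buffer.
static bool UseAPX = false;
static std::vector<std::uint8_t> code;

// Emits an optional one- or two-byte prefix followed by one opcode byte.
void emit_prefix_and_int8(int prefix, std::uint8_t b1) {
  if (prefix > 0xFF) {
    // Two-byte prefix: REX2 escape (0xD5) plus payload; only legal with APX.
    assert(UseAPX && "REX2 prefix emitted without UseAPX");
    assert(((prefix & 0xFF00) >> 8) == 0xD5 && "expected REX2 escape byte");
    code.push_back(static_cast<std::uint8_t>((prefix & 0xFF00) >> 8));
    code.push_back(static_cast<std::uint8_t>(prefix & 0x00FF));
  } else if (prefix != 0) {
    // One-byte legacy REX prefix (0x40..0x4F).
    code.push_back(static_cast<std::uint8_t>(prefix));
  }
  code.push_back(b1);
}
```

Folding the width check and the assert into one emitter means each call site cannot forget the `UseAPX` guard.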
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1586933474 From duke at openjdk.org Thu May 2 00:05:21 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Thu, 2 May 2024 00:05:21 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v8] In-Reply-To: References: Message-ID: On Wed, 1 May 2024 21:04:50 GMT, Sandhya Viswanathan wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> add egpr support for popcntq(R,A), cvttsd2siq(R,A), popq(R) > > src/hotspot/cpu/x86/assembler_x86.cpp line 14001: > >> 13999: emit_int8((unsigned char)0xF3); >> 14000: prefixq(src, dst, true /* is_map1 */); >> 14001: emit_int8((unsigned char)0xB8); > > Just a nit, this could be: > emit_prefix_and_int8(get_prefixq(src, dst, true /* is_map1 */), (unsigned char) 0xB8); Thanks, done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1586933658 From dlong at openjdk.org Thu May 2 01:10:07 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 2 May 2024 01:10:07 GMT Subject: RFR: 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found [v2] In-Reply-To: References: Message-ID: On Wed, 1 May 2024 20:57:14 GMT, Doug Simon wrote: >> This PR adds the missing nmethod entry barriers to JVMCI hand assembled tests. >> It also closes the escape hatch in jvmciCodeInstaller.cpp that allowed JVMCI code to be installed without nmethod entry barriers. > > Doug Simon has updated the pull request incrementally with two additional commits since the last revision: > > - remove vestiges of optional JVMCI nmethod support for entry barriers > - fixed failing tests and removed tests that install no longer valid code Wouldn't it be useful for the JVMCI implementation to provide the nmethod entry barrier code? 
I could be wrong, but I think all the JIT compiler needs to know is how big it is, so it can reserve the space (NOPs would do), then when the code is installed as an nmethod, memcpy it over (if it's static), or use the MacroAssembler if it's not. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19035#issuecomment-2089363560 From szaldana at openjdk.org Thu May 2 01:43:59 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Thu, 2 May 2024 01:43:59 GMT Subject: Integrated: 8331088: Incorrect TraceLoopPredicate output In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 18:11:51 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR addresses [8331088](https://bugs.openjdk.org/browse/JDK-8331088) fixing the incorrect print output. > > Thanks, > Sonia This pull request has now been integrated. Changeset: 19e46eed Author: Sonia Zaldana Calles Committer: Dean Long URL: https://git.openjdk.org/jdk/commit/19e46eed580339a61fd1309c2cc7040e8c83597d Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8331088: Incorrect TraceLoopPredicate output Reviewed-by: chagedorn, dlong ------------- PR: https://git.openjdk.org/jdk/pull/19004 From thartmann at openjdk.org Thu May 2 06:02:14 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 2 May 2024 06:02:14 GMT Subject: RFR: 8331518: Tests should not use the Classpath exception form of the legal header Message-ID: <_3VjI3abxvxKuqUcaQsEsEGQ1WB2MuJlk3yWn7boJxI=.c8012113-6517-434b-9dc3-ab39df449f75@github.com> Removed the Classpath exception from the copyright header of some compiler tests and benchmarks. 
Thanks, Tobias ------------- Commit messages: - 8331518: Tests should not use the Classpath exception form of the legal header Changes: https://git.openjdk.org/jdk/pull/19047/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19047&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331518 Stats: 15 lines in 5 files changed: 0 ins; 10 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/19047.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19047/head:pull/19047 PR: https://git.openjdk.org/jdk/pull/19047 From rcastanedalo at openjdk.org Thu May 2 06:19:54 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 2 May 2024 06:19:54 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic In-Reply-To: References: Message-ID: <6_WS0VtL8jPuB2U9R8rh8lccVcU_IXMU6AOzaIu48lA=.9934e5e8-0ac8-44c9-9505-3cee953515aa@github.com> On Tue, 30 Apr 2024 21:34:56 GMT, Martin Doerr wrote: >> This changeset generalizes the logic to analyze, declare, and communicate which registers are live at a C2 barrier stub so that it can be used by other collectors than ZGC adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). >> >> The main changes are: >> >> - Make it possible to compute register liveness information before (live-in) or after (live-out) each barrier, and let the collector choose by implementing `BarrierSetC2State::needs_livein_data()`. >> >> - Generalize the interface with which collectors declare which registers must be additionally preserved across barrier runtime calls, adding the methods `BarrierStubC2::preserve(Register r)` and `BarrierStubC2::dont_preserve(Register r)`. >> >> - Simplify the interface with which platform-specific logic computes which registers to preserve across barrier runtime calls, replacing the calls to `BarrierStubC2::result()` and `BarrierStubC2::live()` with a single call to `BarrierStubC2::preserve_set()`.
>> >> #### Testing >> >> - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). >> - tier1-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/4d4e743d8f4cddd5288cee1d69c70fe2b9bea066) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. >> - Build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug). @RealFYang, @TheRealMDoerr: could you please test and review the riscv and ppc changes? Thanks! > > src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 127: > >> 125: while (OptoReg::is_reg(reg)) { >> 126: const VMReg vm_reg = OptoReg::as_VMReg(reg); >> 127: if (!(vm_reg->is_Register()) || vm_reg->as_Register() != r) { > > This doesn't work on PPC64: We run into "assert(is_Register() && is_even(value())) failed: even-aligned GPR name" (vmreg_ppc.hpp:54). Calling `as_Register()` is only supported for the even ones. > Maybe add check `is_concrete()`? Thank you Martin for trying out the patch and for the suggestion, will test a solution based on `is_concrete()` and push the changes later, if it works. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19026#discussion_r1587114886 From roland at openjdk.org Thu May 2 06:58:12 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 2 May 2024 06:58:12 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v15] In-Reply-To: References: <3fcIOnZHYI7ebFLr6vUGnMCo7GDnQ-FTDNjKTeoXqNA=.99b678a0-d04c-4c0d-a269-d0fc41104bfc@github.com> Message-ID: On Thu, 18 Apr 2024 10:16:55 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 22 commits: >> >> - Merge branch 'master' into JDK-8320649 >> - review >> - test fix >> - test fix >> - Merge branch 'master' into JDK-8320649 >> - whitespaces >> - review >> - Merge branch 'master' into JDK-8320649 >> - review >> - 32 bit build fix >> - ... and 12 more: https://git.openjdk.org/jdk/compare/bfff02ee...a4ffc11e > > test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 2: > >> 1: /* >> 2: * Copyright (c) 2024, Red Hat, Inc. All rights reserved. > > I like the tests, there is a lot of material here. > > A few more ideas: > - have two scoped values, and then have a sequence of `get` and `getValue` calls on them, in some random mix. And check that everything gets commoned, and the result is correct. > - have a method that directly uses `get`, but also has inner scopes of `where`/`get`. Interleave these, maybe even with multiple different scoped values. And nest them with various depths. And then verify both the expected number of calls / loads, as well as the result. > > Also: is it possible to stuff ScopedValues into ScopedValues? That would be another interesting stress-test with lots of options. In the commit that I will push soon, I added more tests: a couple with 3 scoped values and a few with ScopedValues into ScopedValues. The ones you suggest with inner scopes of `where`/`get` can't work because `Cache.invalidate()` would then be called: when C2 sees a call to `Cache.invalidate()`, it doesn't perform any optimization.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1587151951 From thartmann at openjdk.org Thu May 2 07:07:54 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 2 May 2024 07:07:54 GMT Subject: RFR: 8331253: 16 bits is not enough for nmethod::_skipped_instructions_size field In-Reply-To: References: Message-ID: On Wed, 1 May 2024 03:31:41 GMT, Vladimir Kozlov wrote: > In [JDK-8329433](https://bugs.openjdk.org/browse/JDK-8329433) I changed the `nmethod::_skipped_instructions_size` field type to `uint16_t`, assuming that it only counts NOP instructions and GC barriers. I did not take into account that Generational ZGC also includes barrier stubs in this size (original ZGC missed that). It is correct to include them because these stubs are generated in the instructions section and not in the stubs section: > > > Statistics for 1330 bytecoded nmethods for C2: > ... > ZGC: > main code = 3237080 (75.567032%) > stubs code = 810577 (25.040375%) > skipped insts = 44432 (1.372595%) > > GenZGC: > main code = 4034704 (78.238518%) > stubs code = 1356703 (33.625839%) > skipped insts = 1074611 (26.634197%) > > > Note, GenZGC has bigger code because it has store barriers. It generates a separate stub for each barrier, no sharing. > > After looking at how `_skipped_instructions_size` is used (only in one place, when calculating the inlining size of compiled code), I decided to replace it with `int _inline_insts_size;`. It is calculated the same way as before. > > And instead of including instructions stubs in `_skipped_instructions_size`, I recorded the size of instructions in the code section before stubs are generated. This gives a more accurate size of the main instructions and removes the need for `InlineSkippedInstructionsCounter` in GC barrier stubs. > > I also fixed code in C2 which estimates size of code and stubs sections. > > Tested tier1-4,tier8,stress,xcomp Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer).
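A short arithmetic sketch shows why a 16-bit field is too small once barrier stubs are counted. The statistics quoted above are totals over 1330 nmethods, so the relevant case is a single nmethod whose stub-inclusive count exceeds 65535 bytes. The 70000-byte value used below is hypothetical.

```cpp
#include <cstdint>

// Storing a byte count in a uint16_t keeps only its low 16 bits,
// so any value above 65535 silently wraps modulo 65536.
constexpr std::uint16_t store_in_u16(int bytes) {
  return static_cast<std::uint16_t>(bytes);
}
```

With this conversion, a hypothetical 70000-byte count would read back as 4464 (70000 - 65536). A plain `int` field, like the `_inline_insts_size` that replaces it, does not wrap for any realistic nmethod size.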
PR Review: https://git.openjdk.org/jdk/pull/19029#pullrequestreview-2034982296 From roland at openjdk.org Thu May 2 07:10:59 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 2 May 2024 07:10:59 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v15] In-Reply-To: <6-Yl6oBb-GdMyxY9DdqLcJbkZGxjItUvr2xHF3rFYk0=.2fd21903-990c-4d60-9ff8-0606506ba86d@github.com> References: <3fcIOnZHYI7ebFLr6vUGnMCo7GDnQ-FTDNjKTeoXqNA=.99b678a0-d04c-4c0d-a269-d0fc41104bfc@github.com> <6-Yl6oBb-GdMyxY9DdqLcJbkZGxjItUvr2xHF3rFYk0=.2fd21903-990c-4d60-9ff8-0606506ba86d@github.com> Message-ID: On Thu, 18 Apr 2024 12:22:27 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopopts.cpp line 3783: >> >>> 3781: // ScopedValueGetLoadFromCache and companion ScopedValueGetHitsInCacheNode must stay together >>> 3782: move_scoped_value_nodes_to_not_peel(peel, not_peel, peel_list, sink_list, i); >>> 3783: incr = false; >> >> Do we not have to increment the `cloned_for_outside_use`, which affects the `estimate`? > > Could we otherwise exhaust the node limit, by peeling a loop that is too large? No node is cloned here so there's no need to adjust the `estimate`. What happens is that a `ScopedValueGetHitsInCacheNode` is in the peeled region of the loop but not its `ScopedValueGetLoadFromCache` because peeling happens right above the `If` for the `ScopedValueGetHitsInCacheNode` . It's correct to simply move the `ScopedValueGetHitsInCacheNode` out of the peeled region into the non peeled region because it's only used there. There was no test case for this. I added one. 
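The cache probe that these nodes model can be illustrated with a small example. It checks that the cache exists, tries the first index, then tries the second. The example also shows why a `ScopedValueGetLoadFromCache` must stay together with its `ScopedValueGetHitsInCache`. This is a hypothetical C++ sketch, not the JDK's `ScopedValue.Cache` layout: the slot count, the `primary`/`secondary` hash functions, and the key representation are invented for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical per-thread cache: each key may live in one of two slots.
constexpr std::size_t kSlots = 16;

struct Key { std::size_t hash; };
struct Entry { const Key* key = nullptr; int value = 0; };
using Cache = std::array<Entry, kSlots>;

inline std::size_t primary(const Key& k)   { return k.hash % kSlots; }
inline std::size_t secondary(const Key& k) { return (k.hash / kSlots) % kSlots; }

// Models ScopedValueGetHitsInCache: the multi-step probe yields the slot holding k.
std::optional<std::size_t> hits_in_cache(const Cache* cache, const Key& k) {
  if (cache == nullptr) return std::nullopt;  // cache is allocated lazily
  if ((*cache)[primary(k)].key == &k) return primary(k);
  if ((*cache)[secondary(k)].key == &k) return secondary(k);
  return std::nullopt;                        // miss: take the slow path
}

// Models ScopedValueGetLoadFromCache: meaningful only for a slot returned by a
// successful probe, which is why the two operations must stay together.
int load_from_cache(const Cache& cache, std::size_t slot) {
  assert(cache[slot].key != nullptr && "load without a successful probe");
  return cache[slot].value;
}
```

If loop peeling separated the load from its probe, the load could execute on a path where no successful probe dominates it. This model makes that dependency explicit as a data dependency on `slot`.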
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1587164252 From roland at openjdk.org Thu May 2 07:18:05 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 2 May 2024 07:18:05 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v15] In-Reply-To: <6-Yl6oBb-GdMyxY9DdqLcJbkZGxjItUvr2xHF3rFYk0=.2fd21903-990c-4d60-9ff8-0606506ba86d@github.com> References: <3fcIOnZHYI7ebFLr6vUGnMCo7GDnQ-FTDNjKTeoXqNA=.99b678a0-d04c-4c0d-a269-d0fc41104bfc@github.com> <6-Yl6oBb-GdMyxY9DdqLcJbkZGxjItUvr2xHF3rFYk0=.2fd21903-990c-4d60-9ff8-0606506ba86d@github.com> Message-ID: On Thu, 18 Apr 2024 11:45:07 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopopts.cpp line 4010: >> >>> 4008: peel.remove(hits_in_cache->_idx); >>> 4009: not_peel.set(hits_in_cache->_idx); >>> 4010: peel_list.remove(i); >> >> Looks like duplicated code from the call-site. A refactoring may help. > > I think you could combine the code with the case: > `if (n->in(0) == nullptr && !n->is_Load() && !n->is_CMove()) {` > And then you would have this code here, as well as the `TracePartialPeeling` code shared for both. I moved that code to a helper method so it's shared. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1587169416 From stuefe at openjdk.org Thu May 2 07:18:12 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 2 May 2024 07:18:12 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v7] In-Reply-To: References: Message-ID: > See [1] for previous discussions. > > We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. > > The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. 
> > Examples: > > This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` > > This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` > > > --- > > The patch: > > 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. > 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. > 3) Adapted and extended tests > > I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. > > > Tested: > > - manually on Mac m1 (debug and release) > - GHAs are running > - but Oracle will do more testing before this goes in > > [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Remove accidental change to TestDeadPhiMergeMemLoop.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18969/files - new: https://git.openjdk.org/jdk/pull/18969/files/5a460a1f..691a1467 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18969/head:pull/18969 PR: https://git.openjdk.org/jdk/pull/18969 From roland at openjdk.org Thu May 2 07:18:04 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 2 May 2024 07:18:04 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v15] In-Reply-To: References: <3fcIOnZHYI7ebFLr6vUGnMCo7GDnQ-FTDNjKTeoXqNA=.99b678a0-d04c-4c0d-a269-d0fc41104bfc@github.com> Message-ID: On Thu, 18 Apr 2024 
12:47:54 GMT, Emanuel Peter wrote: > I am wondering if it would make sense to have some `scoped_value.hpp/cpp`, where you can put all your new classes. This would also allow you to put documentation about the general approach at the top of the `scoped_value.hpp` file. Currently, the code is spread all over, and it would be hard to know where one could find a good summary of the whole optimization. I moved most of the scoped value specific code to `scoped_value.hpp/cpp` in the new commit. > src/hotspot/share/opto/loopnode.hpp line 703: > >> 701: bool policy_peeling(PhaseIdealLoop* phase, bool scoped_value_only); >> 702: >> 703: uint estimate_peeling(PhaseIdealLoop* phase, bool peel_only_if_has_scoped_value); > > Can we use the same name for `scoped_value_only` and `peel_only_if_has_scoped_value`? In `policy_peeling` you pass the value into `estimate_peeling`, so it seems to be the same. > > Somehow it does not sit well with me that we have such a special-case flag in such a high-level and general method. But I don't know a fix now. It just looks like not the best design. But that may not be your fault. Are there any alternatives? I added a `policy_peeling_for_scoped_value()` method. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2089774427 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1587170022 From roland at openjdk.org Thu May 2 07:20:59 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 2 May 2024 07:20:59 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v16] In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 13:09:22 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. 
That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens >> no optimization is performed for the method being compiled). Hoisting >> a `get()` call out of a loop for a loop invariant `scopedValue` should >> also be legal in most cases. >> >> `ScopedValue.get()` is implemented in Java code as a 2 step process. A >> cache is attached to the current thread object. If the `ScopedValue` >> object is in the cache then the result from `get()` is read from >> there. Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache. On a >> cache probe, both indexes are checked. As a consequence, the process >> of probing the cache is a multi step process (check if the cache is >> present, check first index, check second index if first index >> failed). If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, profile reports the slow path >> as never taken and only the read from the cache is compiled. >> >> To perform the optimizations, I added 3 new node types to C2: >> >> - the pair >> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for >> the cache probe >> >> - a cfg node ScopedValueGetResultNode to help locate the result of the >> `get()` call in the IR graph. >> >> In pseudo code, once the nodes are inserted, the code of a `get()` is: >> >> >> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) >> if (hits_in_the_cache) { >> res = ScopedValueGetLoadFromCache(hits_in_the_cache); >> } else { >> res = ..; //slow call possibly inlined. Subgraph can be arbitrarily complex >> } >> res = ScopedValueGetResult(res) >> >> >> In the snippet: >> >> >> v1 = scopedValue.get(); >> ...
>> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Emanuel Peter I also removed `Node::find_unique_out_with()` and replaced it with `Node* find_out_with(int opcode, bool want_unique = false)`. I'll look into the automatic casting but I'd like to possibly do it as a separate clean up. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2089779578 From roland at openjdk.org Thu May 2 07:29:44 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 2 May 2024 07:29:44 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v17] In-Reply-To: References: Message-ID: > This change implements C2 optimizations for calls to > ScopedValue.get(). Indeed, in: > > > v1 = scopedValue.get(); > ... > v2 = scopedValue.get(); > > > `v2` can be replaced by `v1` and the second call to `get()` can be > optimized out. That's true whatever is between the 2 calls unless a > new mapping for `scopedValue` is created in between (when that happens > no optimizations is performed for the method being compiled). Hoisting > a `get()` call out of loop for a loop invariant `scopedValue` should > also be legal in most cases. > > `ScopedValue.get()` is implemented in java code as a 2 step process. A > cache is attached to the current thread object. If the `ScopedValue` > object is in the cache then the result from `get()` is read from > there. 
Otherwise a slow call is performed that also inserts the > mapping in the cache. The cache itself is lazily allocated. One > `ScopedValue` can be hashed to 2 different indexes in the cache. On a > cache probe, both indexes are checked. As a consequence, the process > of probing the cache is a multi step process (check if the cache is > present, check first index, check second index if first index > failed). If the cache is populated early on, then when the method that > calls `ScopedValue.get()` is compiled, profile reports the slow path > as never taken and only the read from the cache is compiled. > > To perform the optimizations, I added 3 new node types to C2: > > - the pair > ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for > the cache probe > > - a cfg node ScopedValueGetResultNode to help locate the result of the > `get()` call in the IR graph. > > In pseudo code, once the nodes are inserted, the code of a `get()` is: > > > hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) > if (hits_in_the_cache) { > res = ScopedValueGetLoadFromCache(hits_in_the_cache); > } else { > res = ..; //slow call possibly inlined. Subgraph can be arbitray complex > } > res = ScopedValueGetResult(res) > > > In the snippet: > > > v1 = scopedValue.get(); > ... > v2 = scopedValue.get(); > > > Replacing `v2` by `v1` is then done by starting from the > `ScopedValueGetResult` node for the second `get()` and looking for a > dominating `ScopedValueGetResult` for the same `ScopedValue` > object. When one is found, it is used as a replacement. Eliminating > the second `get()` call is achieved by making > `ScopedValueGetHitsInCache` always successful if there's a dominating > `ScopedValueGetResult` and replacing its companion > `ScopedValueGetLoadFromCache` by the dominating > `ScopedValueGetResult`. > > Hoisting a `g... 
Roland Westrelin has updated the pull request incrementally with four additional commits since the last revision: - more - more tests - scoped_value.[ch]pp - review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16966/files - new: https://git.openjdk.org/jdk/pull/16966/files/f63bf543..d38872fd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16966&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16966&range=15-16 Stats: 5196 lines in 28 files changed: 2735 ins; 2322 del; 139 mod Patch: https://git.openjdk.org/jdk/pull/16966.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16966/head:pull/16966 PR: https://git.openjdk.org/jdk/pull/16966 From roland at openjdk.org Thu May 2 07:31:58 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 2 May 2024 07:31:58 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v13] In-Reply-To: References: <9Eoh8hOSSVvAtf9iVQ6hflQyceUtt4dpZdqm61zg5XI=.358a4d79-70d9-4b54-85d5-37c6817f0fae@github.com> Message-ID: On Mon, 29 Apr 2024 23:02:33 GMT, Dean Long wrote: >> src/hotspot/share/c1/c1_GraphBuilder.cpp line 2030: >> >>> 2028: receiver = state()->stack_at(index); >>> 2029: ciType* type = receiver->exact_type(); >>> 2030: if (type != nullptr && type->is_loaded()) { >> >> Is it the case that we can't see an interface here? Or that we think it's ok if we see an interface here? > > We can't see an interface here because it will get rejected by `ciInstanceKlass::exact_klass`, so we could even assert for that here if we wanted. 
Then, I think we should add an assert that `!type->as_instance_klass()->is_interface()` and also that it's not an array of interfaces (using `base_element_klass()`). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1587185774 From tholenstein at openjdk.org Thu May 2 07:37:55 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 2 May 2024 07:37:55 GMT Subject: RFR: 8331404: IGV: Show line numbers for callees in properties In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 15:38:20 GMT, Christian Hagedorn wrote: > IGV shows the `bci` for a node in the callee, followed by the bci in the caller method and so on until we reach the root method. For the `line` property, we currently only show the line number found in the root method (`first()` is the root method being compiled and `second()` and `third()` are inlined): > > Example program: > ![image](https://github.com/openjdk/jdk/assets/17833009/579fe9eb-4bd8-42d8-9d03-875f25bd97ae) > > Properties of the store to `fFld`: > ![image](https://github.com/openjdk/jdk/assets/17833009/3763cccf-c1ba-4d7f-a986-eae8bf0654b0) > > One could read the line number from the `jvms` property above. But you would need to expand that property with the button on the right side which opens a window. But then you cannot click anything else anymore in IGV until you close the window again. > > A simpler and easier-to-read solution is to add the line number information to match the bci numbers (they are printed in callee->root method order which I think is okay - especially if there are a lot of inlinees, it could be easier to have the really interesting numbers at the start on the left side). This would look something like this: > ![image](https://github.com/openjdk/jdk/assets/17833009/fcab3af6-69ac-43ae-89be-19fc4476d12f) > > If there is no line number information for a bci, I simply emit a `_`.
> > Testing: > - Manual testing in IGV > - Sanity testing by running `java -Xcomp -XX:+PrintIdealGraph -XX:PrintIdealGraphLevel=4 -XX:PrintIdealGraphFile=graph.xml HelloWorld.java`. > > Thanks, > Christian Looks good! ------------- Marked as reviewed by tholenstein (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19025#pullrequestreview-2035030650 From dholmes at openjdk.org Thu May 2 07:44:55 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 2 May 2024 07:44:55 GMT Subject: RFR: 8331518: Tests should not use the "Classpath" exception form of the legal header In-Reply-To: <_3VjI3abxvxKuqUcaQsEsEGQ1WB2MuJlk3yWn7boJxI=.c8012113-6517-434b-9dc3-ab39df449f75@github.com> References: <_3VjI3abxvxKuqUcaQsEsEGQ1WB2MuJlk3yWn7boJxI=.c8012113-6517-434b-9dc3-ab39df449f75@github.com> Message-ID: On Thu, 2 May 2024 05:57:50 GMT, Tobias Hartmann wrote: > Removed the Classpath exception from the copyright header of some compiler tests and benchmarks. > > Thanks, > Tobias LGTM. Thanks. I'd consider this a trivial fix too. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19047#pullrequestreview-2035042618 From thartmann at openjdk.org Thu May 2 07:51:55 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 2 May 2024 07:51:55 GMT Subject: RFR: 8331518: Tests should not use the "Classpath" exception form of the legal header In-Reply-To: <_3VjI3abxvxKuqUcaQsEsEGQ1WB2MuJlk3yWn7boJxI=.c8012113-6517-434b-9dc3-ab39df449f75@github.com> References: <_3VjI3abxvxKuqUcaQsEsEGQ1WB2MuJlk3yWn7boJxI=.c8012113-6517-434b-9dc3-ab39df449f75@github.com> Message-ID: On Thu, 2 May 2024 05:57:50 GMT, Tobias Hartmann wrote: > Removed the Classpath exception from the copyright header of some compiler tests and benchmarks. > > Thanks, > Tobias Thanks for the review David! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19047#issuecomment-2089828156 From thartmann at openjdk.org Thu May 2 07:51:55 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 2 May 2024 07:51:55 GMT Subject: Integrated: 8331518: Tests should not use the "Classpath" exception form of the legal header In-Reply-To: <_3VjI3abxvxKuqUcaQsEsEGQ1WB2MuJlk3yWn7boJxI=.c8012113-6517-434b-9dc3-ab39df449f75@github.com> References: <_3VjI3abxvxKuqUcaQsEsEGQ1WB2MuJlk3yWn7boJxI=.c8012113-6517-434b-9dc3-ab39df449f75@github.com> Message-ID: <5mrHBQlLsVmlnl8hMSirNOfnBy71QLlF7ajun-SCbFU=.ba0e51c2-4643-4cbc-bf13-cbd9d3a8a2e3@github.com> On Thu, 2 May 2024 05:57:50 GMT, Tobias Hartmann wrote: > Removed the Classpath exception from the copyright header of some compiler tests and benchmarks. > > Thanks, > Tobias This pull request has now been integrated. Changeset: d3bf5262 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/d3bf52628efb79e1b98749d628c4b6d035e1d511 Stats: 15 lines in 5 files changed: 0 ins; 10 del; 5 mod 8331518: Tests should not use the "Classpath" exception form of the legal header Reviewed-by: dholmes ------------- PR: https://git.openjdk.org/jdk/pull/19047 From rcastanedalo at openjdk.org Thu May 2 07:57:18 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 2 May 2024 07:57:18 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v2] In-Reply-To: References: Message-ID: > This changeset generalizes the logic to analyze, declare, and communicate which registers are live at a C2 barrier stub so that it can be used by other collectors than ZGC adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)).
> > The main changes are: > > - Make it possible to compute register liveness information before (live-in) or after (live-out) each barrier, and let the collector choose by implementing `BarrierSetC2State::needs_livein_data()`. > > - Generalize the interface with which collectors declare which registers must be additionally preserved across barrier runtime calls, adding the methods `BarrierStubC2::preserve(Register r)` and `BarrierStubC2::dont_preserve(Register r)`. > > - Simplify the interface with which platform-specific logic computes which registers to preserve across barrier runtime calls, replacing the calls to `BarrierStubC2::result()` and `BarrierStubC2::live()` with a single call to `BarrierStubC2::preserve_set()`. > > #### Testing > > - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). > - tier1-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/4d4e743d8f4cddd5288cee1d69c70fe2b9bea066) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. > - Build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug). @RealFYang, @TheRealMDoerr: could you please test and review the riscv and ppc changes? Thanks! 
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Use VMReg::is_concrete for testing sub-registers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19026/files - new: https://git.openjdk.org/jdk/pull/19026/files/31a19a48..c0fc66de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19026&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19026&range=00-01 Stats: 13 lines in 1 file changed: 0 ins; 6 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/19026.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19026/head:pull/19026 PR: https://git.openjdk.org/jdk/pull/19026 From rcastanedalo at openjdk.org Thu May 2 07:57:18 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 2 May 2024 07:57:18 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v2] In-Reply-To: References: Message-ID: <3mTCG75Z1f2KDjDAIhnkxKDwbEK2Q4LvF5T5tJ0vWBQ=.274b69df-ef47-4750-a916-0d25ec8b65fa@github.com> On Tue, 30 Apr 2024 21:34:56 GMT, Martin Doerr wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Use VMReg::is_concrete for testing sub-registers > > src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 127: > >> 125: while (OptoReg::is_reg(reg)) { >> 126: const VMReg vm_reg = OptoReg::as_VMReg(reg); >> 127: if (!(vm_reg->is_Register()) || vm_reg->as_Register() != r) { > > This doesn't work on PPC64: We run into "assert(is_Register() && is_even(value())) failed: even-aligned GPR name" (vmreg_ppc.hpp:54). Calling `as_Register()` is only supported for the even ones. > Maybe add check `is_concrete()`? Done (commit https://github.com/openjdk/jdk/pull/19026/commits/c0fc66deb654a9b930a7b7cf1a7e7fa093739027). @TheRealMDoerr please let me know if this works on PPC64.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19026#discussion_r1587216311 From aph at openjdk.org Thu May 2 08:14:59 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 2 May 2024 08:14:59 GMT Subject: RFR: 8329258: TailCall should not use frame pointer register for jump target [v4] In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 10:31:09 GMT, Tobias Hartmann wrote: >> Applying @danielogh's patch (see description of [JDK-8329258](https://bugs.openjdk.org/browse/JDK-8329258)) to enable `StressGCM` / `StressLCM` for stub compilations triggers a crash on AArch64. The problem is that the register allocator uses `R29` (`rfp`) which is usually used for the frame pointer to hold the `TailCall` `exc_target` when generating the `_slow_arraycopy_Java` stub: >> https://github.com/openjdk/jdk/blob/6dfb8120c270a76fcba5a5c3c9ad91da3282d5fa/src/hotspot/share/opto/generateOptoStub.cpp#L258-L264 >> >> With `StressGCM` / `StressLCM` the register initialization is scheduled early and `R29` is corrupted by the `MachEpilogNode` which is opaque to the register allocator and inserted right before the `TailCall`: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/share/opto/output.cpp#L320-L326 >> >> >> 028 mov R29, 0x0000ffff78cc0080 # ptr >> >> [...] >> >> 098 # pop frame 16 >> ldp lr, rfp, [sp,#0] <- Epilog kills rfp (and lr + sp) >> add sp, sp, #16 >> >> [...] >> >> 0a0 br R29 # R12 holds method >> >> >> As a result, we jump to a "garbage" location. See [bad code](https://bugs.openjdk.org/secure/attachment/108835/BAD_slow_arraycopy_Java.txt) vs. [good code](https://bugs.openjdk.org/secure/attachment/108834/GOOD_slow_arraycopy_Java.txt). 
>> >> On x86, we use `no_rbp_RegP` instead: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L2564-L2566 https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L12470 >> >> I implemented the same on AArch64. >> >> I think other platforms are affected as well but I don't have the hardware to test there. @offamitkumar (S390), @TheRealMDoerr (PPC), @RealFYang (RISC-V), @bulasevich (ARM32), could you please have a look? >> >> I also wondered if `R29` shouldn't be a callee-save (SOE) register in the C calling convention? >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/aarch64/aarch64.ad#L139-L140 On x86_64, `RBP` is SOE: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L86-L87 >> >> `TestTailCallInArrayCopyStub.java` will only work once [JDK-8330016](https://bugs.openjdk.org/browse/JDK-8330016) is integrated. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Comment adjustment Thanks. Sorry for my slow reply. ------------- Marked as reviewed by aph (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18716#pullrequestreview-2035101941 From chagedorn at openjdk.org Thu May 2 08:15:12 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 2 May 2024 08:15:12 GMT Subject: RFR: 8305638: Renaming and small clean-ups around predicates [v6] In-Reply-To: References: Message-ID: > **Update: April 22** > > After splitting off and integrating the following PRs from this PR: > https://github.com/openjdk/jdk/pull/18080 > https://github.com/openjdk/jdk/pull/18293 > https://github.com/openjdk/jdk/pull/18628 > https://github.com/openjdk/jdk/pull/18723 > > we are only left with a few renaming and clean-ups from this PR. 
Directly merging the master branch in was quite hard. I therefore reverted all commits to get back to a clean master and then applied all remaining code changes manually (required a force push). > >
>
> > _------------ Original PR description --------------_ > > This patch is intended for JDK 23. > > While preparing the patch for the full fix for Assertion Predicates [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981), I still noticed that some changes are not required for the actual fix and could be split off and reviewed separately in this PR. > > The patch applies the following cleanup changes: > - The complete fix had to add slightly different cloning cases in `PhaseIdealLoop::create_bool_from_template_assertion_predicate()` which already has quite some logic to switch between different cases. Additionally, the algorithm in the method itself was already hard to understand and difficult to adapt. I therefore re-implemented it in a separate class `CloneTemplateAssertionPredicateBool` together with some helper classes like `DFSNodeStack`. To use it, I've added a `TemplateAssertionPredicateBool` class that offers three cloning possibilities: > - `clone()`: Clone without modification > - `clone_and_replace_opaque_loop_nodes()`: Clone and replace the `OpaqueLoop*Nodes` with a new init and stride node. > - `clone_and_replace_init()`: Special case of `clone_and_replace_opaque_loop_nodes()` which only replaces `OpaqueLoopInitNode` and clones `OpaqueLoopStrideNode`. > > This refactoring could be extracted from the complete fix. > - The Split If code to detect (`subgraph_has_opaque()`) and clone Template Assertion Predicate Bools was extracted to a separate class `CloneTemplateAssertionPredicateBoolDown` and uses the new `TemplateAssertionPredicateBool` class to do the actual cloning. > - In the process of coding the complete fix, I've refactored the Loop Unswitching code quite a bit. This change could also be extracted into a separate RFE. Changes include: > - Renaming > - Extracting code to separate classes/methods > - Adding comments > - Some small refactoring including: > - Removing unused parameters > - Renaming variables/parameters/methods > > Th... 
Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - Merge branch 'refs/heads/master' into JDK-8305638 # Conflicts: # src/hotspot/share/opto/loopPredicate.cpp - Fix useful Template Assertion Predicate marking - Fix useful Parse Predicate marking - Remaining renaming and small clean-ups ------------- Changes: https://git.openjdk.org/jdk/pull/16877/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16877&range=05 Stats: 77 lines in 5 files changed: 17 ins; 7 del; 53 mod Patch: https://git.openjdk.org/jdk/pull/16877.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16877/head:pull/16877 PR: https://git.openjdk.org/jdk/pull/16877 From chagedorn at openjdk.org Thu May 2 08:16:53 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 2 May 2024 08:16:53 GMT Subject: RFR: 8331404: IGV: Show line numbers for callees in properties In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 15:38:20 GMT, Christian Hagedorn wrote: > IGV shows the `bci` for a node in the callee, followed by the bci in the caller method and so on until we reach the root method. For the `line` property, we currently only show the line number found in the root method (`first()` is the root method being compiled and `second()` and `third()` are inlined): > > Example program: > ![image](https://github.com/openjdk/jdk/assets/17833009/579fe9eb-4bd8-42d8-9d03-875f25bd97ae) > > Properties of the store to `fFld`: > ![image](https://github.com/openjdk/jdk/assets/17833009/3763cccf-c1ba-4d7f-a986-eae8bf0654b0) > > One could read the line number from the `jvms` property above. But you would need to expand that property with the button on the right side which opens a window. But then you cannot click anything else anymore in IGV until you close the window again. 
> > A simpler and easier to read solution is to add the line number information to match the bci numbers (they are printed in callee->root method order which I think is okay - especially if there are a lot of inlinees, it could be easier to have the really interesting numbers at the start on the left side). This would look something like that: > ![image](https://github.com/openjdk/jdk/assets/17833009/fcab3af6-69ac-43ae-89be-19fc4476d12f) > > If there is no line number information for a bci, I simply emit a `_`. > > Testing: > - Manual testing in IGV > - Sanity testing by running `java -Xcomp -XX:+PrintIdealGraph -XX:PrintIdealGraphLevel=4 -XX:PrintIdealGraphFile=graph.xml HelloWorld.java`. > > Thanks, > Christian Thanks Toby for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19025#issuecomment-2089867892 From bulasevich at openjdk.org Thu May 2 08:19:53 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 2 May 2024 08:19:53 GMT Subject: RFR: 8330806: test/hotspot/jtreg/compiler/c1/TestLargeMonitorOffset.java fails on ARM32 In-Reply-To: <0TiKLBlllAunug0vnrED5etz2Asg0faInPkxw2qebE8=.327bf508-f675-4b1a-8d65-866cae772234@github.com> References: <0TiKLBlllAunug0vnrED5etz2Asg0faInPkxw2qebE8=.327bf508-f675-4b1a-8d65-866cae772234@github.com> Message-ID: <7MFw7690WXwQ0vF53EPK04vMLVavhkIfTtdGHvk3gcI=.27b7b975-ce50-4c92-bcb7-7e4ae189e293@github.com> On Fri, 26 Apr 2024 15:22:25 GMT, Sergey Nazarkin wrote: >> TestLargeMonitorOffset was introduced by 8310844 with a fix for the AArch64 platform. The same issue needs to be fixed for ARM32. With this change, we add the large slot_offset handling to the ARM32 version of IR_Assembler::osr_entry(). >> >> Testing: jtreg hotspot, jtreg jdk tier1-3. 
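The ARM32 check added by this change, `slot_offset >= 4096 - BytesPerWord`, can be read as a range test on immediate offsets. A minimal Java sketch, assuming (this is not stated in the thread) that each monitor slot is read with two word-sized loads at `slot_offset` and `slot_offset + BytesPerWord`, that those loads encode their offset in A32's 12-bit immediate field (at most 4095), and that slot offsets are non-negative. `Arm32OffsetCheck`, `fitsImm12` and `needsAddSlow` are hypothetical names used only for illustration:

```java
// Hypothetical model of when an OSR monitor slot offset needs add_slow on ARM32.
// Assumptions: two word loads per slot, 12-bit unsigned immediate offsets,
// and non-negative slot offsets.
public final class Arm32OffsetCheck {
    static final int BYTES_PER_WORD = 4; // word size on ARM32
    static final int MAX_IMM12 = 4095;   // largest 12-bit immediate offset

    // True if a single word access at this (non-negative) offset fits the immediate field.
    static boolean fitsImm12(int offset) {
        return offset >= 0 && offset <= MAX_IMM12;
    }

    // The base must be adjusted first (e.g. base + offset into a temp register)
    // unless both words of the slot are reachable with an immediate offset.
    // For non-negative offsets this equals offset >= 4096 - BYTES_PER_WORD.
    static boolean needsAddSlow(int slotOffset) {
        return !(fitsImm12(slotOffset) && fitsImm12(slotOffset + BYTES_PER_WORD));
    }

    public static void main(String[] args) {
        for (int off : new int[] {0, 4088, 4091, 4092, 8000}) {
            System.out.println(off + " -> " + (needsAddSlow(off) ? "add_slow" : "immediate offset"));
        }
    }
}
```

Under these assumptions 4091 is the largest slot offset whose second word is still reachable with an immediate, which would explain why the comparison uses `4096 - BytesPerWord` rather than 4096.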
> > src/hotspot/cpu/arm/c1_LIRAssembler_arm.cpp line 156: > >> 154: int slot_offset = monitor_offset - (i * 2 * BytesPerWord); >> 155: if (slot_offset >= 4096 - BytesPerWord) { >> 156: __ add_slow(R2, OSR_buf, slot_offset); > > Can't we check this once before the loop? Or does such an optimization make no sense? Hi Sergey. Thanks for looking at this. This is not performance critical code, and the typical number_of_locks value is 0, so IF inside the FOR loop makes sense here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18891#discussion_r1587242241 From dlong at openjdk.org Thu May 2 08:26:59 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 2 May 2024 08:26:59 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v13] In-Reply-To: References: <9Eoh8hOSSVvAtf9iVQ6hflQyceUtt4dpZdqm61zg5XI=.358a4d79-70d9-4b54-85d5-37c6817f0fae@github.com> Message-ID: <_x-OSownzQQZ8fmlsbvQ42MLf9BGZskECTNncOE0s4E=.8381a076-0cc4-4339-924f-fa22ca780573@github.com> On Thu, 2 May 2024 07:29:04 GMT, Roland Westrelin wrote: >> We can't see an interface here because it will get rejected by `ciInstanceKlass::exact_klass`, so we could even assert for that here if we wanted. > > Then, I think we should add an assert that `!type->as_instance_klass()->is_interface()` and also that it's not an array of interfaces (using `base_element_klass()`) An array of interfaces can be exact: new Interface[20].getClass(); and it seems like it would be safe to allow this, so I think we only need one assert for `!type->as_instance_klass()->is_interface()` if we don't trust the result of exact_type().
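The point that an array of interfaces can be an exact type can be checked at the Java level: `new I[n]` always produces an instance of the array class `I[]`, which is a concrete class even when `I` is an interface. A small sketch using `java.lang.Runnable` as a stand-in for `Interface`; it illustrates the language-level fact only, not C1's `exact_type()` or `ciInstanceKlass::exact_klass` logic:

```java
// Shows that an array whose element type is an interface still has a
// concrete, exactly known runtime class.
public final class InterfaceArrayExactness {
    public static void main(String[] args) {
        Object arr = new Runnable[20];          // java.lang.Runnable is an interface
        Class<?> k = arr.getClass();

        System.out.println(k == Runnable[].class);               // true: always the same array class
        System.out.println(k.isInterface());                     // false: the array class is not an interface
        System.out.println(k.getComponentType().isInterface());  // true: but its element type is
    }
}
```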
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1587252944 From dnsimon at openjdk.org Thu May 2 09:34:51 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 2 May 2024 09:34:51 GMT Subject: RFR: 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found [v2] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 01:06:48 GMT, Dean Long wrote: > Wouldn't it be useful for the JVMCI implementation to provide the nmethod entry barrier code? I could be wrong, but I think all the JIT compiler needs to know is how big it is, so it can reserve the space (NOPs would do), then when the code is installed as an nmethod, memcpy it over (if it's static), or use the MacroAssembler if it's not. That's an interesting idea and would be great if possible. However, given that Graal [puts the slow path out-of-line](https://github.com/oracle/graal/blob/c0b79318e2158a22bec5a9a991ee6ee226de6492/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/hotspot/amd64/AMD64HotSpotBackend.java#L195), we'd be stuck with the problem of patching in the jump target. Also, JVMCI would have to conservatively emit a long-form jump instruction to the slow path. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19035#issuecomment-2090014082 From lucy at openjdk.org Thu May 2 09:42:54 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 2 May 2024 09:42:54 GMT Subject: RFR: 8331421: ubsan: vmreg.cpp checking error member call on misaligned address In-Reply-To: <-0CT3e78TSiBvMrwImLSJDFJkQ7k7BwcMhfoW5tKklA=.aab953cb-9fb3-4b89-acf4-ae6967276c0b@github.com> References: <-0CT3e78TSiBvMrwImLSJDFJkQ7k7BwcMhfoW5tKklA=.aab953cb-9fb3-4b89-acf4-ae6967276c0b@github.com> Message-ID: On Tue, 30 Apr 2024 13:56:07 GMT, Martin Doerr wrote: > As shown in the JBS issue, the Undefined Behavior Sanitizer complains about `VMRegImpl::stack_0()->value()`. 
This can easily be avoided by skipping the more complicated way which includes addition and subtraction of `first()`. LGTM. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19022#pullrequestreview-2035282281 From mdoerr at openjdk.org Thu May 2 09:42:55 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 2 May 2024 09:42:55 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v2] In-Reply-To: <3mTCG75Z1f2KDjDAIhnkxKDwbEK2Q4LvF5T5tJ0vWBQ=.274b69df-ef47-4750-a916-0d25ec8b65fa@github.com> References: <3mTCG75Z1f2KDjDAIhnkxKDwbEK2Q4LvF5T5tJ0vWBQ=.274b69df-ef47-4750-a916-0d25ec8b65fa@github.com> Message-ID: On Thu, 2 May 2024 07:54:16 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 127: >> >>> 125: while (OptoReg::is_reg(reg)) { >>> 126: const VMReg vm_reg = OptoReg::as_VMReg(reg); >>> 127: if (!(vm_reg->is_Register()) || vm_reg->as_Register() != r) { >> >> This doesn't work on PPC64: We run into "assert(is_Register() && is_even(value())) failed: even-aligned GPR name" (vmreg_ppc.hpp:54). Calling `as_Register()` is only supported for the even ones. >> Maybe add check `is_concrete()`? > > Done (commit https://github.com/openjdk/jdk/pull/19026/commits/c0fc66deb654a9b930a7b7cf1a7e7fa093739027). @TheRealMDoerr please let me know if this works on PPC64. Yes, this works on PPC64. Thanks!
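The PPC64 failure above comes from asking an odd (upper-half) slot for its register. A hypothetical Java model of why an `is_concrete()` guard helps, assuming, as the assert text `is_Register() && is_even(value())` suggests, that each GPR is named by two consecutive slots of which only the even one converts to a register. `VMRegSlotModel`, its slot numbering and `NUM_GPRS` are illustrative assumptions, not HotSpot's `VMReg` API:

```java
// Hypothetical model of PPC64-style GPR slot naming (not HotSpot's VMReg API).
// Assumption: register k is named by slots 2k (concrete) and 2k + 1 (upper half).
public final class VMRegSlotModel {
    static final int NUM_GPRS = 32;

    static boolean isRegister(int slot) {
        return slot >= 0 && slot < 2 * NUM_GPRS;
    }

    static boolean isConcrete(int slot) {
        return isRegister(slot) && slot % 2 == 0;
    }

    // Mirrors the assert in the report: only even-aligned GPR names convert.
    static int asRegister(int slot) {
        if (!(isRegister(slot) && slot % 2 == 0)) {
            throw new IllegalStateException("even-aligned GPR name expected, got slot " + slot);
        }
        return slot / 2;
    }

    // Unguarded check, analogous to the original condition: throws on any upper-half slot.
    static boolean namesRegisterUnguarded(int slot, int r) {
        return isRegister(slot) && asRegister(slot) == r;
    }

    // Guarded check: test isConcrete() before converting the slot.
    static boolean namesRegister(int slot, int r) {
        return isConcrete(slot) && asRegister(slot) == r;
    }

    public static void main(String[] args) {
        int r = 3;
        for (int slot = 6; slot <= 9; slot++) {
            System.out.println("slot " + slot + " names r" + r + ": " + namesRegister(slot, r));
        }
        try {
            namesRegisterUnguarded(7, r);
        } catch (IllegalStateException e) {
            System.out.println("unguarded check failed: " + e.getMessage());
        }
    }
}
```

In this model the upper-half slot simply does not match; how the actual change treats sub-registers is not reproduced here.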
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19026#discussion_r1587348967 From mdoerr at openjdk.org Thu May 2 10:23:57 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 2 May 2024 10:23:57 GMT Subject: RFR: 8331421: ubsan: vmreg.cpp checking error member call on misaligned address In-Reply-To: <-0CT3e78TSiBvMrwImLSJDFJkQ7k7BwcMhfoW5tKklA=.aab953cb-9fb3-4b89-acf4-ae6967276c0b@github.com> References: <-0CT3e78TSiBvMrwImLSJDFJkQ7k7BwcMhfoW5tKklA=.aab953cb-9fb3-4b89-acf4-ae6967276c0b@github.com> Message-ID: On Tue, 30 Apr 2024 13:56:07 GMT, Martin Doerr wrote: > As shown in the JBS issue, the Undefined Behavior Sanitizer complains about `VMRegImpl::stack_0()->value()`. This can easily be avoided by skipping the more complicated way which includes addition and subtraction of `first()`. Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19022#issuecomment-2090106095 From mdoerr at openjdk.org Thu May 2 10:23:58 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 2 May 2024 10:23:58 GMT Subject: Integrated: 8331421: ubsan: vmreg.cpp checking error member call on misaligned address In-Reply-To: <-0CT3e78TSiBvMrwImLSJDFJkQ7k7BwcMhfoW5tKklA=.aab953cb-9fb3-4b89-acf4-ae6967276c0b@github.com> References: <-0CT3e78TSiBvMrwImLSJDFJkQ7k7BwcMhfoW5tKklA=.aab953cb-9fb3-4b89-acf4-ae6967276c0b@github.com> Message-ID: On Tue, 30 Apr 2024 13:56:07 GMT, Martin Doerr wrote: > As shown in the JBS issue, the Undefined Behavior Sanitizer complains about `VMRegImpl::stack_0()->value()`. This can easily be avoided by skipping the more complicated way which includes addition and subtraction of `first()`. This pull request has now been integrated. 
Changeset: beebce04 Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/beebce044db97e50a7aea3f83d70e134b2128d0a Stats: 4 lines in 2 files changed: 1 ins; 0 del; 3 mod 8331421: ubsan: vmreg.cpp checking error member call on misaligned address Reviewed-by: mbaesken, lucy ------------- PR: https://git.openjdk.org/jdk/pull/19022 From chagedorn at openjdk.org Thu May 2 10:40:08 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 2 May 2024 10:40:08 GMT Subject: RFR: 8305638: Renaming and small clean-ups around predicates [v7] In-Reply-To: References: Message-ID: <-UU0jrN33Dxbp9EJ9u1FSJ2RDYC02JMK84gnzZLUhSg=.0e20361b-81a2-4ae9-a320-70f3cd9804c6@github.com> > **Update: April 22** > > After splitting off and integrating the following PRs from this PR: > https://github.com/openjdk/jdk/pull/18080 > https://github.com/openjdk/jdk/pull/18293 > https://github.com/openjdk/jdk/pull/18628 > https://github.com/openjdk/jdk/pull/18723 > > we are only left with a few renaming and clean-ups from this PR. Directly merging the master branch in was quite hard. I therefore reverted all commits to get back to a clean master and then applied all remaining code changes manually (required a force push). > >
>
> > _------------ Original PR description --------------_ > > This patch is intended for JDK 23. > > While preparing the patch for the full fix for Assertion Predicates [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981), I still noticed that some changes are not required for the actual fix and could be split off and reviewed separately in this PR. > > The patch applies the following cleanup changes: > - The complete fix had to add slightly different cloning cases in `PhaseIdealLoop::create_bool_from_template_assertion_predicate()` which already has quite some logic to switch between different cases. Additionally, the algorithm in the method itself was already hard to understand and difficult to adapt. I therefore re-implemented it in a separate class `CloneTemplateAssertionPredicateBool` together with some helper classes like `DFSNodeStack`. To use it, I've added a `TemplateAssertionPredicateBool` class that offers three cloning possibilities: > - `clone()`: Clone without modification > - `clone_and_replace_opaque_loop_nodes()`: Clone and replace the `OpaqueLoop*Nodes` with a new init and stride node. > - `clone_and_replace_init()`: Special case of `clone_and_replace_opaque_loop_nodes()` which only replaces `OpaqueLoopInitNode` and clones `OpaqueLoopStrideNode`. > > This refactoring could be extracted from the complete fix. > - The Split If code to detect (`subgraph_has_opaque()`) and clone Template Assertion Predicate Bools was extracted to a separate class `CloneTemplateAssertionPredicateBoolDown` and uses the new `TemplateAssertionPredicateBool` class to do the actual cloning. > - In the process of coding the complete fix, I've refactored the Loop Unswitching code quite a bit. This change could also be extracted into a separate RFE. Changes include: > - Renaming > - Extracting code to separate classes/methods > - Adding comments > - Some small refactoring including: > - Removing unused parameters > - Renaming variables/parameters/methods > > Th... 
Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge branch 'master' into JDK-8305638 - Merge branch 'refs/heads/master' into JDK-8305638 # Conflicts: # src/hotspot/share/opto/loopPredicate.cpp - Fix useful Template Assertion Predicate marking - Fix useful Parse Predicate marking - Remaining renaming and small clean-ups ------------- Changes: https://git.openjdk.org/jdk/pull/16877/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16877&range=06 Stats: 77 lines in 5 files changed: 17 ins; 7 del; 53 mod Patch: https://git.openjdk.org/jdk/pull/16877.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16877/head:pull/16877 PR: https://git.openjdk.org/jdk/pull/16877 From asotona at openjdk.org Thu May 2 11:08:16 2024 From: asotona at openjdk.org (Adam Sotona) Date: Thu, 2 May 2024 11:08:16 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v4] In-Reply-To: References: Message-ID: > Hi, > During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. > One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. > > I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. > > Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. > > Thank you, > Adam Adam Sotona has updated the pull request with a new target base due to a merge or a rebase. 
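The proposal to replace the eagerly initialized static fields of `java.lang.classfile.Attributes` with lazily initialized static methods can be illustrated with the initialization-on-demand holder idiom, one common way to get this effect in Java. This is a sketch only: `LazyMappers`, `Mapper`, `code()` and `signature()` are hypothetical stand-ins, and the PR itself may implement the laziness differently:

```java
// Sketch of lazily initialized constants exposed through static methods,
// using the initialization-on-demand holder idiom. Names are hypothetical.
public final class LazyMappers {
    static int created = 0; // number of mappers constructed so far (demo only)

    record Mapper(String name) {
        Mapper {
            created++; // runs once per constructed mapper
        }
    }

    // Each holder class is initialized by the JVM on first access to INSTANCE,
    // so a mapper is built only when its accessor is first called.
    private static final class CodeHolder {
        static final Mapper INSTANCE = new Mapper("Code");
    }

    private static final class SignatureHolder {
        static final Mapper INSTANCE = new Mapper("Signature");
    }

    public static Mapper code() { return CodeHolder.INSTANCE; }

    public static Mapper signature() { return SignatureHolder.INSTANCE; }

    public static void main(String[] args) {
        System.out.println("created before any access: " + created);
        code();
        code();
        System.out.println("created after two code() calls: " + created);
        signature();
        System.out.println("created after signature(): " + created);
    }
}
```

The JVM's class-initialization rules make each holder thread-safe without explicit locking, and mappers that are never requested are never built, which is the kind of startup cost the proposal aims to avoid.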
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Merge branch 'master' into JDK-8331291-attributes - changed order in allowed modules attributes check - added bug number - added impl comment - removed list of predefined attributes standard attributes mapping hard-coded and moved to BoundAttribute added AttributesTest::testAttributesMapping - move mappers implementations to AbstractAttributeMapper - 8331291: java.lang.classfile.Attributes class performs a lot of static initializations ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19006/files - new: https://git.openjdk.org/jdk/pull/19006/files/f0d9174e..fd8da774 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=02-03 Stats: 4061 lines in 236 files changed: 1910 ins; 657 del; 1494 mod Patch: https://git.openjdk.org/jdk/pull/19006.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19006/head:pull/19006 PR: https://git.openjdk.org/jdk/pull/19006 From thartmann at openjdk.org Thu May 2 11:41:00 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 2 May 2024 11:41:00 GMT Subject: RFR: 8329258: TailCall should not use frame pointer register for jump target [v4] In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 10:31:09 GMT, Tobias Hartmann wrote: >> Applying @danielogh's patch (see description of [JDK-8329258](https://bugs.openjdk.org/browse/JDK-8329258)) to enable `StressGCM` / `StressLCM` for stub compilations triggers a crash on AArch64. 
The problem is that the register allocator uses `R29` (`rfp`) which is usually used for the frame pointer to hold the `TailCall` `exc_target` when generating the `_slow_arraycopy_Java` stub: >> https://github.com/openjdk/jdk/blob/6dfb8120c270a76fcba5a5c3c9ad91da3282d5fa/src/hotspot/share/opto/generateOptoStub.cpp#L258-L264 >> >> With `StressGCM` / `StressLCM` the register initialization is scheduled early and `R29` is corrupted by the `MachEpilogNode` which is opaque to the register allocator and inserted right before the `TailCall`: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/share/opto/output.cpp#L320-L326 >> >> >> 028 mov R29, 0x0000ffff78cc0080 # ptr >> >> [...] >> >> 098 # pop frame 16 >> ldp lr, rfp, [sp,#0] <- Epilog kills rfp (and lr + sp) >> add sp, sp, #16 >> >> [...] >> >> 0a0 br R29 # R12 holds method >> >> >> As a result, we jump to a "garbage" location. See [bad code](https://bugs.openjdk.org/secure/attachment/108835/BAD_slow_arraycopy_Java.txt) vs. [good code](https://bugs.openjdk.org/secure/attachment/108834/GOOD_slow_arraycopy_Java.txt). >> >> On x86, we use `no_rbp_RegP` instead: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L2564-L2566 https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L12470 >> >> I implemented the same on AArch64. >> >> I think other platforms are affected as well but I don't have the hardware to test there. @offamitkumar (S390), @TheRealMDoerr (PPC), @RealFYang (RISC-V), @bulasevich (ARM32), could you please have a look? >> >> I also wondered if `R29` shouldn't be a callee-save (SOE) register in the C calling convention? 
>> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/aarch64/aarch64.ad#L139-L140 On x86_64, `RBP` is SOE: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L86-L87 >> >> `TestTailCallInArrayCopyStub.java` will only work once [JDK-8330016](https://bugs.openjdk.org/browse/JDK-8330016) is integrated. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Comment adjustment Thanks for the review, Andrew! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18716#issuecomment-2090281840 From thartmann at openjdk.org Thu May 2 11:41:02 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 2 May 2024 11:41:02 GMT Subject: Integrated: 8329258: TailCall should not use frame pointer register for jump target In-Reply-To: References: Message-ID: On Wed, 10 Apr 2024 12:34:11 GMT, Tobias Hartmann wrote: > Applying @danielogh's patch (see description of [JDK-8329258](https://bugs.openjdk.org/browse/JDK-8329258)) to enable `StressGCM` / `StressLCM` for stub compilations triggers a crash on AArch64. The problem is that the register allocator uses `R29` (`rfp`) which is usually used for the frame pointer to hold the `TailCall` `exc_target` when generating the `_slow_arraycopy_Java` stub: > https://github.com/openjdk/jdk/blob/6dfb8120c270a76fcba5a5c3c9ad91da3282d5fa/src/hotspot/share/opto/generateOptoStub.cpp#L258-L264 > > With `StressGCM` / `StressLCM` the register initialization is scheduled early and `R29` is corrupted by the `MachEpilogNode` which is opaque to the register allocator and inserted right before the `TailCall`: > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/share/opto/output.cpp#L320-L326 > > > 028 mov R29, 0x0000ffff78cc0080 # ptr > > [...] 
> > 098 # pop frame 16 > ldp lr, rfp, [sp,#0] <- Epilog kills rfp (and lr + sp) > add sp, sp, #16 > > [...] > > 0a0 br R29 # R12 holds method > > > As a result, we jump to a "garbage" location. See [bad code](https://bugs.openjdk.org/secure/attachment/108835/BAD_slow_arraycopy_Java.txt) vs. [good code](https://bugs.openjdk.org/secure/attachment/108834/GOOD_slow_arraycopy_Java.txt). > > On x86, we use `no_rbp_RegP` instead: > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L2564-L2566 https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L12470 > > I implemented the same on AArch64. > > I think other platforms are affected as well but I don't have the hardware to test there. @offamitkumar (S390), @TheRealMDoerr (PPC), @RealFYang (RISC-V), @bulasevich (ARM32), could you please have a look? > > I also wondered if `R29` shouldn't be a callee-save (SOE) register in the C calling convention? > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/aarch64/aarch64.ad#L139-L140 On x86_64, `RBP` is SOE: > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L86-L87 > > `TestTailCallInArrayCopyStub.java` will only work once [JDK-8330016](https://bugs.openjdk.org/browse/JDK-8330016) is integrated. > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: cccc9535 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/cccc95358d5c38cbcabc7f79abc53674deb1e6d8 Stats: 117 lines in 5 files changed: 113 ins; 0 del; 4 mod 8329258: TailCall should not use frame pointer register for jump target Co-authored-by: Fei Yang Reviewed-by: rcastanedalo, aph ------------- PR: https://git.openjdk.org/jdk/pull/18716 From thartmann at openjdk.org Thu May 2 12:28:52 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 2 May 2024 12:28:52 GMT Subject: RFR: 8331404: IGV: Show line numbers for callees in properties In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 15:38:20 GMT, Christian Hagedorn wrote: > IGV shows the `bci` for a node in the callee, followed by the bci in the caller method and so on until we reach the root method. For the `line` property, we currently only show the line number found in the root method (`first()` is the root method being compiled and `second()` and `third()` are inlined): > > Example program: > ![image](https://github.com/openjdk/jdk/assets/17833009/579fe9eb-4bd8-42d8-9d03-875f25bd97ae) > > Properties of the store to `fFld`: > ![image](https://github.com/openjdk/jdk/assets/17833009/3763cccf-c1ba-4d7f-a986-eae8bf0654b0) > > One could read the line number from the `jvms` property above. But you would need to expand that property with the button on the right side which opens a window. But then you cannot click anything else anymore in IGV until you close the window again. > > A simpler and easier to read solution is to add the line number information to match the bci numbers (they are printed in callee->root method order which I think is okay - especially if there are a lot of inlinees, it could be easier to have the really interesting numbers at the start on the left side). 
This would look something like this: > ![image](https://github.com/openjdk/jdk/assets/17833009/fcab3af6-69ac-43ae-89be-19fc4476d12f) > > If there is no line number information for a bci, I simply emit a `_`. > > Testing: > - Manual testing in IGV > - Sanity testing by running `java -Xcomp -XX:+PrintIdealGraph -XX:PrintIdealGraphLevel=4 -XX:PrintIdealGraphFile=graph.xml HelloWorld.java`. > > Thanks, > Christian Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19025#pullrequestreview-2035600862 From rcastanedalo at openjdk.org Thu May 2 12:37:53 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 2 May 2024 12:37:53 GMT Subject: RFR: 8331253: 16 bits is not enough for nmethod::_skipped_instructions_size field In-Reply-To: References: Message-ID: On Wed, 1 May 2024 03:31:41 GMT, Vladimir Kozlov wrote: > In [JDK-8329433](https://bugs.openjdk.org/browse/JDK-8329433) I changed `nmethod::_skipped_instructions_size` field type to `uint16_t` assuming that it only counts NOP instructions and GC barriers. I did not take into account that Generational ZGC also includes barrier stubs in this size (original ZGC missed that). It is correct to include them because these stubs are generated in the instructions section and not in the stubs section: > > > Statistics for 1330 bytecoded nmethods for C2: > ... > ZGC: > main code = 3237080 (75.567032%) > stubs code = 810577 (25.040375%) > skipped insts = 44432 (1.372595%) > > GenZGC: > main code = 4034704 (78.238518%) > stubs code = 1356703 (33.625839%) > skipped insts = 1074611 (26.634197%) > > > Note, GenZGC has bigger code because it has store barriers. It generates a separate stub for each barrier, no sharing. > > After looking at how `_skipped_instructions_size` is used (only in one place, when calculating the inlining size of compiled code) I decided to replace it with `int _inline_insts_size;`.
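The statistics above are totals over 1330 nmethods. They show how much barrier stubs inflate the skipped size under Generational ZGC (1074611 vs. 44432). Once a single nmethod's count passes 65535, a `uint16_t` field silently wraps modulo 2^16. The sketch below uses a hypothetical per-nmethod value of 70000 to show the truncation. The helper names are illustrative and are not HotSpot code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// True if a size can be stored in a 16-bit unsigned field without loss.
bool fits_in_uint16(long value) {
  return value >= 0 && value <= std::numeric_limits<uint16_t>::max();
}

// What a uint16_t field actually holds after the assignment: the value modulo 2^16.
uint16_t store_in_uint16(long value) {
  return static_cast<uint16_t>(value);
}
```

For example, `store_in_uint16(70000)` yields 4464 (70000 - 65536), so the inlining heuristic would see a size far smaller than the real one. An `int` field such as `_inline_insts_size` avoids this for any realistic nmethod size.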
It is calculated the same way as before. > > And instead of including instruction stubs in `_skipped_instructions_size`, I recorded the size of instructions in the code section before stubs are generated. This gives a more accurate size of the main instructions and removes the need for `InlineSkippedInstructionsCounter` in GC barrier stubs. > > I also fixed code in C2 which estimates size of code and stubs sections. > > Tested tier1-4,tier8,stress,xcomp Thanks for working on this, Vladimir! I tried out this changeset on a simple example ([example-and-instrumentation.zip](https://github.com/openjdk/jdk/files/15188249/example-and-instrumentation.zip)) using a JVM instrumented with the attached patch to observe the output of `ciMethod::inline_instructions_size()` and this seems to differ before and after the changeset: Before: caller: Test foo (LTest$MyObject;)Ljava/lang/Object; inline instructions size: 0 callee: Test bar (LTest$MyObject;)V inline instructions size: 219 after: caller: Test foo (LTest$MyObject;)Ljava/lang/Object; inline instructions size: 0 callee: Test bar (LTest$MyObject;)V inline instructions size: 183 Is this deviation expected? If so, I suggest splitting this changeset into a simple bug fix that only widens the type of `nmethod::_skipped_instructions_size` without affecting the inlining heuristic, and an RFE with the remaining changes. ------------- Changes requested by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19029#pullrequestreview-2035622983 From imyers at openjdk.org Thu May 2 12:50:16 2024 From: imyers at openjdk.org (Ian Myers) Date: Thu, 2 May 2024 12:50:16 GMT Subject: RFR: 8324756: Test vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize is too slow due to dependency verification [v2] In-Reply-To: References: Message-ID: > This change removes dependency verification by passing -XX:-VerifyDependencies in the test.
> > `vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java` takes 20min to run on linux-x86_64-server-fastdebug: > > time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java > CONF=linux-x86_64-server-fastdebug make test **1412.82s user 15.27s system 115% cpu 20:41.19 total** > > > Passing -XX:-VerifyDependencies flag speeds up the run time to 1min: > > time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java TEST_VM_OPTS="-XX:-VerifyDependencies" > CONF=linux-x86_64-server-fastdebug make test **287.27s user 16.19s system 496% cpu 1:01.10 total** > > > Adding -XX:-VerifyDependencies to the test file accomplishes the same run time of 1min: > > time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java > CONF=linux-x86_64-server-fastdebug make test **272.33s user 14.56s system 464% cpu 1:01.75 total** Ian Myers has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: [8324756] Remove dependency verification from vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19040/files - new: https://git.openjdk.org/jdk/pull/19040/files/b5944f4e..99314e02 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19040&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19040&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19040.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19040/head:pull/19040 PR: https://git.openjdk.org/jdk/pull/19040 From shade at openjdk.org Thu May 2 12:59:54 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 2 May 2024 12:59:54 GMT Subject: RFR: 8324756: Test vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize is too slow due to dependency verification [v2] In-Reply-To: References: Message-ID: <-ig7Zj830qvQ91e_kbIRRfOn_8Pm23qxFOxUdGsSSWk=.9a40c696-9c91-4729-916d-61965099e0ae@github.com> On Thu, 2 May 2024 12:50:16 GMT, Ian Myers wrote: >> This change removes dependency verification by passing -XX:-VerifyDependencies in the test. 
>> >> `vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java` takes 20min to run on linux-x86_64-server-fastdebug: >> >> time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java >> CONF=linux-x86_64-server-fastdebug make test **1412.82s user 15.27s system 115% cpu 20:41.19 total** >> >> >> Passing -XX:-VerifyDependencies flag speeds up the run time to 1min: >> >> time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java TEST_VM_OPTS="-XX:-VerifyDependencies" >> CONF=linux-x86_64-server-fastdebug make test **287.27s user 16.19s system 496% cpu 1:01.10 total** >> >> >> Adding -XX:-VerifyDependencies to the test file accomplishes the same run time of 1min: >> >> time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java >> CONF=linux-x86_64-server-fastdebug make test **272.33s user 14.56s system 464% cpu 1:01.75 total** > > Ian Myers has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > [8324756] Remove dependency verification from vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java I think you want to add the reversal of https://github.com/openjdk/jdk/commit/2564f0f99866c33d14947609c276a421ce8cc0a2 to this PR as well. I am not sure we want to run the test with disabled dependency verification, though. It is a compiler test, so we would like to have compiler checking code online as much as possible. Have you explored if this is an issue with Sweeper removal, and if so, if adding GCs help? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19040#issuecomment-2090438866 From asmehra at openjdk.org Thu May 2 13:14:52 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Thu, 2 May 2024 13:14:52 GMT Subject: RFR: 8330813: Don't call methods from Compressed(Oops|Klass) if the associated mode is inactive In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 11:04:07 GMT, Thomas Stuefe wrote: > We should not call methods from CompressedOops if we run with -XX:-UseCompressedOops, and the same goes for CompressedKlass and -XX:-UseCompressedClassPointers. (the latter we do assert in Lilliput). Marked as reviewed by asmehra (Committer). lgtm ------------- PR Review: https://git.openjdk.org/jdk/pull/18883#pullrequestreview-2035709003 PR Comment: https://git.openjdk.org/jdk/pull/18883#issuecomment-2090466879 From asmehra at openjdk.org Thu May 2 13:38:52 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Thu, 2 May 2024 13:38:52 GMT Subject: RFR: 8331344: No compiler replay file with CompilerCommand MemLimit In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 18:39:33 GMT, Thomas Stuefe wrote: > When using the compiler memory limit with the crash suboption (e.g. `-XX:CompileCommand=MemLimit,*.*,1g~crash`), the JVM asserts but may fail to produce a replay file. We also may see partly corrupted hs-err files. > > This happens if the memory limit hit was caused by growing ResourceAreas, not the C2 node arena. We also use ResourceArea when producing the replay file. > > If those RA usages cause another Arena chunk to be allocated, we re-enter `CompilationMemoryStatistic::on_arena_change` recursively, possibly multiple times. This will at least prevent replay file generation, but also may abort error handling altogether if a stack overflow happens. > > The patch prevents that recursion. It would be better to prevent replay file generation from using RA altogether, but this would be a larger patch and difficult to keep from bitrotting. 
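A reentrancy flag can break the recursion described above. The following is a hypothetical sketch, not the actual patch. `ArenaStatistic` and its members are invented names. The instance flag stands in for per-compilation (or per-thread) state, and `write_report()` simulates replay-file generation that allocates more arena memory.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: a flag that stops an arena-growth callback from
// re-entering itself while it is already handling a limit violation.
class ArenaStatistic {
 public:
  explicit ArenaStatistic(size_t limit) : _limit(limit) {}

  // Called whenever an arena grows. When the limit is hit, a report is
  // written, and writing the report may grow an arena again.
  void on_arena_change(size_t new_total) {
    if (_in_handler) {
      ++_suppressed_reentries;  // nested call: record it and return at once
      return;
    }
    _in_handler = true;
    if (new_total > _limit) {
      ++_reports;
      write_report();           // calls on_arena_change() recursively
    }
    _in_handler = false;
  }

  int reports() const { return _reports; }
  int suppressed_reentries() const { return _suppressed_reentries; }

 private:
  // Simulates report generation allocating another arena chunk.
  void write_report() { on_arena_change(_limit + 1024); }

  size_t _limit;
  bool _in_handler = false;
  int _reports = 0;
  int _suppressed_reentries = 0;
};
```

Without the `_in_handler` check, `write_report()` would recurse until the stack overflowed. This matches the failure mode the PR describes for hs-err and replay-file generation.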
Also provided regression test. > > Tested: > > - manually on Linux x64 and MacOS m1, with and without an artificially inflated resource area usage that reliably triggers the error. With the patch, the error is gone. > - GHAs lgtm ------------- Marked as reviewed by asmehra (Committer). PR Review: https://git.openjdk.org/jdk/pull/19005#pullrequestreview-2035769598 From mdoerr at openjdk.org Thu May 2 13:39:55 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 2 May 2024 13:39:55 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v2] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 07:57:18 GMT, Roberto Castañeda Lozano wrote: >> This changeset generalizes the logic to analyze, declare, and communicate which registers are live at a C2 barrier stub so that it can be used by other collectors than ZGC adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). >> >> The main changes are: >> >> - Make it possible to compute register liveness information before (live-in) or after (live-out) each barrier, and let the collector choose by implementing `BarrierSetC2State::needs_livein_data()`. >> >> - Generalize the interface with which collectors declare which registers must be additionally preserved across barrier runtime calls, adding the methods `BarrierStubC2::preserve(Register r)` and `BarrierStubC2::dont_preserve(Register r)`. >> >> - Simplify the interface with which platform-specific logic computes which registers to preserve across barrier runtime calls, replacing the calls to `BarrierStubC2::result()` and `BarrierStubC2::live()` with a single call to `BarrierStubC2::preserve_set()`. >> >> #### Testing >> >> - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode).
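One way to read the new `preserve()`/`dont_preserve()` interface is as set arithmetic over a barrier's live registers. The sketch below is an assumption about the semantics, not the HotSpot implementation. `BarrierStubModel` is a hypothetical class, and registers are plain bit indices. Per the PR, the `live` set would be live-in or live-out data, depending on what the collector requests via `BarrierSetC2State::needs_livein_data()`.

```cpp
#include <bitset>
#include <cassert>

using RegSet = std::bitset<32>;

// Hypothetical model of a barrier stub's register bookkeeping. The live set
// comes from liveness analysis; a collector can add registers with preserve()
// and exclude registers (e.g. the stub's result register) with dont_preserve().
class BarrierStubModel {
 public:
  explicit BarrierStubModel(RegSet live) : _live(live) {}

  void preserve(int reg)      { _preserve.set(reg); _dont_preserve.reset(reg); }
  void dont_preserve(int reg) { _dont_preserve.set(reg); _preserve.reset(reg); }

  // Registers that platform code saves before, and restores after, the runtime call.
  RegSet preserve_set() const { return (_live | _preserve) & ~_dont_preserve; }

 private:
  RegSet _live;
  RegSet _preserve;
  RegSet _dont_preserve;
};
```

Under this reading, platform code only needs the single `preserve_set()` result instead of combining `result()` and `live()` itself, which is the simplification the PR describes.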
>> - tier1-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/4d4e743d8f4cddd5288cee1d69c70fe2b9bea066) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. >> - Build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug). @RealFYang, @TheRealMDoerr: could you please test and review the riscv and ppc changes? Thanks! > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Use VMReg::is_concrete for testing sub-registers Can we change `_barrier_set_state` (https://github.com/openjdk/jdk/blob/a024eed7384828643e302f021a253717f53e3778/src/hotspot/share/opto/compile.hpp#L364) from `void*` to `BarrierSetC2State*` and remove the casts? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19026#issuecomment-2090523487 From stuefe at openjdk.org Thu May 2 13:43:58 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 2 May 2024 13:43:58 GMT Subject: RFR: 8331344: No compiler replay file with CompilerCommand MemLimit In-Reply-To: References: Message-ID: <5ajLmP6-ILt_OQ86cCztSfwQY6OFQoLHgP6vxpPIfLc=.acb454b6-0b9c-4470-b814-4ae8f0b43d3a@github.com> On Thu, 2 May 2024 13:36:11 GMT, Ashutosh Mehra wrote: >> When using the compiler memory limit with the crash suboption (e.g. `-XX:CompileCommand=MemLimit,*.*,1g~crash`), the JVM asserts but may fail to produce a replay file. We also may see partly corrupted hs-err files. >> >> This happens if the memory limit hit was caused by growing ResourceAreas, not the C2 node arena. We also use ResourceArea when producing the replay file. >> >> If those RA usages cause another Arena chunk to be allocated, we re-enter `CompilationMemoryStatistic::on_arena_change` recursively, possibly multiple times.
This will at least prevent replay file generation, but also may abort error handling altogether if a stack overflow happens. >> >> The patch prevents that recursion. It would be better to prevent replay file generation from using RA altogether, but this would be a larger patch and difficult to keep from bitrotting. >> >> Also provided regression test. >> >> Tested: >> >> - manually on Linux x64 and MacOS m1, with and without an artificially inflated resource area usage that reliably triggers the error. With the patch, the error is gone. >> - GHAs > > lgtm Thank you, @ashu-mehra and @vnkozlov ------------- PR Comment: https://git.openjdk.org/jdk/pull/19005#issuecomment-2090529484 From stuefe at openjdk.org Thu May 2 13:43:59 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 2 May 2024 13:43:59 GMT Subject: Integrated: 8331344: No compiler replay file with CompilerCommand MemLimit In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 18:39:33 GMT, Thomas Stuefe wrote: > When using the compiler memory limit with the crash suboption (e.g. `-XX:CompileCommand=MemLimit,*.*,1g~crash`), the JVM asserts but may fail to produce a replay file. We also may see partly corrupted hs-err files. > > This happens if the memory limit hit was caused by growing ResourceAreas, not the C2 node arena. We also use ResourceArea when producing the replay file. > > If those RA usages cause another Arena chunk to be allocated, we re-enter `CompilationMemoryStatistic::on_arena_change` recursively, possibly multiple times. This will at least prevent replay file generation, but also may abort error handling altogether if a stack overflow happens. > > The patch prevents that recursion. It would be better to prevent replay file generation from using RA altogether, but this would be a larger patch and difficult to keep from bitrotting. > > Also provided regression test. 
> > Tested: > > - manually on Linux x64 and MacOS m1, with and without an artificially inflated resource area usage that reliably triggers the error. With the patch, the error is gone. > - GHAs This pull request has now been integrated. Changeset: 389f6fe9 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/389f6fe97c348e28d8573fe4754138d2a0bd6c0d Stats: 29 lines in 3 files changed: 27 ins; 1 del; 1 mod 8331344: No compiler replay file with CompilerCommand MemLimit Reviewed-by: kvn, asmehra ------------- PR: https://git.openjdk.org/jdk/pull/19005 From stuefe at openjdk.org Thu May 2 13:50:58 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 2 May 2024 13:50:58 GMT Subject: RFR: 8330813: Don't call methods from Compressed(Oops|Klass) if the associated mode is inactive In-Reply-To: References: Message-ID: On Thu, 2 May 2024 13:11:35 GMT, Ashutosh Mehra wrote: >> We should not call methods from CompressedOops if we run with -XX:-UseCompressedOops, and the same goes for CompressedKlass and -XX:-UseCompressedClassPointers. (the latter we do assert in Lilliput). > > lgtm Thanks @ashu-mehra ------------- PR Comment: https://git.openjdk.org/jdk/pull/18883#issuecomment-2090547341 From stuefe at openjdk.org Thu May 2 13:50:59 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 2 May 2024 13:50:59 GMT Subject: Integrated: 8330813: Don't call methods from Compressed(Oops|Klass) if the associated mode is inactive In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 11:04:07 GMT, Thomas Stuefe wrote: > We should not call methods from CompressedOops if we run with -XX:-UseCompressedOops, and the same goes for CompressedKlass and -XX:-UseCompressedClassPointers. (the latter we do assert in Lilliput). This pull request has now been integrated. 
Changeset: dd0b6418 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/dd0b6418191c765a92bfd03ec4d4206e0da7ee45 Stats: 14 lines in 1 file changed: 10 ins; 0 del; 4 mod 8330813: Don't call methods from Compressed(Oops|Klass) if the associated mode is inactive Reviewed-by: stefank, asmehra ------------- PR: https://git.openjdk.org/jdk/pull/18883 From stuefe at openjdk.org Thu May 2 13:54:08 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 2 May 2024 13:54:08 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v8] In-Reply-To: References: Message-ID: > See [1] for previous discussions. > > We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. > > The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. > > Examples: > > This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` > > This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` > > > --- > > The patch: > > 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. > 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. > 3) Adapted and extended tests > > I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. 
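The precedence rules above can be summarized in a few lines. An explicit `MemLimit` compile command overrides the debug-build default, and a limit of 0 disables checking. This is a hypothetical sketch: `effective_mem_limit` is not a HotSpot function, and the limit action (crash vs. stop) is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Debug builds fall back to a 1 GB default when no command-line limit is given.
constexpr size_t kDebugDefaultLimit = size_t(1) << 30;

// A limit from -XX:CompileCommand=MemLimit,... always wins, including an
// explicit 0; release builds have no default limit.
size_t effective_mem_limit(std::optional<size_t> command_line_limit, bool debug_build) {
  if (command_line_limit.has_value()) {
    return *command_line_limit;
  }
  return debug_build ? kDebugDefaultLimit : 0;
}

// A limit of 0 means the check is disabled.
bool limit_enabled(size_t limit) { return limit != 0; }
```

For example, `-XX:CompileCommand=MemLimit,*.*,0` corresponds to `effective_mem_limit(0, true)`, which returns 0 and therefore disables the debug default for every method.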
> > > Tested: > > - manually on Mac m1 (debug and release) > - GHAs are running > - but Oracle will do more testing before this goes in > > [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Remove unused variable ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18969/files - new: https://git.openjdk.org/jdk/pull/18969/files/691a1467..e2aacaed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18969/head:pull/18969 PR: https://git.openjdk.org/jdk/pull/18969 From liach at openjdk.org Thu May 2 14:43:00 2024 From: liach at openjdk.org (Chen Liang) Date: Thu, 2 May 2024 14:43:00 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v4] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 11:08:16 GMT, Adam Sotona wrote: >> Hi, >> During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. >> One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. >> >> I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. >> >> Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. 
>> >> Thank you, >> Adam > > Adam Sotona has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into JDK-8331291-attributes > - changed order in allowed modules attributes check > - added bug number > - added impl comment > - removed list of predefined attributes > standard attributes mapping hard-coded and moved to BoundAttribute > added AttributesTest::testAttributesMapping > - move mappers implementations to AbstractAttributeMapper > - 8331291: java.lang.classfile.Attributes class performs a lot of static initializations On a side note, will we update JEP 466 to include this patch? ------------- Marked as reviewed by liach (Author). PR Review: https://git.openjdk.org/jdk/pull/19006#pullrequestreview-2035945054 From kvn at openjdk.org Thu May 2 14:44:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 2 May 2024 14:44:02 GMT Subject: RFR: 8331253: 16 bits is not enough for nmethod::_skipped_instructions_size field In-Reply-To: References: Message-ID: On Thu, 2 May 2024 12:35:45 GMT, Roberto Casta?eda Lozano wrote: >> In [JDK-8329433](https://bugs.openjdk.org/browse/JDK-8329433) I changed `nmethod::_skipped_instructions_size` field type to `uint16_t` assuming that it only count NOP instructions and GC barriers. I did not take into account that Generational ZGC also incudes barrier stubs into this size (original ZGC missed that). It is correct to include them because these stubs are generated in instructions section and not in stubs section: >> >> >> Statistics for 1330 bytecoded nmethods for C2: >> ... 
>> ZGC: >> main code = 3237080 (75.567032%) >> stubs code = 810577 (25.040375%) >> skipped insts = 44432 (1.372595%) >> >> GenZGC: >> main code = 4034704 (78.238518%) >> stubs code = 1356703 (33.625839%) >> skipped insts = 1074611 (26.634197%) >> >> >> Note, GenZGC has bigger code because it has store barriers. It generates a separate stub for each barrier, no sharing. >> >> After looking at how `_skipped_instructions_size` is used (only in one place, when calculating the inlining size of compiled code) I decided to replace it with `int _inline_insts_size;`. It is calculated the same way as before. >> >> And instead of including instruction stubs in `_skipped_instructions_size`, I recorded the size of instructions in the code section before stubs are generated. This gives a more accurate size of the main instructions and removes the need for `InlineSkippedInstructionsCounter` in GC barrier stubs. >> >> I also fixed code in C2 which estimates size of code and stubs sections. >> >> Tested tier1-4,tier8,stress,xcomp > > Thanks for working on this, Vladimir! I tried out this changeset on a simple example ([example-and-instrumentation.zip](https://github.com/openjdk/jdk/files/15188249/example-and-instrumentation.zip)) using a JVM instrumented with the attached patch to observe the output of `ciMethod::inline_instructions_size()` and this seems to differ before and after the changeset: > > Before: > > > caller: Test foo (LTest$MyObject;)Ljava/lang/Object; inline instructions size: 0 > callee: Test bar (LTest$MyObject;)V inline instructions size: 219 > > > after: > > > caller: Test foo (LTest$MyObject;)Ljava/lang/Object; inline instructions size: 0 > callee: Test bar (LTest$MyObject;)V inline instructions size: 183 > > Is this deviation expected? If so, I suggest splitting this changeset into a simple bug fix that only widens the type of `nmethod::_skipped_instructions_size` without affecting the inlining heuristic, and an RFE with the remaining changes.
@robcasloz, thank you for looking at the PR. Yes, 183 is the more accurate number. I don't think I need to split it: splitting is only needed for a backport, which is not the case here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19029#issuecomment-2090670068 From kvn at openjdk.org Thu May 2 14:44:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 2 May 2024 14:44:02 GMT Subject: RFR: 8331253: 16 bits is not enough for nmethod::_skipped_instructions_size field In-Reply-To: References: Message-ID: On Wed, 1 May 2024 03:31:41 GMT, Vladimir Kozlov wrote: > In [JDK-8329433](https://bugs.openjdk.org/browse/JDK-8329433) I changed `nmethod::_skipped_instructions_size` field type to `uint16_t` assuming that it only counts NOP instructions and GC barriers. I did not take into account that Generational ZGC also includes barrier stubs in this size (original ZGC missed that). It is correct to include them because these stubs are generated in the instructions section and not in the stubs section: > > > Statistics for 1330 bytecoded nmethods for C2: > ... > ZGC: > main code = 3237080 (75.567032%) > stubs code = 810577 (25.040375%) > skipped insts = 44432 (1.372595%) > > GenZGC: > main code = 4034704 (78.238518%) > stubs code = 1356703 (33.625839%) > skipped insts = 1074611 (26.634197%) > > > Note, GenZGC has bigger code because it has store barriers. It generates a separate stub for each barrier, no sharing. > > After looking at how `_skipped_instructions_size` is used (only in one place, when calculating the inlining size of compiled code) I decided to replace it with `int _inline_insts_size;`. It is calculated the same way as before. > > And instead of including instruction stubs in `_skipped_instructions_size`, I recorded the size of instructions in the code section before stubs are generated. This gives a more accurate size of the main instructions and removes the need for `InlineSkippedInstructionsCounter` in GC barrier stubs.
> > I also fixed code in C2 which estimates size of code and stubs sections. > > Tested tier1-4,tier8,stress,xcomp Thank you, Tobias, for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19029#issuecomment-2090671546 From kvn at openjdk.org Thu May 2 14:44:03 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 2 May 2024 14:44:03 GMT Subject: Integrated: 8331253: 16 bits is not enough for nmethod::_skipped_instructions_size field In-Reply-To: References: Message-ID: On Wed, 1 May 2024 03:31:41 GMT, Vladimir Kozlov wrote: > In [JDK-8329433](https://bugs.openjdk.org/browse/JDK-8329433) I changed `nmethod::_skipped_instructions_size` field type to `uint16_t` assuming that it only counts NOP instructions and GC barriers. I did not take into account that Generational ZGC also includes barrier stubs in this size (original ZGC missed that). It is correct to include them because these stubs are generated in the instructions section and not in the stubs section: > > > Statistics for 1330 bytecoded nmethods for C2: > ... > ZGC: > main code = 3237080 (75.567032%) > stubs code = 810577 (25.040375%) > skipped insts = 44432 (1.372595%) > > GenZGC: > main code = 4034704 (78.238518%) > stubs code = 1356703 (33.625839%) > skipped insts = 1074611 (26.634197%) > > > Note, GenZGC has bigger code because it has store barriers. It generates a separate stub for each barrier, no sharing. > > After looking at how `_skipped_instructions_size` is used (only in one place, when calculating the inlining size of compiled code) I decided to replace it with `int _inline_insts_size;`. It is calculated the same way as before. > > And instead of including instruction stubs in `_skipped_instructions_size`, I recorded the size of instructions in the code section before stubs are generated. This gives a more accurate size of the main instructions and removes the need for `InlineSkippedInstructionsCounter` in GC barrier stubs. > > I also fixed code in C2 which estimates size of code and stubs sections.
> > Tested tier1-4,tier8,stress,xcomp This pull request has now been integrated. Changeset: 3383ad63 Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/3383ad6397d5a2d8fb232ffd3e29a54e0b37b686 Stats: 46 lines in 9 files changed: 27 ins; 7 del; 12 mod 8331253: 16 bits is not enough for nmethod::_skipped_instructions_size field Reviewed-by: dlong, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/19029 From roland at openjdk.org Thu May 2 14:54:17 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 2 May 2024 14:54:17 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v18] In-Reply-To: References: Message-ID: > This change implements C2 optimizations for calls to > ScopedValue.get(). Indeed, in: > > > v1 = scopedValue.get(); > ... > v2 = scopedValue.get(); > > > `v2` can be replaced by `v1` and the second call to `get()` can be > optimized out. That's true whatever is between the 2 calls unless a > new mapping for `scopedValue` is created in between (when that happens > no optimizations is performed for the method being compiled). Hoisting > a `get()` call out of loop for a loop invariant `scopedValue` should > also be legal in most cases. > > `ScopedValue.get()` is implemented in java code as a 2 step process. A > cache is attached to the current thread object. If the `ScopedValue` > object is in the cache then the result from `get()` is read from > there. Otherwise a slow call is performed that also inserts the > mapping in the cache. The cache itself is lazily allocated. One > `ScopedValue` can be hashed to 2 different indexes in the cache. On a > cache probe, both indexes are checked. As a consequence, the process > of probing the cache is a multi step process (check if the cache is > present, check first index, check second index if first index > failed). 
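The following sketch shows the two-index cache probe described above, in which the first index is checked, then the second, and only then the slow path is taken. The slot layout, the way the two indices are derived from the hash, and the names `TwoWayCache`/`cached_get` are all assumptions made for illustration. The real cache is Java code in the JDK and is also allocated lazily, which this sketch omits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// One cache slot: the ScopedValue identity (modeled as a pointer) and its value.
struct Slot {
  const void* key = nullptr;
  int value = 0;
};

class TwoWayCache {
 public:
  explicit TwoWayCache(size_t size_pow2) : _slots(size_pow2), _mask(size_pow2 - 1) {}

  // Fast path: check the first index, then the second. A miss means the
  // caller must take the slow path.
  std::optional<int> probe(const void* key, uint32_t hash) const {
    const Slot& first = _slots[primary(hash)];
    if (first.key == key) return first.value;
    const Slot& second = _slots[secondary(hash)];
    if (second.key == key) return second.value;
    return std::nullopt;
  }

  void insert(const void* key, uint32_t hash, int value, bool use_secondary = false) {
    _slots[use_secondary ? secondary(hash) : primary(hash)] = Slot{key, value};
  }

 private:
  size_t primary(uint32_t hash) const { return hash & _mask; }
  size_t secondary(uint32_t hash) const { return (hash >> 16) & _mask; }

  std::vector<Slot> _slots;
  size_t _mask;
};

// get(): return the cached value if either probe hits; otherwise call the
// slow path and cache its result.
int cached_get(TwoWayCache& cache, const void* key, uint32_t hash,
               int (*slow_path)(), int* slow_calls) {
  if (std::optional<int> hit = cache.probe(key, hash)) return *hit;
  ++*slow_calls;
  int v = slow_path();
  cache.insert(key, hash, v);
  return v;
}
```

A second `cached_get` for the same key hits the cache without calling the slow path. As described in the PR, C2's `ScopedValueGetHitsInCache`/`ScopedValueGetLoadFromCache` node pair represents this probe in the IR.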
If the cache is populated early on, then when the method that > calls `ScopedValue.get()` is compiled, profile reports the slow path > as never taken and only the read from the cache is compiled. > > To perform the optimizations, I added 3 new node types to C2: > > - the pair > ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for > the cache probe > > - a cfg node ScopedValueGetResultNode to help locate the result of the > `get()` call in the IR graph. > > In pseudo code, once the nodes are inserted, the code of a `get()` is: > > > hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) > if (hits_in_the_cache) { > res = ScopedValueGetLoadFromCache(hits_in_the_cache); > } else { > res = ..; //slow call possibly inlined. Subgraph can be arbitrarily complex > } > res = ScopedValueGetResult(res) > > > In the snippet: > > > v1 = scopedValue.get(); > ... > v2 = scopedValue.get(); > > > Replacing `v2` by `v1` is then done by starting from the > `ScopedValueGetResult` node for the second `get()` and looking for a > dominating `ScopedValueGetResult` for the same `ScopedValue` > object. When one is found, it is used as a replacement. Eliminating > the second `get()` call is achieved by making > `ScopedValueGetHitsInCache` always successful if there's a dominating > `ScopedValueGetResult` and replacing its companion > `ScopedValueGetLoadFromCache` by the dominating > `ScopedValueGetResult`. > > Hoisting a `g...
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: whitespaces ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16966/files - new: https://git.openjdk.org/jdk/pull/16966/files/d38872fd..7723c9c7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16966&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16966&range=16-17 Stats: 25 lines in 1 file changed: 0 ins; 1 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/16966.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16966/head:pull/16966 PR: https://git.openjdk.org/jdk/pull/16966 From roland at openjdk.org Thu May 2 15:15:59 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 2 May 2024 15:15:59 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v13] In-Reply-To: <_x-OSownzQQZ8fmlsbvQ42MLf9BGZskECTNncOE0s4E=.8381a076-0cc4-4339-924f-fa22ca780573@github.com> References: <9Eoh8hOSSVvAtf9iVQ6hflQyceUtt4dpZdqm61zg5XI=.358a4d79-70d9-4b54-85d5-37c6817f0fae@github.com> <_x-OSownzQQZ8fmlsbvQ42MLf9BGZskECTNncOE0s4E=.8381a076-0cc4-4339-924f-fa22ca780573@github.com> Message-ID: On Thu, 2 May 2024 08:24:34 GMT, Dean Long wrote: >> Then, I think we should add an assert that `!type->as_instance_klass()->is_interface()` and also that it's not an array of interfaces (using `base_element_klass()`) > > An array of interfaces can be exact: > > new Interface[20].getClass(); > > and it seems like it would be safe to allow this, so I think we only need one assert for `!type->as_instance_klass()->is_interface()` if we don't trust the result of exact_type(). Right. Then I think it would be safer to add an assert for `!type->as_instance_klass()->is_interface()`.
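Dean's point that an array of interfaces can be exact can be checked from plain Java. The sketch below uses `Runnable` as a stand-in for the placeholder `Interface`. It relies only on standard reflection (`Class.isInterface`, `Class.getComponentType`), not on C1's ciType API, so it illustrates the Java-level fact rather than the compiler's internal notion of an exact type: the array's runtime class is always exactly `Runnable[]` and is not itself an interface, even though its element type is.

```java
/** Shows that an array whose element type is an interface has an exact, concrete runtime class. */
final class ExactInterfaceArray {
    static Class<?> runtimeClassOfInterfaceArray() {
        Runnable[] array = new Runnable[20]; // Runnable is an interface
        return array.getClass();             // always exactly Runnable[].class
    }
}
```

This is why an exact array-of-interface type is harmless here, while an exact type that is itself an interface would indicate something wrong, which is what the single `is_interface()` assert guards against.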
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1587817018 From rcastanedalo at openjdk.org Thu May 2 15:28:54 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 2 May 2024 15:28:54 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v2] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 13:37:14 GMT, Martin Doerr wrote: > Can we change `_barrier_set_state` ( > > https://github.com/openjdk/jdk/blob/a024eed7384828643e302f021a253717f53e3778/src/hotspot/share/opto/compile.hpp#L364 > > ) from `void*` to `BarrierSetC2State*` and remove the casts? Thanks for the suggestion. This would be a nice improvement; however, it would be fairly pervasive (I sketched it in https://github.com/openjdk/jdk/commit/cf5c1587e0ea90a8b3de4c70e0a2bf6ba4158f15), so I think it would be better to apply it as a separate RFE, perhaps after [non-generational ZGC is removed](https://openjdk.org/jeps/474) for simplicity. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19026#issuecomment-2090805965 From mdoerr at openjdk.org Thu May 2 15:49:53 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 2 May 2024 15:49:53 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v2] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 07:57:18 GMT, Roberto Castañeda Lozano wrote: >> This changeset generalizes the logic to analyze, declare, and communicate which registers are live at a C2 barrier stub so that it can be used by other collectors than ZGC adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). >> >> The main changes are: >> >> - Make it possible to compute register liveness information before (live-in) or after (live-out) each barrier, and let the collector choose by implementing `BarrierSetC2State::needs_livein_data()`.
>> >> - Generalize the interface with which collectors declare which registers must be additionally preserved across barrier runtime calls, adding the methods `BarrierStubC2::preserve(Register r)` and `BarrierStubC2::dont_preserve(Register r)`. >> >> - Simplify the interface with which platform-specific logic computes which registers to preserve across barrier runtime calls, replacing the calls to `BarrierStubC2::result()` and `BarrierStubC2::live()` with a single call to `BarrierStubC2::preserve_set()`. >> >> #### Testing >> >> - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). >> - tier1-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/4d4e743d8f4cddd5288cee1d69c70fe2b9bea066) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. >> - Build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug). @RealFYang, @TheRealMDoerr: could you please test and review the riscv and ppc changes? Thanks! > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Use VMReg::is_concrete for testing sub-registers I haven't thought about future usages of `BarrierSetC2State::needs_livein_data()`. I guess it's intended for G1. Otherwise, LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19026#pullrequestreview-2036126334 From sviswanathan at openjdk.org Thu May 2 16:18:54 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 2 May 2024 16:18:54 GMT Subject: RFR: 8326421: Add jtreg test for large arrayCopy disjoint case.
[v2] In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 07:11:44 GMT, Swati Sharma wrote: >> Hi All, >> >> Added a new jtreg test case for large arrayCopy disjoint case. >> This will test byte array copy operation for aligned and non aligned cases with array length greater than 2.5MB. >> >> Please review and provide your feedback. >> >> Thanks, >> Swati >> Intel > > Swati Sharma has updated the pull request incrementally with one additional commit since the last revision: > > 8326421: Resolved review comments. Looks good to me as well. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17962#pullrequestreview-2036222099 From sviswanathan at openjdk.org Thu May 2 17:03:56 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 2 May 2024 17:03:56 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v12] In-Reply-To: <1g7DGTS-7SUhuXFL8NniTGAQSgskv-CdrwtOGHymZqk=.f2ea7538-1ef4-4f94-af4d-972d64e7f699@github.com> References: <1g7DGTS-7SUhuXFL8NniTGAQSgskv-CdrwtOGHymZqk=.f2ea7538-1ef4-4f94-af4d-972d64e7f699@github.com> Message-ID: On Thu, 2 May 2024 00:05:20 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
> > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > simplification and fix asserts in ldmxcsr, stmxcsr, and emit_prefix_and_int8 Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18476#pullrequestreview-2036328881 From sviswanathan at openjdk.org Thu May 2 17:03:57 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 2 May 2024 17:03:57 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v9] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 00:02:28 GMT, Steve Dohrmann wrote: >>> It looks to me that the source and dest are reversed in the following instruction in call to simd_prefix_and_encode, perhaps that should be a separate PR: // Do we have this wrong src and dst reversed in simd_prefix_and_encode? void Assembler::pextrw(Register dst, XMMRegister src, int imm8) { assert(VM_Version::supports_sse2(), ""); InstructionAttr attributes(AVX_128bit, /* rex_w _/ false, /_ legacy_mode _/ _legacy_mode_bw, /_ no_mask_reg _/ true, /_ uses_vl */ false); int encode = simd_prefix_and_encode(as_XMMRegister(dst->encoding()), xnoreg, src, VEX_SIMD_66, VEX_OPCODE_0F, &attributes); emit_int24((unsigned char)0xC5, (0xC0 | encode), imm8); } Once that PR is fixed, is_src_gpr should be set to true for this one as well. >> >> Verified that the pextrw has the operands reversed per the SDM, so please ignore this comment. > > @sviswa7 Thank you for your review comments. Very helpful! @steveatgh Please also do a merge with master. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2091070895 From duke at openjdk.org Thu May 2 17:11:56 2024 From: duke at openjdk.org (Swati Sharma) Date: Thu, 2 May 2024 17:11:56 GMT Subject: RFR: 8326421: Add jtreg test for large arrayCopy disjoint case. 
[v2] In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 07:11:44 GMT, Swati Sharma wrote: >> Hi All, >> >> Added a new jtreg test case for large arrayCopy disjoint case. >> This will test byte array copy operation for aligned and non aligned cases with array length greater than 2.5MB. >> >> Please review and provide your feedback. >> >> Thanks, >> Swati >> Intel > > Swati Sharma has updated the pull request incrementally with one additional commit since the last revision: > > 8326421: Resolved review comments. add /contributor @steveatgh ------------- PR Comment: https://git.openjdk.org/jdk/pull/17962#issuecomment-2091089198 From never at openjdk.org Thu May 2 17:49:54 2024 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 2 May 2024 17:49:54 GMT Subject: RFR: 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found [v2] In-Reply-To: References: Message-ID: <76ydFdG47VvNGmaDZ-FhC_t5LGaCD-8Fjre-6l5f2YE=.289127d7-543b-4ddd-9b77-32f909610264@github.com> On Wed, 1 May 2024 20:57:14 GMT, Doug Simon wrote: >> This PR adds the missing nmethod entry barriers to JVMCI hand assembled tests. >> It also closes the escape hatch in jvmciCodeInstaller.cpp that allowed JVMCI code to be installed without nmethod entry barriers. > > Doug Simon has updated the pull request incrementally with two additional commits since the last revision: > > - remove vestiges of optional JVMCI nmethod support for entry barriers > - fixed failing tests and removed tests that install no longer valid code It would be super nice if we could figure out a clean way to share canned snippets of assembly from HotSpot back through JVMCI. There are lots of potential complexities though: register usage, the jcc erratum, relocations, fast/slow splits. The emit function could be called from the Graal assembler so that the sizing and alignment can be properly handled. 
HotSpot relocations could be translated in some fashion and maybe labels could be handled as well. The nmethod entry barrier fast path emission could probably be handled fairly cleanly since it's mostly a straightline snippet with a conditional branch at the end. It's just unclear if building that machinery is more complicated than maintaining and checking a clone of a small piece of assembly. The TestAssembler is a dubious piece of code given the complexity of emitting real nmethods. It doesn't even support the complex return sequence being used these days. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19035#issuecomment-2091158907 From duke at openjdk.org Thu May 2 18:32:02 2024 From: duke at openjdk.org (Swati Sharma) Date: Thu, 2 May 2024 18:32:02 GMT Subject: Integrated: 8326421: Add jtreg test for large arrayCopy disjoint case. In-Reply-To: References: Message-ID: On Thu, 22 Feb 2024 13:01:50 GMT, Swati Sharma wrote: > Hi All, > > Added a new jtreg test case for large arrayCopy disjoint case. > This will test byte array copy operation for aligned and non aligned cases with array length greater than 2.5MB. > > Please review and provide your feedback. > > Thanks, > Swati > Intel This pull request has now been integrated. Changeset: 73cdc9a0 Author: Swati Sharma Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/73cdc9a070249791f7d228a93fe5b9335c5f72bd Stats: 87 lines in 1 file changed: 87 ins; 0 del; 0 mod 8326421: Add jtreg test for large arrayCopy disjoint case. 
Co-authored-by: Steve Dohrmann Reviewed-by: kvn, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/17962 From roland at openjdk.org Thu May 2 18:47:00 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 2 May 2024 18:47:00 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v9] In-Reply-To: References: <5H6XV7Agl6ZNfGWT-bCbIPsimFTYM0pyIGiAHDQUUyA=.168e21cc-6cd8-42d8-ab59-d5e02e241ea2@github.com> <0RKnLUgc6UBtyxSyezCMWsSbP50hu6fQ6UJPHpGlgSU=.9fafa10f-62ee-4ec8-9093-4e204fcbe504@github.com> <5QbsVmYi0tYGlOvDL4LjJb1SjChIZtaWSMthFM9grMI=.0900e1c3-90b3-4726-a7c6-c2aff49d07ce@github.com> Message-ID: On Mon, 29 Apr 2024 07:12:55 GMT, Emanuel Peter wrote: >> @eme64 can you go over my replies above and let me know if they sound good to you? Thanks. > > I'm waiting for @rwestrel to respond to my last list of comments/questions. @eme64 change is ready for another review ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2091260926 From asmehra at openjdk.org Thu May 2 19:52:53 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Thu, 2 May 2024 19:52:53 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v8] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 13:54:08 GMT, Thomas Stuefe wrote: >> See [1] for previous discussions. >> >> We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. >> >> The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. >> >> Examples: >> >> This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. 
Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` >> >> This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` >> >> >> --- >> >> The patch: >> >> 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. >> 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. >> 3) Adapted and extended tests >> >> I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. >> >> >> Tested: >> >> - manually on Mac m1 (debug and release) >> - GHAs are running >> - but Oracle will do more testing before this goes in >> >> [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Remove unused variable Marked as reviewed by asmehra (Committer). Just one suggestion which you may pick or ignore. Otherwise looks good. test/hotspot/jtreg/compiler/print/CompileCommandMemLimit.java line 143: > 141: // total NA RA result #nodes limit time type #rc thread method > 142: // 32728 0 32728 ok - 1024M 0.045 c1 1 0x000000011b019c10 compiler/print/CompileCommandMemLimit$TestMain::method1(()J) > 143: oa.shouldMatch("\\d+ +\\d+ +\\d+ +ok +" + numberNodesRegex + " +" + implicitMemoryLimit + " +.* +" + method1regex); A minor suggestion regarding the regex. I find "\s+" more readable than " +" to match multiple spaces. 
------------- PR Review: https://git.openjdk.org/jdk/pull/18969#pullrequestreview-2036760008 PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2091437900 PR Review Comment: https://git.openjdk.org/jdk/pull/18969#discussion_r1588272234 From duke at openjdk.org Thu May 2 20:31:17 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Thu, 2 May 2024 20:31:17 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v13] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. Steve Dohrmann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 20 additional commits since the last revision: - update for egpr use: bzhil(R,R,R), btq(R,R), btq(R,imm) - Merge branch 'master' into apx-encoding-pr - Update full name - simplification and fix asserts in ldmxcsr, stmxcsr, and emit_prefix_and_int8 - remove is_map1 comment for addb, andb, movb, orb, testb, xchgb, xorb - fix stmxcrs REX2 branch, add asserts to SHA instructions - fixes: pp bits in crc32, REX2 branch in ldmxcsr - add egpr support for popcntq(R,A), cvttsd2siq(R,A), popq(R) - fix 4 more src_is_gpr = true cases, add asserts to check for UseAPX - fix is_gpr arg on two functions with reversed src / dst operands - ... and 10 more: https://git.openjdk.org/jdk/compare/27262415...7b3e8ec7 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/46eb6b42..7b3e8ec7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=11-12 Stats: 117386 lines in 3057 files changed: 52969 ins; 48551 del; 15866 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From dnsimon at openjdk.org Thu May 2 21:35:08 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 2 May 2024 21:35:08 GMT Subject: RFR: 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found [v3] In-Reply-To: References: Message-ID: > This PR adds the missing nmethod entry barriers to JVMCI hand assembled tests. > It also closes the escape hatch in jvmciCodeInstaller.cpp that allowed JVMCI code to be installed without nmethod entry barriers. 
Doug Simon has updated the pull request incrementally with two additional commits since the last revision: - fix NativeCallTest on x64 - remove more vestiges of optional JVMCI nmethod support for entry barriers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19035/files - new: https://git.openjdk.org/jdk/pull/19035/files/be4bf630..1b30b67e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19035&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19035&range=01-02 Stats: 8 lines in 2 files changed: 0 ins; 2 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19035.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19035/head:pull/19035 PR: https://git.openjdk.org/jdk/pull/19035 From sviswanathan at openjdk.org Thu May 2 21:48:00 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 2 May 2024 21:48:00 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v13] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 20:31:17 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 20 additional commits since the last revision: > > - update for egpr use: bzhil(R,R,R), btq(R,R), btq(R,imm) > - Merge branch 'master' into apx-encoding-pr > - Update full name > - simplification and fix asserts in ldmxcsr, stmxcsr, and emit_prefix_and_int8 > - remove is_map1 comment for addb, andb, movb, orb, testb, xchgb, xorb > - fix stmxcrs REX2 branch, add asserts to SHA instructions > - fixes: pp bits in crc32, REX2 branch in ldmxcsr > - add egpr support for popcntq(R,A), cvttsd2siq(R,A), popq(R) > - fix 4 more src_is_gpr = true cases, add asserts to check for UseAPX > - fix is_gpr arg on two functions with reversed src / dst operands > - ... and 10 more: https://git.openjdk.org/jdk/compare/335b7c9e...7b3e8ec7 The recent changes post merge with master look good. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18476#pullrequestreview-2037002903 From duke at openjdk.org Thu May 2 23:33:59 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Thu, 2 May 2024 23:33:59 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v13] In-Reply-To: References: Message-ID: On Mon, 15 Apr 2024 17:43:45 GMT, Vladimir Kozlov wrote: >> Steve Dohrmann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 20 additional commits since the last revision: >> >> - update for egpr use: bzhil(R,R,R), btq(R,R), btq(R,imm) >> - Merge branch 'master' into apx-encoding-pr >> - Update full name >> - simplification and fix asserts in ldmxcsr, stmxcsr, and emit_prefix_and_int8 >> - remove is_map1 comment for addb, andb, movb, orb, testb, xchgb, xorb >> - fix stmxcrs REX2 branch, add asserts to SHA instructions >> - fixes: pp bits in crc32, REX2 branch in ldmxcsr >> - add egpr support for popcntq(R,A), cvttsd2siq(R,A), popq(R) >> - fix 4 more src_is_gpr = true cases, add asserts to check for UseAPX >> - fix is_gpr arg on two functions with reversed src / dst operands >> - ... and 10 more: https://git.openjdk.org/jdk/compare/9abb31e9...7b3e8ec7 > > I have few comments. @vnkozlov Are there other things you would like to see for this pull request? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2091905594 From stuefe at openjdk.org Fri May 3 05:33:17 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 3 May 2024 05:33:17 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v9] In-Reply-To: References: Message-ID: > See [1] for previous discussions. > > We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. > > The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. > > Examples: > > This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. 
Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` > > This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` > > > --- > > The patch: > > 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. > 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. > 3) Adapted and extended tests > > I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. > > > Tested: > > - manually on Mac m1 (debug and release) > - GHAs are running > - but Oracle will do more testing before this goes in > > [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: - merge master and fix conflicts - Remove unused variable - Remove accidental change to TestDeadPhiMergeMemLoop.java - fix copyrights - fix copyrights - another fix - fix accidental slip in of another test name - fix jdk note number in test comment - Disable memory limit for compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java until JDK-8331295 is fixed - Merge branch 'master' into compiler-default-limit - ... 
and 6 more: https://git.openjdk.org/jdk/compare/6bef0474...f6396010 ------------- Changes: https://git.openjdk.org/jdk/pull/18969/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=08 Stats: 165 lines in 7 files changed: 114 ins; 12 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/18969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18969/head:pull/18969 PR: https://git.openjdk.org/jdk/pull/18969 From stuefe at openjdk.org Fri May 3 05:33:17 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 3 May 2024 05:33:17 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v8] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 19:50:17 GMT, Ashutosh Mehra wrote: > Just one suggestion which you may pick or ignore. Otherwise looks good. Many thanks, @ashu-mehra ! I will actually ignore your suggestion, because I want the expression to only match spaces precisely, not whitespaces. But for any-whitespace, I usually do as you suggest. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2092322060 From chagedorn at openjdk.org Fri May 3 05:53:01 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 3 May 2024 05:53:01 GMT Subject: RFR: 8331404: IGV: Show line numbers for callees in properties In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 15:38:20 GMT, Christian Hagedorn wrote: > IGV shows the `bci` for a node in the callee, followed by the bci in the caller method and so on until we reach the root method. 
For the `line` property, we currently only show the line number found in the root method (`first()` is the root method being compiled and `second()` and `third()` are inlined): > > Example program: > ![image](https://github.com/openjdk/jdk/assets/17833009/579fe9eb-4bd8-42d8-9d03-875f25bd97ae) > > Properties of the store to `fFld`: > ![image](https://github.com/openjdk/jdk/assets/17833009/3763cccf-c1ba-4d7f-a986-eae8bf0654b0) > > One could read the line number from the `jvms` property above. But you would need to expand that property with the button on the right side which opens a window. But then you cannot click anything else anymore in IGV until you close the window again. > > A simpler and easier to read solution is to add the line number information to match the bci numbers (they are printed in callee->root method order which I think is okay - especially if there are a lot of inlinees, it could be easier to have the really interesting numbers at the start on the left side). This would look something like that: > ![image](https://github.com/openjdk/jdk/assets/17833009/fcab3af6-69ac-43ae-89be-19fc4476d12f) > > If there is no line number information for a bci, I simply emit a `_`. > > Testing: > - Manual testing in IGV > - Sanity testing by running `java -Xcomp -XX:+PrintIdealGraph -XX:PrintIdealGraphLevel=4 -XX:PrintIdealGraphFile=graph.xml HelloWorld.java`. > > Thanks, > Christian Thanks Tobias for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19025#issuecomment-2092348356 From chagedorn at openjdk.org Fri May 3 05:53:01 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 3 May 2024 05:53:01 GMT Subject: Integrated: 8331404: IGV: Show line numbers for callees in properties In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 15:38:20 GMT, Christian Hagedorn wrote: > IGV shows the `bci` for a node in the callee, followed by the bci in the caller method and so on until we reach the root method. 
For the `line` property, we currently only show the line number found in the root method (`first()` is the root method being compiled and `second()` and `third()` are inlined): > > Example program: > ![image](https://github.com/openjdk/jdk/assets/17833009/579fe9eb-4bd8-42d8-9d03-875f25bd97ae) > > Properties of the store to `fFld`: > ![image](https://github.com/openjdk/jdk/assets/17833009/3763cccf-c1ba-4d7f-a986-eae8bf0654b0) > > One could read the line number from the `jvms` property above. But you would need to expand that property with the button on the right side which opens a window. But then you cannot click anything else anymore in IGV until you close the window again. > > A simpler and easier to read solution is to add the line number information to match the bci numbers (they are printed in callee->root method order which I think is okay - especially if there are a lot of inlinees, it could be easier to have the really interesting numbers at the start on the left side). This would look something like that: > ![image](https://github.com/openjdk/jdk/assets/17833009/fcab3af6-69ac-43ae-89be-19fc4476d12f) > > If there is no line number information for a bci, I simply emit a `_`. > > Testing: > - Manual testing in IGV > - Sanity testing by running `java -Xcomp -XX:+PrintIdealGraph -XX:PrintIdealGraphLevel=4 -XX:PrintIdealGraphFile=graph.xml HelloWorld.java`. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 8bc641eb Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/8bc641ebe75ba4c975a99a8646b89ed10a7029f5 Stats: 51 lines in 2 files changed: 31 ins; 16 del; 4 mod 8331404: IGV: Show line numbers for callees in properties Reviewed-by: tholenstein, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/19025 From aboldtch at openjdk.org Fri May 3 06:42:57 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 3 May 2024 06:42:57 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v2] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 07:57:18 GMT, Roberto Castañeda Lozano wrote: >> This changeset generalizes the logic to analyze, declare, and communicate which registers are live at a C2 barrier stub so that it can be used by other collectors than ZGC adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). >> >> The main changes are: >> >> - Make it possible to compute register liveness information before (live-in) or after (live-out) each barrier, and let the collector choose by implementing `BarrierSetC2State::needs_livein_data()`. >> >> - Generalize the interface with which collectors declare which registers must be additionally preserved across barrier runtime calls, adding the methods `BarrierStubC2::preserve(Register r)` and `BarrierStubC2::dont_preserve(Register r)`. >> >> - Simplify the interface with which platform-specific logic computes which registers to preserve across barrier runtime calls, replacing the calls to `BarrierStubC2::result()` and `BarrierStubC2::live()` with a single call to `BarrierStubC2::preserve_set()`. >> >> #### Testing >> >> - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode).
>> - tier1-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/4d4e743d8f4cddd5288cee1d69c70fe2b9bea066) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. >> - Build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug). @RealFYang, @TheRealMDoerr: could you please test and review the riscv and ppc changes? Thanks! > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Use VMReg::is_concrete for testing sub-registers lgtm. A few nits. src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 98: > 96: _entry(), > 97: _continuation(), > 98: _preserve(live()){} Suggestion: _preserve(live()) {} src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 879: > 877: if (!bs_state->needs_livein_data()) { > 878: RegMask* const regs = bs_state->live(node); > 879: if (regs != NULL) { Suggestion: if (regs != nullptr) { src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 910: > 908: if (bs_state->needs_livein_data()) { > 909: RegMask* const regs = bs_state->live(node); > 910: if (regs != NULL) { Suggestion: if (regs != nullptr) { ------------- Marked as reviewed by aboldtch (Reviewer).
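The liveness and preserve-set interface described in this PR can be modeled with plain bitmasks. The sketch below is a hypothetical simplification, not HotSpot's `RegMask` or `BarrierStubC2`: bit *i* stands for register *i*, `live_in` is the standard backward-dataflow relation across one instruction, and the preserve set starts from the chosen liveness data (mirroring `_preserve(live())` in the constructor quoted in the review) before `preserve()` and `dont_preserve()` adjust it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register set: bit i set means register i is in the set.
using RegBits = std::uint32_t;

constexpr RegBits reg(int r) { return RegBits(1) << r; }

// Backward dataflow across one instruction (here, a barrier):
// live-in = uses | (live-out minus defs).
constexpr RegBits live_in(RegBits live_out, RegBits uses, RegBits defs) {
  return uses | (live_out & ~defs);
}

// Simplified model of a barrier stub's preserve set.
class PreserveSetModel {
 public:
  // Start from whichever liveness data the collector chose (live-in or live-out).
  explicit PreserveSetModel(RegBits live) : _preserve(live) {}
  void preserve(int r)      { _preserve |= reg(r); }   // additionally keep r across the runtime call
  void dont_preserve(int r) { _preserve &= ~reg(r); }  // r need not survive the runtime call
  RegBits preserve_set() const { return _preserve; }

 private:
  RegBits _preserve;
};
```

In this model, a register that the barrier only reads and that is dead afterwards appears in live-in but not in live-out, so a preserve set seeded from live-in can be larger than one seeded from live-out.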
PR Review: https://git.openjdk.org/jdk/pull/19026#pullrequestreview-2037480805 PR Review Comment: https://git.openjdk.org/jdk/pull/19026#discussion_r1588795914 PR Review Comment: https://git.openjdk.org/jdk/pull/19026#discussion_r1588796061 PR Review Comment: https://git.openjdk.org/jdk/pull/19026#discussion_r1588796181 From rcastanedalo at openjdk.org Fri May 3 06:42:57 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 3 May 2024 06:42:57 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v2] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 07:57:18 GMT, Roberto Castañeda Lozano wrote: >> This changeset generalizes the logic to analyze, declare, and communicate which registers are live at a C2 barrier stub so that it can be used by collectors other than ZGC that adopt the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). >> >> The main changes are: >> >> - Make it possible to compute register liveness information before (live-in) or after (live-out) each barrier, and let the collector choose by implementing `BarrierSetC2State::needs_livein_data()`. >> >> - Generalize the interface with which collectors declare which registers must be additionally preserved across barrier runtime calls, adding the methods `BarrierStubC2::preserve(Register r)` and `BarrierStubC2::dont_preserve(Register r)`. >> >> - Simplify the interface with which platform-specific logic computes which registers to preserve across barrier runtime calls, replacing the calls to `BarrierStubC2::result()` and `BarrierStubC2::live()` with a single call to `BarrierStubC2::preserve_set()`. >> >> #### Testing >> >> - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode).
>> - tier1-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/4d4e743d8f4cddd5288cee1d69c70fe2b9bea066) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. >> - Build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug). @RealFYang, @TheRealMDoerr: could you please test and review the riscv and ppc changes? Thanks! > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Use VMReg::is_concrete for testing sub-registers Thanks for reviewing, Martin! I reported your suggested refactoring here: https://bugs.openjdk.org/browse/JDK-8331623. > I haven't thought about future usages of `BarrierSetC2State::needs_livein_data()`. I guess it's intended for G1. That's correct, it is primarily intended for G1. But ZGC could also benefit, in the future, from using live-out instead of live-in data in the spilling logic. The current solution is slightly over-conservative in that it might spill some registers unnecessarily. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19026#issuecomment-2092395247 From rcastanedalo at openjdk.org Fri May 3 06:47:10 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 3 May 2024 06:47:10 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v3] In-Reply-To: References: Message-ID: <74Np8LGxo8PiyoLAUI7tUlAq7ySVgmGzblZio5Tlhx8=.c0fdf457-bc2f-4f88-a070-325521b469f9@github.com> > This changeset generalizes the logic to analyze, declare, and communicate which registers are live at a C2 barrier stub so that it can be used by collectors other than ZGC that adopt the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)).
> > The main changes are: > > - Make it possible to compute register liveness information before (live-in) or after (live-out) each barrier, and let the collector choose by implementing `BarrierSetC2State::needs_livein_data()`. > > - Generalize the interface with which collectors declare which registers must be additionally preserved across barrier runtime calls, adding the methods `BarrierStubC2::preserve(Register r)` and `BarrierStubC2::dont_preserve(Register r)`. > > - Simplify the interface with which platform-specific logic computes which registers to preserve across barrier runtime calls, replacing the calls to `BarrierStubC2::result()` and `BarrierStubC2::live()` with a single call to `BarrierStubC2::preserve_set()`. > > #### Testing > > - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). > - tier1-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/4d4e743d8f4cddd5288cee1d69c70fe2b9bea066) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. > - Build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug). @RealFYang, @TheRealMDoerr: could you please test and review the riscv and ppc changes? Thanks! 
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Apply code style suggestions from Axel Co-authored-by: Axel Boldt-Christmas ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19026/files - new: https://git.openjdk.org/jdk/pull/19026/files/c0fc66de..254c8849 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19026&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19026&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19026.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19026/head:pull/19026 PR: https://git.openjdk.org/jdk/pull/19026 From rcastanedalo at openjdk.org Fri May 3 06:47:10 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 3 May 2024 06:47:10 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v2] In-Reply-To: References: Message-ID: On Fri, 3 May 2024 06:40:00 GMT, Axel Boldt-Christmas wrote: > lgtm. > > A few nits. Thanks for reviewing and for the style suggestions, Axel! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19026#issuecomment-2092401560 From dnsimon at openjdk.org Fri May 3 08:27:55 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 3 May 2024 08:27:55 GMT Subject: RFR: 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found [v2] In-Reply-To: <76ydFdG47VvNGmaDZ-FhC_t5LGaCD-8Fjre-6l5f2YE=.289127d7-543b-4ddd-9b77-32f909610264@github.com> References: <76ydFdG47VvNGmaDZ-FhC_t5LGaCD-8Fjre-6l5f2YE=.289127d7-543b-4ddd-9b77-32f909610264@github.com> Message-ID: On Thu, 2 May 2024 17:46:54 GMT, Tom Rodriguez wrote: > The TestAssembler is a dubious piece of code given the complexity of emitting real nmethods. It doesn't even support the complex return sequence being used these days.
The next time there's a TestAssembler failure due to a change in nmethod invariants, I will remove it completely. There is now enough coverage in Graal tests that it no longer offers sufficient value. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19035#issuecomment-2092543435 From aph at openjdk.org Fri May 3 08:52:57 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 3 May 2024 08:52:57 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers In-Reply-To: References: Message-ID: On Tue, 9 Apr 2024 19:44:26 GMT, Dean Long wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > How can we be confident that the encoding is correct? Would it be possible to write tests for this? Maybe one that disassembles it and compares the result to a 3rd party disassembler offline or in-process hsdis? > Thank you @dean-long for the comment. I agree, tests are needed. Up to this point we have not had a separate formal tool to test encoding of x86. I did a lot of manual testing by adding loops that used r0-r31 in different addressing patterns. I put these in a stub file that would be compiled by hotspot but not executed.
I manually compared the disassembly of that against the output of similar assembly included in a small C program and run on the SDE. This worked pretty well for debugging, but the manual aspect makes it error-prone and it takes a lot of time, too much time when iterating on an implementation. > > Subsequent pull requests will add encoding support for additional APX instructions (e.g. those using New Data Destination). Maybe one of these PRs can include a tool for testing instruction encoding for APX features. What do you think? When we wrote the AArch64 port, there was no available hardware to test it on. So, we wrote a simulator to test it. However, we ran the risk that if our understanding of instruction encoding was wrong, our assembler and our simulator might appear to work correctly when used together, but the result would not run on real AArch64 hardware once it arrived. So, as well as a simulator for the architecture, we verified the internal HotSpot assembler by checking its encoding against GNU `as`. See /test/hotspot/gtest/aarch64, where a Python program generates source for both the HotSpot internal assembler and GNU `as`. I strongly suggest you do something similar. (As a matter of historical record, this did work. The test found several encoding bugs. Once we got the first real AArch64 hardware, the port worked almost immediately.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2092579988 From bkilambi at openjdk.org Fri May 3 09:14:05 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 3 May 2024 09:14:05 GMT Subject: RFR: 8331400: AArch64: Sync aarch64_vector.ad with aarch64_vector_ad.m4 Message-ID: This commit - [1] modified the aarch64_vector.ad directly. This patch includes that change in the aarch64_vector_ad.m4 file as well and generates the aarch64_vector.ad file from it.
[1] https://github.com/openjdk/jdk/commit/185e711bfe4c4d013b56e867f85cfb4177b3a2cf ------------- Commit messages: - 8331400: AArch64: Sync aarch64_vector.ad with aarch64_vector_ad.m4 Changes: https://git.openjdk.org/jdk/pull/19077/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19077&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331400 Stats: 10 lines in 2 files changed: 6 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19077.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19077/head:pull/19077 PR: https://git.openjdk.org/jdk/pull/19077 From aph at openjdk.org Fri May 3 09:38:52 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 3 May 2024 09:38:52 GMT Subject: RFR: 8331400: AArch64: Sync aarch64_vector.ad with aarch64_vector_ad.m4 In-Reply-To: References: Message-ID: <-eH1_cLhL2ADd9kuizMnuMev2nq4lVxdSl7wjVWr030=.b83eb321-4b41-4e8f-818b-b3c57a99ecb4@github.com> On Fri, 3 May 2024 09:07:25 GMT, Bhavana Kilambi wrote: > This commit - [1] modified the aarch64_vector.ad directly. This patch includes that change in the aarch64_vector_ad.m4 file as well and generates the aarch64_vector.ad file from it. > > [1] https://github.com/openjdk/jdk/commit/185e711bfe4c4d013b56e867f85cfb4177b3a2cf OK. Obvious/trivial. Daaamn, that shouldn't have happened. It did happen, though, because the patch wasn't AArch64-specific, so none of the AArch64 maintainers noticed it. I'm a bit reluctant to splatter aarch64_vector.ad with ` // DO NOT EDIT ANYTHING IN THIS SECTION OF THE FILE` ------------- Marked as reviewed by aph (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19077#pullrequestreview-2037763354 From roland at openjdk.org Fri May 3 10:00:17 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 3 May 2024 10:00:17 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v3] In-Reply-To: References: Message-ID: > Range check `CastII` nodes are removed once loop opts are over.
The > test case for this change includes 3 cases where elimination of a > range check `CastII` causes a crash in compiled code because either an > out-of-bounds array load or a division by zero happens. > > In `test1`: > > - the range checks for the `array[otherArray.length]` loads constant > fold: `otherArray.length` is a `CastII` of i at the `otherArray` > allocation. `i` is less than 9. The `CastII` at the allocation > narrows the type down further to `[0-9]`. > > - the `array[otherArray.length]` loads are control dependent on the > unrelated: > > > if (flag == 0) { > > > test. There's an identical dominating test which replaces that one. As > a consequence, the `array[otherArray.length]` loads become control > dependent on the dominating test. > > - The `CastII` nodes at the `otherArray` allocations are replaced by > dominating range check `CastII` nodes for: > > > newArray[i] = 42; > > > - After loop opts, the range check `CastII` nodes are removed and the > 2 `array[otherArray.length]` loads common at the first: > > > if (flag == 0) { > > > test before the: > > > float[] otherArray = new float[i]; > > > and > > > newArray[i] = 42; > > > that guarantee `i` is non-negative. > > - When `test1` is called with `i = -1`, the array load proceeds with an > out-of-bounds index and the crash occurs. > > > `test2` and `test3` are mostly identical except for the check that's > eliminated (a zero-divisor check) and the instruction that causes a > fault (an integer division). > > The fix I propose is to not eliminate range check `CastII` nodes after > loop opts. When range check `CastII` nodes were introduced, performance > was observed to regress. Removing them after loop opts was found to > preserve both correctness and performance. Today, the performance > regression still exists when `CastII` nodes are left in.
So I propose > we keep them until the end of optimizations (so the 2 array loads > above don't lose a dependency and wrongly common) but remove them at > the end of all optimizations. > > In the case of the array loads, they are dependent on a range check > for another array through a range check `CastII` and we must not lose > that dependency; otherwise the array loads could float above the range > check at gcm time. I propose we deal with that problem the way it's > handled for `CastPP` nodes: add the dependency to the load (or > division) nodes as a precedence edge when the cast is removed. > > @TobiHartmann ran performance testing for that patch (Thanks!) and reported > no regression. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - review - Merge branch 'master' into JDK-8324517 - Merge branch 'master' into JDK-8324517 - review - Merge branch 'master' into JDK-8324517 - test and fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18377/files - new: https://git.openjdk.org/jdk/pull/18377/files/0de61cbc..ceb30c19 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18377&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18377&range=01-02 Stats: 115362 lines in 3036 files changed: 52226 ins; 47924 del; 15212 mod Patch: https://git.openjdk.org/jdk/pull/18377.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18377/head:pull/18377 PR: https://git.openjdk.org/jdk/pull/18377 From roland at openjdk.org Fri May 3 10:11:25 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 3 May 2024 10:11:25 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v4] In-Reply-To: References: Message-ID: > Range check `CastII` nodes are removed once loop opts are over.
The > test case for this change includes 3 cases where elimination of a > range check `CastII` causes a crash in compiled code because either an > out-of-bounds array load or a division by zero happens. > > In `test1`: > > - the range checks for the `array[otherArray.length]` loads constant > fold: `otherArray.length` is a `CastII` of i at the `otherArray` > allocation. `i` is less than 9. The `CastII` at the allocation > narrows the type down further to `[0-9]`. > > - the `array[otherArray.length]` loads are control dependent on the > unrelated: > > > if (flag == 0) { > > > test. There's an identical dominating test which replaces that one. As > a consequence, the `array[otherArray.length]` loads become control > dependent on the dominating test. > > - The `CastII` nodes at the `otherArray` allocations are replaced by > dominating range check `CastII` nodes for: > > > newArray[i] = 42; > > > - After loop opts, the range check `CastII` nodes are removed and the > 2 `array[otherArray.length]` loads common at the first: > > > if (flag == 0) { > > > test before the: > > > float[] otherArray = new float[i]; > > > and > > > newArray[i] = 42; > > > that guarantee `i` is non-negative. > > - When `test1` is called with `i = -1`, the array load proceeds with an > out-of-bounds index and the crash occurs. > > > `test2` and `test3` are mostly identical except for the check that's > eliminated (a zero-divisor check) and the instruction that causes a > fault (an integer division). > > The fix I propose is to not eliminate range check `CastII` nodes after > loop opts. When range check `CastII` nodes were introduced, performance > was observed to regress. Removing them after loop opts was found to > preserve both correctness and performance. Today, the performance > regression still exists when `CastII` nodes are left in.
So I propose > we keep them until the end of optimizations (so the 2 array loads > above don't lose a dependency and wrongly common) but remove them at > the end of all optimizations. > > In the case of the array loads, they are dependent on a range check > for another array through a range check `CastII` and we must not lose > that dependency; otherwise the array loads could float above the range > check at gcm time. I propose we deal with that problem the way it's > handled for `CastPP` nodes: add the dependency to the load (or > division) nodes as a precedence edge when the cast is removed. > > @TobiHartmann ran performance testing for that patch (Thanks!) and reported > no regression. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: test fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18377/files - new: https://git.openjdk.org/jdk/pull/18377/files/ceb30c19..5cc658b6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18377&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18377&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18377.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18377/head:pull/18377 PR: https://git.openjdk.org/jdk/pull/18377 From roland at openjdk.org Fri May 3 10:15:54 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 3 May 2024 10:15:54 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v4] In-Reply-To: References: Message-ID: On Fri, 3 May 2024 10:11:25 GMT, Roland Westrelin wrote: >> Range check `CastII` nodes are removed once loop opts are over. The >> test case for this change includes 3 cases where elimination of a >> range check `CastII` causes a crash in compiled code because either an >> out-of-bounds array load or a division by zero happens.
>> >> In `test1`: >> >> - the range checks for the `array[otherArray.length]` loads constant >> fold: `otherArray.length` is a `CastII` of i at the `otherArray` >> allocation. `i` is less than 9. The `CastII` at the allocation >> narrows the type down further to `[0-9]`. >> >> - the `array[otherArray.length]` loads are control dependent on the >> unrelated: >> >> >> if (flag == 0) { >> >> >> test. There's an identical dominating test which replaces that one. As >> a consequence, the `array[otherArray.length]` loads become control >> dependent on the dominating test. >> >> - The `CastII` nodes at the `otherArray` allocations are replaced by >> dominating range check `CastII` nodes for: >> >> >> newArray[i] = 42; >> >> >> - After loop opts, the range check `CastII` nodes are removed and the >> 2 `array[otherArray.length]` loads common at the first: >> >> >> if (flag == 0) { >> >> >> test before the: >> >> >> float[] otherArray = new float[i]; >> >> >> and >> >> >> newArray[i] = 42; >> >> >> that guarantee `i` is non-negative. >> >> - When `test1` is called with `i = -1`, the array load proceeds with an >> out-of-bounds index and the crash occurs. >> >> >> `test2` and `test3` are mostly identical except for the check that's >> eliminated (a zero-divisor check) and the instruction that causes a >> fault (an integer division). >> >> The fix I propose is to not eliminate range check `CastII` nodes after >> loop opts. When range check `CastII` nodes were introduced, performance >> was observed to regress. Removing them after loop opts was found to >> preserve both correctness and performance. Today, the performance >> regression still exists when `CastII` nodes are left in. So I propose >> we keep them until the end of optimizations (so the 2 array loads >> above don't lose a dependency and wrongly common) but remove them at >> the end of all optimizations.
>> >> In the case of the array loads, they are dependent on a range check >> for another array through a range check `CastII` and we must not lose >> that dependency; otherwise the array loads could float above the range >> check at gcm time. I propose we deal with that problem the way it's >> handled for `CastPP` nodes: add the dependency to the load (or >> division) nodes ... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > test fix Thanks for reviewing this. > Did you check if the other usages of `_range_check_dependency` via `CastIINode::has_range_check` are still needed? Seems to me as if at least the checks in `PhaseIdealLoop::match_fill_loop` can be removed. I did, but I was fairly conservative. In the case of `PhaseIdealLoop::match_fill_loop`, I don't think this change makes a difference: if we don't need the check for `CastIINode::has_range_check` there, then that's true with or without this change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18377#issuecomment-2092708764 From roland at openjdk.org Fri May 3 10:20:54 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 3 May 2024 10:20:54 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 11:44:27 GMT, Tobias Hartmann wrote: > `Op_ModI` and `Op_ModL` are missing here. Good catch! I added test cases for `Op_ModI` and `Op_ModL`, the unsigned variants, and also the DivMod variants. I also fixed the patch so it handles all of them. > And isn't this too strong in cases where we can prove that the operand is non-zero? I don't think it's too strong. The operand can be non-zero because of a range check `CastII` somewhere along the subgraph that starts at the node's second input.
In that case, `PhaseIterGVN::no_dependent_zero_check` would return true, but removing the range check `CastII` would cause the bugs that are triggered by the test case. > Looking at `PhaseIterGVN::no_dependent_zero_check`, I noticed that `UDiv[I/L]Node` and `UMod[I/L]Node` are not handled, but I think they should be. I think this was missed when these nodes were added by [JDK-8282221](https://bugs.openjdk.org/browse/JDK-8282221). One can probably extend @chhagedorn's test from [JDK-8259227](https://bugs.openjdk.org/browse/JDK-8259227) to trigger the same issue. That seems like a different problem that is out of the scope of this particular issue. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18377#discussion_r1589017668 From bkilambi at openjdk.org Fri May 3 11:27:53 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 3 May 2024 11:27:53 GMT Subject: RFR: 8331400: AArch64: Sync aarch64_vector.ad with aarch64_vector_ad.m4 In-Reply-To: References: Message-ID: On Fri, 3 May 2024 09:07:25 GMT, Bhavana Kilambi wrote: > This commit - [1] modified the aarch64_vector.ad directly. This patch includes that change in the aarch64_vector_ad.m4 file as well and generates the aarch64_vector.ad file from it. > > [1] https://github.com/openjdk/jdk/commit/185e711bfe4c4d013b56e867f85cfb4177b3a2cf Thanks for the review. This rarely happens, though. I shouldn't have missed this. Can I integrate it, or shall I wait for another review (we need two reviews these days, but this one is trivial)? I will still wait for all the tests on macOS to pass.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19077#issuecomment-2092812049 From roland at openjdk.org Fri May 3 12:47:56 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 3 May 2024 12:47:56 GMT Subject: RFR: 8305638: Renaming and small clean-ups around predicates [v7] In-Reply-To: <-UU0jrN33Dxbp9EJ9u1FSJ2RDYC02JMK84gnzZLUhSg=.0e20361b-81a2-4ae9-a320-70f3cd9804c6@github.com> References: <-UU0jrN33Dxbp9EJ9u1FSJ2RDYC02JMK84gnzZLUhSg=.0e20361b-81a2-4ae9-a320-70f3cd9804c6@github.com> Message-ID: <2KKv46jcuYFUBM7b-zaZsL_KTEa77P94D5A5fwKAWtY=.56141dde-4f1d-4d09-a7ce-a220a6a699eb@github.com> On Thu, 2 May 2024 10:40:08 GMT, Christian Hagedorn wrote: >> **Update: April 22** >> >> After splitting off and integrating the following PRs from this PR: >> https://github.com/openjdk/jdk/pull/18080 >> https://github.com/openjdk/jdk/pull/18293 >> https://github.com/openjdk/jdk/pull/18628 >> https://github.com/openjdk/jdk/pull/18723 >> >> we are only left with a few renaming and clean-ups from this PR. Directly merging the master branch in was quite hard. I therefore reverted all commits to get back to a clean master and then applied all remaining code changes manually (required a force push). >> >>
>>
>> >> _------------ Original PR description --------------_ >> >> This patch is intended for JDK 23. >> >> While preparing the patch for the full fix for Assertion Predicates [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981), I still noticed that some changes are not required for the actual fix and could be split off and reviewed separately in this PR. >> >> The patch applies the following cleanup changes: >> - The complete fix had to add slightly different cloning cases in `PhaseIdealLoop::create_bool_from_template_assertion_predicate()` which already has quite some logic to switch between different cases. Additionally, the algorithm in the method itself was already hard to understand and difficult to adapt. I therefore re-implemented it in a separate class `CloneTemplateAssertionPredicateBool` together with some helper classes like `DFSNodeStack`. To use it, I've added a `TemplateAssertionPredicateBool` class that offers three cloning possibilities: >> - `clone()`: Clone without modification >> - `clone_and_replace_opaque_loop_nodes()`: Clone and replace the `OpaqueLoop*Nodes` with a new init and stride node. >> - `clone_and_replace_init()`: Special case of `clone_and_replace_opaque_loop_nodes()` which only replaces `OpaqueLoopInitNode` and clones `OpaqueLoopStrideNode`. >> >> This refactoring could be extracted from the complete fix. >> - The Split If code to detect (`subgraph_has_opaque()`) and clone Template Assertion Predicate Bools was extracted to a separate class `CloneTemplateAssertionPredicateBoolDown` and uses the new `TemplateAssertionPredicateBool` class to do the actual cloning. >> - In the process of coding the complete fix, I've refactored the Loop Unswitching code quite a bit. This change could also be extracted into a separate RFE. Changes include: >> - Renaming >> - Extracting code to separate classes/methods >> - Adding comments >> - Some small refactoring including: >> - Removi... 
> > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into JDK-8305638 > - Merge branch 'refs/heads/master' into JDK-8305638 > > # Conflicts: > # src/hotspot/share/opto/loopPredicate.cpp > - Fix useful Template Assertion Predicate marking > - Fix useful Parse Predicate marking > - Remaining renaming and small clean-ups Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16877#pullrequestreview-2038064110 From roland at openjdk.org Fri May 3 12:51:19 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 3 May 2024 12:51:19 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop Message-ID: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> In the test case: long i; for (; i > 0; i--) { res += 42 / ((int) i); The long counted loop phi has type `[1..100]`. As a consequence, the `ConvL2I` also has type `[1..100]`. The `DivI` node that follows can't fault: it is not guarded by a zero check and has no control set. The `ConvL2I` is split through phi and so is the `DiVI` node: `PhaseIdealLoop::cannot_split_division()` returns true because the value coming from the backedge into the `DivI` (when it is about to be split thru phi) is the result of the `ConvL2I` which has type `[1..100`] so is not zero as far as the compiler can tell. On the last iteration of the loop, i is 1. Because the DivI was split thru Phi, it computes the value for the following iteration, so for i = 0. This causes a crash when the compiled code runs. The same problem can't happen with an int counted loop because logic in `PhaseIdealLoop::split_thru_phi()` prevents a `ConvI2L` from being split thru phi. I propose to fix this the same way: in the test case, it's not true that once the `ConvL2I` is split thru phi it keeps type `[1..100]`. 
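A small model may make this failure mode easier to follow. It is a hypothetical sketch, not C2 code: `evaluated_divisors` performs no divisions and only records which divisor values would be evaluated for the loop from the test case. It assumes, as described here, that the copy of the division split onto the backedge is evaluated for the next value of `i` before the loop exit test.

```cpp
#include <cassert>
#include <vector>

// Model only: records which divisors are evaluated for
//   for (long i = n; i > 0; i--) res += 42 / (int) i;
// No division is performed, so a zero divisor is simply recorded.
//
// Assumption for the split case: after the division is split through the
// loop phi, the copy on the backedge has no control input (the compiler
// believed its divisor was in [1..100]), so it may be evaluated before the
// loop exit test, i.e. also for the value i - 1 == 0 after the last iteration.
std::vector<long> evaluated_divisors(long n, bool split_thru_phi) {
  std::vector<long> divisors;
  if (!split_thru_phi) {
    for (long i = n; i > 0; i--) {
      divisors.push_back(i);  // every division is guarded by i > 0
    }
    return divisors;
  }
  if (n > 0) {
    divisors.push_back(n);  // copy on the loop-entry path of the phi
  }
  for (long i = n; i > 0; i--) {
    divisors.push_back(i - 1);  // copy on the backedge, for the next value of i
  }
  return divisors;
}
```

With `n = 3`, the original loop evaluates the divisors 3, 2, 1, while the split version also evaluates 0 on the last iteration: that is the division by zero the compiled code hits.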
The fix is fairly conservative because it's based on the existing logic for `ConvI2L`: ideally, we would avoid splitting a `ConvL2I` only at a counted loop, but I suppose the same is true for the `ConvI2L`, and I thought it would be best to revisit both together. ------------- Commit messages: - test and fix Changes: https://git.openjdk.org/jdk/pull/19086/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19086&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331575 Stats: 68 lines in 2 files changed: 66 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19086.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19086/head:pull/19086 PR: https://git.openjdk.org/jdk/pull/19086 From jbhateja at openjdk.org Fri May 3 14:07:02 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 3 May 2024 14:07:02 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v13] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 20:31:17 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 20 additional commits since the last revision: > > - update for egpr use: bzhil(R,R,R), btq(R,R), btq(R,imm) > - Merge branch 'master' into apx-encoding-pr > - Update full name > - simplification and fix asserts in ldmxcsr, stmxcsr, and emit_prefix_and_int8 > - remove is_map1 comment for addb, andb, movb, orb, testb, xchgb, xorb > - fix stmxcrs REX2 branch, add asserts to SHA instructions > - fixes: pp bits in crc32, REX2 branch in ldmxcsr > - add egpr support for popcntq(R,A), cvttsd2siq(R,A), popq(R) > - fix 4 more src_is_gpr = true cases, add asserts to check for UseAPX > - fix is_gpr arg on two functions with reversed src / dst operands > - ... and 10 more: https://git.openjdk.org/jdk/compare/6985920c...7b3e8ec7 src/hotspot/cpu/x86/assembler_x86.cpp line 2839: > 2837: void Assembler::kmovwl(KRegister dst, KRegister src) { > 2838: assert(VM_Version::supports_evex(), ""); > 2839: InstructionAttr attributes(AVX_128bit, /* rex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); Suggestion: InstructionAttr attributes(AVX_128bit, /* rex_w */ false, /* legacy_mode */ true, /* no_mask_reg */ true, /* uses_vl */ false); No GPR operand here. src/hotspot/cpu/x86/assembler_x86.cpp line 2846: > 2844: void Assembler::kmovdl(KRegister dst, Register src) { > 2845: assert(VM_Version::supports_avx512bw(), ""); > 2846: InstructionAttr attributes(AVX_128bit, /* rex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); FTR, we are doing legacy demotions in downstream code after checking the actual register encoding.
src/hotspot/cpu/x86/assembler_x86.cpp line 2860: > 2858: void Assembler::kmovql(KRegister dst, KRegister src) { > 2859: assert(VM_Version::supports_avx512bw(), ""); > 2860: InstructionAttr attributes(AVX_128bit, /* rex_w */ true, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); Suggestion: InstructionAttr attributes(AVX_128bit, /* rex_w */ true, /* legacy_mode */ true, /* no_mask_reg */ true, /* uses_vl */ false); src/hotspot/cpu/x86/assembler_x86.cpp line 6556: > 6554: assert(VM_Version::supports_bmi1(), "tzcnt instruction not supported"); > 6555: emit_int8((unsigned char)0xF3); > 6556: int encode = prefixq_and_encode(dst->encoding(), src->encoding(), true /* is_map1 */); FTR, quoting the relevant excerpt from section 3.1.2.1 of the APX specification: "REX2 must be the last prefix. The byte following it is interpreted as the main opcode byte in the opcode map indicated by M0. The 0x0F escape byte is neither needed nor allowed." src/hotspot/cpu/x86/assembler_x86.hpp line 536: > 534: REXBIT_X = 0x02, > 535: REXBIT_R = 0x04, > 536: REXBIT_W = 0x08, Suggestion: REX2BIT_B = 0x01, REX2BIT_X = 0x02, REX2BIT_R = 0x04, REX2BIT_W = 0x08, Name change suggestion since these bits are part of the REX2 prefix.
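The two points above, that REX2 must be the last prefix with no `0x0F` escape and how the REX-style bits are named, can be illustrated with a small encoder sketch. It assumes our reading of the APX payload layout (escape byte `0xD5`, then a payload byte laid out as `M0 R4 X4 B4 W R3 X3 B3`). The class, method, and constant names are hypothetical and are not HotSpot's definitions. The sketch also shows why the low nibble lines up with the legacy REX `WRXB` bits.

```java
// Hypothetical REX2 encoding sketch based on our reading of the APX spec (not HotSpot code).
final class Rex2Sketch {
    static final int REX2_ESCAPE = 0xD5;

    // Low nibble: same W/R/X/B positions as the legacy REX prefix (0100WRXB).
    static final int REXBIT_B = 0x01;
    static final int REXBIT_X = 0x02;
    static final int REXBIT_R = 0x04;
    static final int REXBIT_W = 0x08;
    // High nibble: bit 4 of each register encoding, plus M0 (opcode map select).
    static final int REX2BIT_B4 = 0x10;
    static final int REX2BIT_X4 = 0x20;
    static final int REX2BIT_R4 = 0x40;
    static final int REX2BIT_M0 = 0x80;

    // REX2 payload for a register-register form; reg and rm are encodings 0..31.
    static int payload(int reg, int rm, boolean w, boolean map1) {
        int p = 0;
        if ((reg & 0x08) != 0) p |= REXBIT_R;
        if ((reg & 0x10) != 0) p |= REX2BIT_R4;
        if ((rm & 0x08) != 0)  p |= REXBIT_B;
        if ((rm & 0x10) != 0)  p |= REX2BIT_B4;
        if (w)                 p |= REXBIT_W;
        if (map1)              p |= REX2BIT_M0;
        return p;
    }

    // tzcnt r64, r64: the legacy F3 prefix comes first, REX2 is the last prefix,
    // M0 = 1 selects map 1, and the 0x0F escape byte is omitted.
    static byte[] tzcntq(int dst, int src) {
        int modrm = 0xC0 | ((dst & 7) << 3) | (src & 7);
        return new byte[] {
            (byte) 0xF3,
            (byte) REX2_ESCAPE,
            (byte) payload(dst, src, true, true),
            (byte) 0xBC,   // main opcode byte in map 1, no 0x0F before it
            (byte) modrm
        };
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (byte b : tzcntq(16, 0)) {   // tzcnt r16, rax
            sb.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(sb.toString().trim());   // prints "F3 D5 C8 BC C0"
    }
}
```

The sketch only applies when an operand is an extended GPR. With operands limited to the lower 16 GPRs, the legacy REX encoding (`F3 REX.W 0F BC /r`) stays unchanged.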
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1589185356 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1589195694 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1589193598 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1589207336 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1588761975 From jbhateja at openjdk.org Fri May 3 14:17:02 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 3 May 2024 14:17:02 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v13] In-Reply-To: References: Message-ID: <8DYTq-UlK3eJ0rZIqZODihapcSTUgO0ExgAeN9tGQ8A=.140f1bfc-5fe8-4d7b-9162-c5c332fbd292@github.com> On Thu, 2 May 2024 20:31:17 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 20 additional commits since the last revision: > > - update for egpr use: bzhil(R,R,R), btq(R,R), btq(R,imm) > - Merge branch 'master' into apx-encoding-pr > - Update full name > - simplification and fix asserts in ldmxcsr, stmxcsr, and emit_prefix_and_int8 > - remove is_map1 comment for addb, andb, movb, orb, testb, xchgb, xorb > - fix stmxcrs REX2 branch, add asserts to SHA instructions > - fixes: pp bits in crc32, REX2 branch in ldmxcsr > - add egpr support for popcntq(R,A), cvttsd2siq(R,A), popq(R) > - fix 4 more src_is_gpr = true cases, add asserts to check for UseAPX > - fix is_gpr arg on two functions with reversed src / dst operands > - ... and 10 more: https://git.openjdk.org/jdk/compare/89f2678c...7b3e8ec7 src/hotspot/cpu/x86/assembler_x86.cpp line 1726: > 1724: > 1725: void Assembler::blsrl(Register dst, Register src) { > 1726: assert(VM_Version::supports_bmi1(), "bit manipulation instructions not supported"); We should extend assertion checks based on register encodings and feature detection upfront using ` VM_Version::supports_apx_f() ` part of [PR#18562](https://github.com/openjdk/jdk/pull/18562) once it lands OR you can merge that pull request with this patch if you find appropriate. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1589269902 From kvn at openjdk.org Fri May 3 16:11:52 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 3 May 2024 16:11:52 GMT Subject: RFR: 8331400: AArch64: Sync aarch64_vector.ad with aarch64_vector_ad.m4 In-Reply-To: References: Message-ID: On Fri, 3 May 2024 09:07:25 GMT, Bhavana Kilambi wrote: > This commit - [1] modified the aarch64_vector.ad directly. This patch includes that change in the aarch64_vector_ad.m4 file as well and generates the aarch64_vector.ad file from it. > > [1] https://github.com/openjdk/jdk/commit/185e711bfe4c4d013b56e867f85cfb4177b3a2cf Good. ------------- Marked as reviewed by kvn (Reviewer). 
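The suggestion to extend the assertions based on register encodings can be sketched as follows. This is a minimal illustration under stated assumptions: `apxSupported` stands in for `VM_Version::supports_apx_f()` from PR#18562, and the class and method names are hypothetical. The underlying rule is from the PR description: only operands with encodings 16..31 (Egprs) require REX2 or extended EVEX.

```java
// Hypothetical encoding-based operand check (not HotSpot code).
final class EgprAssertSketch {
    static final int FIRST_EGPR = 16;   // r16..r31 are the APX extended GPRs

    static boolean isEgpr(int encoding) {
        return encoding >= FIRST_EGPR;
    }

    // Assert-style check: any Egpr operand requires APX support.
    // apxSupported stands in for VM_Version::supports_apx_f().
    static void checkGprOperands(boolean apxSupported, int... encodings) {
        for (int enc : encodings) {
            if (enc < 0 || enc > 31) {
                throw new IllegalArgumentException("bad GPR encoding: " + enc);
            }
            if (isEgpr(enc) && !apxSupported) {
                throw new AssertionError("r" + enc + " requires APX");
            }
        }
    }

    // True if any GPR operand forces the REX2 / extended EVEX form. This covers only the
    // GPR side of a legacy-demotion decision; other operand kinds are ignored here.
    static boolean needsApxEncoding(int... encodings) {
        for (int enc : encodings) {
            if (isEgpr(enc)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        checkGprOperands(false, 0, 15);   // rax, r15: fine without APX
        System.out.println(needsApxEncoding(3, 16));
    }
}
```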
PR Review: https://git.openjdk.org/jdk/pull/19077#pullrequestreview-2038495372 From never at openjdk.org Fri May 3 17:23:53 2024 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 3 May 2024 17:23:53 GMT Subject: RFR: 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found [v3] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 21:35:08 GMT, Doug Simon wrote: >> This PR adds the missing nmethod entry barriers to JVMCI hand assembled tests. >> It also closes the escape hatch in jvmciCodeInstaller.cpp that allowed JVMCI code to be installed without nmethod entry barriers. > > Doug Simon has updated the pull request incrementally with two additional commits since the last revision: > > - fix NativeCallTest on x64 > - remove more vestiges of optional JVMCI nmethod support for entry barriers Sounds and looks good. ------------- Marked as reviewed by never (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19035#pullrequestreview-2038677680 From duke at openjdk.org Fri May 3 19:14:07 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Fri, 3 May 2024 19:14:07 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v14] In-Reply-To: References: Message-ID: <727FyZHyBbtRilYRtbP2E4dbZYqj9a-QgXAuicQ2iZQ=.01035706-6591-4df5-bf7d-d7a2f6209015@github.com> > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. 
For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: revert unneeded legacy flag change for kmovwl(K,K) and kmovql(K,K) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/7b3e8ec7..d93e9893 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=12-13 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From duke at openjdk.org Fri May 3 19:14:11 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Fri, 3 May 2024 19:14:11 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v13] In-Reply-To: <8DYTq-UlK3eJ0rZIqZODihapcSTUgO0ExgAeN9tGQ8A=.140f1bfc-5fe8-4d7b-9162-c5c332fbd292@github.com> References: <8DYTq-UlK3eJ0rZIqZODihapcSTUgO0ExgAeN9tGQ8A=.140f1bfc-5fe8-4d7b-9162-c5c332fbd292@github.com> Message-ID: On Fri, 3 May 2024 14:14:28 GMT, Jatin Bhateja wrote: >> Steve Dohrmann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 20 additional commits since the last revision: >> >> - update for egpr use: bzhil(R,R,R), btq(R,R), btq(R,imm) >> - Merge branch 'master' into apx-encoding-pr >> - Update full name >> - simplification and fix asserts in ldmxcsr, stmxcsr, and emit_prefix_and_int8 >> - remove is_map1 comment for addb, andb, movb, orb, testb, xchgb, xorb >> - fix stmxcrs REX2 branch, add asserts to SHA instructions >> - fixes: pp bits in crc32, REX2 branch in ldmxcsr >> - add egpr support for popcntq(R,A), cvttsd2siq(R,A), popq(R) >> - fix 4 more src_is_gpr = true cases, add asserts to check for UseAPX >> - fix is_gpr arg on two functions with reversed src / dst operands >> - ... and 10 more: https://git.openjdk.org/jdk/compare/dc7f6595...7b3e8ec7 > > src/hotspot/cpu/x86/assembler_x86.cpp line 1726: > >> 1724: >> 1725: void Assembler::blsrl(Register dst, Register src) { >> 1726: assert(VM_Version::supports_bmi1(), "bit manipulation instructions not supported"); > > We should extend assertion checks based on register encodings and feature detection upfront using ` VM_Version::supports_apx_f() ` part of [PR#18562](https://github.com/openjdk/jdk/pull/18562) once it lands OR you can merge that pull request with this patch if you find appropriate. Agree that asserts should be extended. Maybe it would be better to do so in a subsequent PR with feature detection in place. > src/hotspot/cpu/x86/assembler_x86.cpp line 2839: > >> 2837: void Assembler::kmovwl(KRegister dst, KRegister src) { >> 2838: assert(VM_Version::supports_evex(), ""); >> 2839: InstructionAttr attributes(AVX_128bit, /* rex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > > Suggestion: > > InstructionAttr attributes(AVX_128bit, /* rex_w */ false, /* legacy_mode */ true, /* no_mask_reg */ true, /* uses_vl */ false); > > > No GPR operand here. Thanks, made the change back. 
> src/hotspot/cpu/x86/assembler_x86.cpp line 2860: > >> 2858: void Assembler::kmovql(KRegister dst, KRegister src) { >> 2859: assert(VM_Version::supports_avx512bw(), ""); >> 2860: InstructionAttr attributes(AVX_128bit, /* rex_w */ true, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > > Suggestion: > > InstructionAttr attributes(AVX_128bit, /* rex_w */ true, /* legacy_mode */ true, /* no_mask_reg */ true, /* uses_vl */ false); Thanks, made the change back. > src/hotspot/cpu/x86/assembler_x86.cpp line 6556: > >> 6554: assert(VM_Version::supports_bmi1(), "tzcnt instruction not supported"); >> 6555: emit_int8((unsigned char)0xF3); >> 6556: int encode = prefixq_and_encode(dst->encoding(), src->encoding(), true /* is_map1 */); > > FTR, Quoting relevant excerpt from section 3.1.2.1 of APX specification. > "REX2 must be the last prefix. The byte following it is interpreted as the main opcode byte in the opcode map indicated by M0. The 0x0F escape byte is neither needed nor allowed." Thanks, understood. The prefixq_and_encode function used above does not emit the 0x0F opcode prefix for map1 instructions encoded with the REX2 scheme. > src/hotspot/cpu/x86/assembler_x86.hpp line 536: > >> 534: REXBIT_X = 0x02, >> 535: REXBIT_R = 0x04, >> 536: REXBIT_W = 0x08, > > Suggestion: > > REX2BIT_B = 0x01, > REX2BIT_X = 0x02, > REX2BIT_R = 0x04, > REX2BIT_W = 0x08, > > Name change suggestion since these bits are part of REX2 prefix. It's true that the REXBIT constants are currently only used in REX2 encoding code. The reason for choosing the REXBIT name for those four values was that they do refer to REX encoding bits and, if bit-wise refactoring of the existing REX encoding code was to be done later, the REXBIT names would make more sense there.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1589636409 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1589635515 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1589635298 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1589635856 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1589635146 From stuefe at openjdk.org Fri May 3 19:16:54 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 3 May 2024 19:16:54 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v2] In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 18:57:16 GMT, Vladimir Kozlov wrote: >>> Thank you, @tstuefe, for filing these bugs. >>> >>> One additional thing I noticed is that we don't produce compilation replay file (its size is 0) for such failures. Can you look why is that? >> >> Yes, its https://bugs.openjdk.org/browse/JDK-8331344 . I'll post a PR shortly. >> >> The problem behind this is more generic, namely that producing replay files needs resource area, and it shouldn't. We should not allocate resource area or heap in fatal error handling. But for now, I'll fix this locally by avoiding the recursion. > >> > Thank you, @tstuefe, for filing these bugs. >> > One additional thing I noticed is that we don't produce compilation replay file (its size is 0) for such failures. Can you look why is that? >> >> Yes, its https://bugs.openjdk.org/browse/JDK-8331344 . I'll post a PR shortly. >> >> The problem behind this is more generic, namely that producing replay files needs resource area, and it shouldn't. We should not allocate resource area or heap in fatal error handling. But for now, I'll fix this locally by avoiding the recursion. > > Good. I think we need to push it before this PR. 
@vnkozlov SAP did a test series and did not find any issues in their CI ------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2093618707 From duke at openjdk.org Fri May 3 19:40:55 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Fri, 3 May 2024 19:40:55 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers In-Reply-To: References: Message-ID: <2ix8fZdbyXTav2FBERlzl7U6JkI3i9hPFGSNKbrDlpo=.a219b3de-7035-44d0-9bdc-3ea599800eb3@github.com> On Tue, 9 Apr 2024 19:44:26 GMT, Dean Long wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > How can we be confident that the encoding is correct? Would it be possible to write tests for this? Maybe one that disassembles it and compares the result to a 3rd party disassembler offline or in-process hsdis? In response to @dean-long, @theRealAph wrote: > When we wrote the AArch64 port, there was no available hardware to test it on. So, we wrote a simulator to test it. However, we ran the risk that if our understanding of instruction encoding was wrong, our assembler and our simulator might appear to work correctly when used together, but the result would not run on real AArch64 hardware once it arrived. 
So, as well as a simulator for the architecture, we verified the internal HotSpot assembler by checking its encoding against GNU `as`. See /test/hotspot/gtest/aarch64, where a Python program generates source for both the HotSpot internal assembler and GNU `as`. I strongly suggest you do something similar. (As a matter for the historical record, this did work. The test found several encoding bugs. Once we got the first real AArch64 hardware, the port worked almost immediately.) Thanks for the description. It would be great to create a similar tool for x86. I tested the encoding manually using the SDE as the authoritative source. It is tedious though and very time consuming. A subsequent PR in [JDK-8329030](https://bugs.openjdk.org/browse/JDK-8329030), perhaps the one that adds encoding support for New Data Destination variants, should include such a tool. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2093653696 From dnsimon at openjdk.org Fri May 3 19:55:02 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 3 May 2024 19:55:02 GMT Subject: RFR: 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found [v3] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 21:35:08 GMT, Doug Simon wrote: >> This PR adds the missing nmethod entry barriers to JVMCI hand assembled tests. >> It also closes the escape hatch in jvmciCodeInstaller.cpp that allowed JVMCI code to be installed without nmethod entry barriers. > > Doug Simon has updated the pull request incrementally with two additional commits since the last revision: > > - fix NativeCallTest on x64 > - remove more vestiges of optional JVMCI nmethod support for entry barriers Thanks for the feedback and reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19035#issuecomment-2093668944 From dnsimon at openjdk.org Fri May 3 19:55:04 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 3 May 2024 19:55:04 GMT Subject: Integrated: 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found In-Reply-To: References: Message-ID: On Wed, 1 May 2024 15:03:08 GMT, Doug Simon wrote: > This PR adds the missing nmethod entry barriers to JVMCI hand assembled tests. > It also closes the escape hatch in jvmciCodeInstaller.cpp that allowed JVMCI code to be installed without nmethod entry barriers. This pull request has now been integrated. Changeset: b20fa7b4 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/b20fa7b48b0f0a64c0760f26188d4c11c3233b61 Stats: 731 lines in 14 files changed: 404 ins; 309 del; 18 mod 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found Reviewed-by: never ------------- PR: https://git.openjdk.org/jdk/pull/19035 From kvn at openjdk.org Fri May 3 20:00:55 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 3 May 2024 20:00:55 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v9] In-Reply-To: References: Message-ID: On Fri, 3 May 2024 05:33:17 GMT, Thomas Stuefe wrote: >> See [1] for previous discussions. >> >> We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. >> >> The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. 
>> >> Examples: >> >> This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` >> >> This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` >> >> >> --- >> >> The patch: >> >> 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. >> 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. >> 3) Adapted and extended tests >> >> I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. >> >> >> Tested: >> >> - manually on Mac m1 (debug and release) >> - GHAs are running >> - but Oracle will do more testing before this goes in >> >> [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: > > - merge master and fix conflicts > - Remove unused variable > - Remove accidental change to TestDeadPhiMergeMemLoop.java > - fix copyrights > - fix copyrights > - another fix > - fix accidental slip in of another test name > - fix jdk note number in test comment > - Disable memory limit for compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java until JDK-8331295 is fixed > - Merge branch 'master' into compiler-default-limit > - ... and 6 more: https://git.openjdk.org/jdk/compare/6bef0474...f6396010 Okay, I will run our testing too. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2093678690 From sgibbons at openjdk.org Fri May 3 23:22:31 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 3 May 2024 23:22:31 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v18] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 50 commits: - Merge remote-tracking branch 'origin/master' into indexof - Move arrays_equals back to c2_MacroAssembler - Merge branch 'openjdk:master' into indexof - Remove infinite loop (used for debugging) - Merge branch 'openjdk:master' into indexof - Cleaned up, ready for review - Pre-cleanup code - Add JMH. Add 16-byte compares to arrays_equals - Better method for mask creation - Merge branch 'openjdk:master' into indexof - ... and 40 more: https://git.openjdk.org/jdk/compare/b20fa7b4...f52d281d ------------- Changes: https://git.openjdk.org/jdk/pull/16753/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=17 Stats: 4345 lines in 17 files changed: 4183 ins; 26 del; 136 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From cslucas at openjdk.org Fri May 3 23:43:56 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 3 May 2024 23:43:56 GMT Subject: Integrated: 8330247: C2: CTW fail with assert(adr_t->is_known_instance_field()) failed: instance required In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 00:35:16 GMT, Cesar Soares Lucas wrote: > The logic in reduce allocation merges (RAM) makes use of `PhaseMacroExpand::can_eliminate_allocation` to check whether an allocation can be scalar replaced. However, we can only SR allocations of exact types - due to rematerialization logic. > > The scalar replacement logic not related to RAM has this check in `split_unique_types` so there is no performance regression by adding this check here. > > Tested on Linux x64 tiers1-3. This pull request has now been integrated.
Changeset: 9347bb7d Author: Cesar Soares Lucas Committer: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/9347bb7df845ee465c378c6f511ef8a6caea18ea Stats: 76 lines in 2 files changed: 76 ins; 0 del; 0 mod 8330247: C2: CTW fail with assert(adr_t->is_known_instance_field()) failed: instance required Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/18851 From kvn at openjdk.org Sat May 4 05:32:00 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 4 May 2024 05:32:00 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v9] In-Reply-To: References: Message-ID: <-vdGyJLNkw9M33NtEHJo_YGHfWldStOLI23Dk36Yi8w=.92a6b81b-ed1a-4e77-b657-eab04e219a3e@github.com> On Fri, 3 May 2024 05:33:17 GMT, Thomas Stuefe wrote: >> See [1] for previous discussions. >> >> We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. >> >> The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. >> >> Examples: >> >> This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` >> >> This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` >> >> >> --- >> >> The patch: >> >> 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. >> 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. >> 3) Adapted and extended tests >> >> I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. 
>> >> >> Tested: >> >> - manually on Mac m1 (debug and release) >> - GHAs are running >> - but Oracle will do more testing before this goes in >> >> [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: > > - merge master and fix conflicts > - Remove unused variable > - Remove accidental change to TestDeadPhiMergeMemLoop.java > - fix copyrights > - fix copyrights > - another fix > - fix accidental slip in of another test name > - fix jdk note number in test comment > - Disable memory limit for compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java until JDK-8331295 is fixed > - Merge branch 'master' into compiler-default-limit > - ... and 6 more: https://git.openjdk.org/jdk/compare/6bef0474...f6396010 Looks like `memlimit,TestFindNode::test,0` does not work. The test failed with stress flags [JDK-8331283](https://bugs.openjdk.org/browse/JDK-8331283) on linux-aarch64 (Ampere). With the same call stack. I see `-XX:CompileCommand=memlimit,TestFindNode::test,0` in flags passed to test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2094029663 From kvn at openjdk.org Sat May 4 05:34:58 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 4 May 2024 05:34:58 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v9] In-Reply-To: References: Message-ID: On Fri, 3 May 2024 05:33:17 GMT, Thomas Stuefe wrote: >> See [1] for previous discussions. >> >> We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. >> >> The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. 
>> >> Examples: >> >> This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` >> >> This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` >> >> >> --- >> >> The patch: >> >> 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. >> 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. >> 3) Adapted and extended tests >> >> I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. >> >> >> Tested: >> >> - manually on Mac m1 (debug and release) >> - GHAs are running >> - but Oracle will do more testing before this goes in >> >> [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: > > - merge master and fix conflicts > - Remove unused variable > - Remove accidental change to TestDeadPhiMergeMemLoop.java > - fix copyrights > - fix copyrights > - another fix > - fix accidental slip in of another test name > - fix jdk note number in test comment > - Disable memory limit for compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java until JDK-8331295 is fixed > - Merge branch 'master' into compiler-default-limit > - ... 
and 6 more: https://git.openjdk.org/jdk/compare/6bef0474...f6396010 I attached hs_err file to [JDK-8331283](https://bugs.openjdk.org/browse/JDK-8331283) ------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2094032227 From stuefe at openjdk.org Sat May 4 08:25:16 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 4 May 2024 08:25:16 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v10] In-Reply-To: References: Message-ID: > See [1] for previous discussions. > > We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. > > The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. > > Examples: > > This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` > > This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` > > > --- > > The patch: > > 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. > 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. > 3) Adapted and extended tests > > I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. 
> > > Tested: > > - manually on Mac m1 (debug and release) > - GHAs are running > - but Oracle will do more testing before this goes in > > [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: fix compiler.c2.TestFindNode again ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18969/files - new: https://git.openjdk.org/jdk/pull/18969/files/f6396010..695a0096 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=08-09 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18969/head:pull/18969 PR: https://git.openjdk.org/jdk/pull/18969 From stuefe at openjdk.org Sat May 4 08:28:02 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 4 May 2024 08:28:02 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v9] In-Reply-To: <-vdGyJLNkw9M33NtEHJo_YGHfWldStOLI23Dk36Yi8w=.92a6b81b-ed1a-4e77-b657-eab04e219a3e@github.com> References: <-vdGyJLNkw9M33NtEHJo_YGHfWldStOLI23Dk36Yi8w=.92a6b81b-ed1a-4e77-b657-eab04e219a3e@github.com> Message-ID: On Sat, 4 May 2024 05:29:01 GMT, Vladimir Kozlov wrote: > Looks like `memlimit,TestFindNode::test,0` does not work. The test failed with stress flags [JDK-8331283](https://bugs.openjdk.org/browse/JDK-8331283) on linux-aarch64 (Ampere). With the same call stack. I see `-XX:CompileCommand=memlimit,TestFindNode::test,0` in flags passed to test. I fixed the error, a simple typo (forgot to properly name the class in the option). Retested locally on Mac m1, confirmed that the test passes with this commit, fails without it. I am not sure what went wrong, since I did these tests beforehand. Maybe I pushed the wrong version. 
That is slightly concerning, however, since the error should have come up at SAP too. I guess they don't test with all these stress options. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2094076704 From aph at openjdk.org Sat May 4 08:59:51 2024 From: aph at openjdk.org (Andrew Haley) Date: Sat, 4 May 2024 08:59:51 GMT Subject: RFR: 8331400: AArch64: Sync aarch64_vector.ad with aarch64_vector_ad.m4 In-Reply-To: References: Message-ID: <3xMwZQNT3DeAV1usUi-YyhRNoF8oxWNtUIHoB3eSEPw=.6f4fcb8a-731a-49bd-bc2a-571b4fd90ec9@github.com> On Fri, 3 May 2024 11:24:56 GMT, Bhavana Kilambi wrote: > Thanks for the review. This rarely happens though. I shouldn't have missed this. Can I integrate it or shall I wait for another review (as we need two reviews these days but this one is trivial)? Will still wait for all the tests on macos to pass. Just push it. @vnkozlov has acked it now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19077#issuecomment-2094086457 From kvn at openjdk.org Sat May 4 18:31:53 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 4 May 2024 18:31:53 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v10] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 08:25:16 GMT, Thomas Stuefe wrote: >> See [1] for previous discussions. >> >> We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. >> >> The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. >> >> Examples: >> >> This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. 
Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` >> >> This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` >> >> >> --- >> >> The patch: >> >> 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. >> 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. >> 3) Adapted and extended tests >> >> I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. >> >> >> Tested: >> >> - manually on Mac m1 (debug and release) >> - GHAs are running >> - but Oracle will do more testing before this goes in >> >> [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > fix compiler.c2.TestFindNode again `-XX:CompileCommand=memstat,compiler.c2.TestFindNode::*,print` - leftover from debugging? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2094340138 From sgibbons at openjdk.org Sat May 4 19:35:21 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Sat, 4 May 2024 19:35:21 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
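The "on average 1.3x" figure can be checked by averaging the Latest/Score ratios of the benchmark numbers quoted in this description. The Java sketch below copies the raw scores verbatim. It assumes that higher scores are better, since the ratio column only reads as a speedup under that assumption. The class name is made up for illustration.

```java
import java.util.Locale;

// Averages the per-benchmark speedup (Latest / Score) for the numbers quoted
// in the PR description. The unit of the raw scores is not stated in the
// thread, so only the ratios are meaningful here.
final class IndexOfSpeedup {
    // {Score, Latest} pairs in the order they appear in the PR description.
    static final double[][] ROWS = {
        {343.573, 317.934},     // StringIndexOf.advancedWithMediumSub
        {1039.081, 1053.96},    // StringIndexOf.advancedWithShortSub1
        {55.828, 110.541},      // StringIndexOf.advancedWithShortSub2
        {9.361, 11.906},        // StringIndexOf.constantPattern
        {4.216, 4.218},         // StringIndexOf.searchCharLongSuccess
        {3.133, 3.216},         // StringIndexOf.searchCharMediumSuccess
        {3.76, 3.761},          // StringIndexOf.searchCharShortSuccess
        {9.186, 9.713},         // StringIndexOf.success
        {14.341, 46.343},       // StringIndexOf.successBig
        {6220.918, 12154.52},   // StringIndexOfChar.latin1_AVX2_String
        {5503.556, 5540.044},   // StringIndexOfChar.latin1_AVX2_char
        {6978.854, 6818.689},   // StringIndexOfChar.latin1_SSE4_String
        {5657.499, 5474.624},   // StringIndexOfChar.latin1_SSE4_char
        {7132.541, 6863.359},   // StringIndexOfChar.latin1_Short_String
        {16013.389, 16162.437}, // StringIndexOfChar.latin1_Short_char
        {7386.123, 14771.622},  // StringIndexOfChar.latin1_mixed_String
        {9901.671, 9782.245},   // StringIndexOfChar.latin1_mixed_char
    };

    // Arithmetic mean of Latest / Score over all rows.
    static double meanRatio(double[][] rows) {
        double sum = 0.0;
        for (double[] r : rows) {
            sum += r[1] / r[0];
        }
        return sum / rows.length;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "mean speedup: %.3fx%n", meanRatio(ROWS));
    }
}
```

The arithmetic mean of all 17 ratios comes to about 1.316, which is consistent with the stated 1.3x. A few large wins pull that mean up: `successBig` at about 3.2x and three benchmarks near 2x. Five benchmarks are slightly below 1.0x.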
The benchmark numbers:
>
> Benchmark                                  Score     Latest  Latest/Score
> StringIndexOf.advancedWithMediumSub      343.573    317.934  0.925375393x
> StringIndexOf.advancedWithShortSub1     1039.081    1053.96  1.014319384x
> StringIndexOf.advancedWithShortSub2       55.828    110.541  1.980027943x
> StringIndexOf.constantPattern              9.361     11.906  1.271872663x
> StringIndexOf.searchCharLongSuccess        4.216      4.218  1.000474383x
> StringIndexOf.searchCharMediumSuccess      3.133      3.216   1.02649218x
> StringIndexOf.searchCharShortSuccess        3.76      3.761  1.000265957x
> StringIndexOf.success                      9.186      9.713  1.057369911x
> StringIndexOf.successBig                  14.341     46.343  3.231504079x
> StringIndexOfChar.latin1_AVX2_String    6220.918   12154.52  1.953814533x
> StringIndexOfChar.latin1_AVX2_char      5503.556   5540.044  1.006629895x
> StringIndexOfChar.latin1_SSE4_String    6978.854   6818.689  0.977049957x
> StringIndexOfChar.latin1_SSE4_char      5657.499   5474.624  0.967675646x
> StringIndexOfChar.latin1_Short_String   7132.541   6863.359  0.962260014x
> StringIndexOfChar.latin1_Short_char    16013.389  16162.437  1.009307711x
> StringIndexOfChar.latin1_mixed_String   7386.123  14771.622  1.999915517x
> StringIndexOfChar.latin1_mixed_char     9901.671   9782.245  0.987938803x
Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Rearrange; add lambdas for clarity ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/f52d281d..fb4da92a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=17-18 Stats: 2561 lines in 1 file changed: 804 ins; 954 del; 803 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From eliu at openjdk.org Sun May 5 00:46:53 2024 From: eliu at openjdk.org (Eric Liu) Date: Sun, 5 May 2024 00:46:53 GMT Subject: RFR: 8331400: AArch64: Sync
aarch64_vector.ad with aarch64_vector_ad.m4 In-Reply-To: References: Message-ID: On Fri, 3 May 2024 09:07:25 GMT, Bhavana Kilambi wrote: > This commit - [1] modified the aarch64_vector.ad directly. This patch includes that change in the aarch64_vector_ad.m4 file as well and generates the aarch64_vector.ad file from it. > > [1] https://github.com/openjdk/jdk/commit/185e711bfe4c4d013b56e867f85cfb4177b3a2cf Marked as reviewed by eliu (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19077#pullrequestreview-2039566564 From fyang at openjdk.org Mon May 6 04:53:58 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 6 May 2024 04:53:58 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v3] In-Reply-To: <74Np8LGxo8PiyoLAUI7tUlAq7ySVgmGzblZio5Tlhx8=.c0fdf457-bc2f-4f88-a070-325521b469f9@github.com> References: <74Np8LGxo8PiyoLAUI7tUlAq7ySVgmGzblZio5Tlhx8=.c0fdf457-bc2f-4f88-a070-325521b469f9@github.com> Message-ID: On Fri, 3 May 2024 06:47:10 GMT, Roberto Castañeda Lozano wrote: >> This changeset generalizes the logic to analyze, declare, and communicate which registers are live at a C2 barrier stub so that it can be used by other collectors than ZGC adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). >> >> The main changes are: >> >> - Make it possible to compute register liveness information before (live-in) or after (live-out) each barrier, and let the collector choose by implementing `BarrierSetC2State::needs_livein_data()`. >> >> - Generalize the interface with which collectors declare which registers must be additionally preserved across barrier runtime calls, adding the methods `BarrierStubC2::preserve(Register r)` and `BarrierStubC2::dont_preserve(Register r)`.
>> >> - Simplify the interface with which platform-specific logic computes which registers to preserve across barrier runtime calls, replacing the calls to `BarrierStubC2::result()` and `BarrierStubC2::live()` with a single call to `BarrierStubC2::preserve_set()`. >> >> #### Testing >> >> - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). >> - tier1-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/4d4e743d8f4cddd5288cee1d69c70fe2b9bea066) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. >> - Build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug). @RealFYang, @TheRealMDoerr: could you please test and review the riscv and ppc changes? Thanks! > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Apply code style suggestions from Axel > > Co-authored-by: Axel Boldt-Christmas @robcasloz : This also tests good on linux-riscv64 platform. LGTM. Thanks for the ping! ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19026#pullrequestreview-2039978152 From epeter at openjdk.org Mon May 6 06:30:07 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 6 May 2024 06:30:07 GMT Subject: RFR: 8329273: C2 SuperWord: Some basic MemorySegment IR tests Message-ID: I could not find any IR vectorization tests for `MemorySegment` yet.
I make sure to exercise different backing types: - arrays - buffers - native memory I filed a follow-up RFE, to eventually make all cases where I have "FAILS" vectorize: [JDK-8331659](https://bugs.openjdk.org/browse/JDK-8331659): C2 SuperWord: investicate failed vectorization in compiler/loopopts/superword/TestMemorySegment.java ------------- Commit messages: - fix tabs - speed up test - small cosmetic fix - make things static - long loop tests - handle AlignVector - int cases - int-index case - disable mixed tests - mixed - ... and 14 more: https://git.openjdk.org/jdk/compare/ea3909ac...b6f16a58 Changes: https://git.openjdk.org/jdk/pull/18535/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18535&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8329273 Stats: 860 lines in 1 file changed: 860 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18535.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18535/head:pull/18535 PR: https://git.openjdk.org/jdk/pull/18535 From duke at openjdk.org Mon May 6 06:36:05 2024 From: duke at openjdk.org (Daniel Skantz) Date: Mon, 6 May 2024 06:36:05 GMT Subject: RFR: 8330016: Stress seed should be initialized for runtime stub compilation Message-ID: <-Pj7vK6Wpgv3FnP2KZXB2so2wWraG3LWEKds9XDI3pY=.d6595586-aa83-47bc-84ed-ff8c4a1a5550@github.com> We can initialize the stress seed for runtime stub compilation as we already do for method compilation. This found the bug described in JDK-8329258. It would apply if StressGCM or StressLCM vm flags are set. Testing: T1-5 default options. T1-5 with -XX:+StressLCM and -XX:+StressGCM. Manually tested that the stress seed is set and printed to compilation log if either stress option is set. 
------------- Commit messages: - move lines again - move lines - factor out streed seed initialization - add stress seed in runtime stub Changes: https://git.openjdk.org/jdk/pull/19095/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19095&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330016 Stats: 31 lines in 2 files changed: 20 ins; 10 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19095.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19095/head:pull/19095 PR: https://git.openjdk.org/jdk/pull/19095 From rcastanedalo at openjdk.org Mon May 6 07:33:52 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 6 May 2024 07:33:52 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v2] In-Reply-To: References: Message-ID: On Fri, 3 May 2024 06:44:42 GMT, Roberto Castañeda Lozano wrote: >> lgtm. >> >> A few nits. > >> lgtm. >> >> A few nits. > > Thanks for reviewing and for the style suggestions, Axel! > @robcasloz : This also tests good on linux-riscv64 platform. LGTM. Thanks for the ping! Thanks for testing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19026#issuecomment-2095358512 From chagedorn at openjdk.org Mon May 6 07:38:53 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 6 May 2024 07:38:53 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop In-Reply-To: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> Message-ID: On Fri, 3 May 2024 12:33:43 GMT, Roland Westrelin wrote:
> In the test case:
>
>     long i;
>     for (; i > 0; i--) {
>         res += 42 / ((int) i);
>
> The long counted loop phi has type `[1..100]`. As a consequence, the > `ConvL2I` also has type `[1..100]`. The `DivI` node that follows can't > fault: it is not guarded by a zero check and has no control set.
> > The `ConvL2I` is split through phi and so is the `DivI` node: > `PhaseIdealLoop::cannot_split_division()` returns false (so the split is allowed) because the > value coming from the backedge into the `DivI` (when it is about to be > split thru phi) is the result of the `ConvL2I`, which has type > `[1..100]` and so is not zero as far as the compiler can tell. > > On the last iteration of the loop, i is 1. Because the DivI was split > thru Phi, it computes the value for the following iteration, i.e. for > i = 0. This causes a crash when the compiled code runs. > > The same problem can't happen with an int counted loop because logic > in `PhaseIdealLoop::split_thru_phi()` prevents a `ConvI2L` from being > split thru phi. I propose to fix this the same way: in the test case, > it's not true that once the `ConvL2I` is split thru phi it keeps type > `[1..100]`. The fix is fairly conservative because it's based on the > existing logic for `ConvI2L`: ideally we would only refuse to split a `ConvL2I` > at a counted loop, but I suppose the same is true for the `ConvI2L` > and I thought it would be best to revisit both together. You could also add the regression tests from the duplicated issue [JDK-8298851](https://bugs.openjdk.org/browse/JDK-8298851). Marked as reviewed by chagedorn (Reviewer). src/hotspot/share/opto/loopopts.cpp line 54: > 52: if ((n->Opcode() == Op_ConvI2L && n->bottom_type() != TypeLong::LONG) || > 53: (n->Opcode() == Op_ConvL2I && n->bottom_type() != TypeInt::INT)) { > 54: // ConvI2L/ConvL2I may have type information on it which is unsafe to push up The fix looks good and we should probably move forward with that. But I'm still wondering whether these bailouts are really needed in the general case. It seems like this problem is mainly for loop phis. Couldn't we check the types of loop phi inputs and bail out if one includes zero? IIUC, the backedge should be an `AddL` with type `[0..99]`, i.e. post-decremented.
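The hand-written Java model below shows the ordering problem Roland describes, including the backedge type `[0..99]` mentioned above. It is a source-level sketch of the effect, not what C2 emits. `afterSplit` computes the division for the next iteration from the decremented (backedge) value before the loop-exit test runs. That is where a control-free `DivI` on the backedge is free to float, so the last iteration divides by zero. Class and method names are made up for illustration.

```java
// Hand-written model of the transformation, not C2 output. It only shows why
// computing the backedge division before the exit test is unsafe.
final class SplitDivThroughPhiModel {
    // The loop from the test case: (int) i stays in [1..100] for start == 100.
    static int original(long start) {
        int res = 0;
        for (long i = start; i > 0; i--) {
            res += 42 / ((int) i);
        }
        return res;
    }

    // The same loop after the division has been moved onto the loop phi's inputs.
    static int afterSplit(long start) {
        int res = 0;
        long i = start;
        if (i <= 0) {
            return res;
        }
        int div = 42 / ((int) i);   // entry input: value for the first iteration
        while (true) {
            res += div;
            i--;                    // backedge value: type [0..99], includes 0
            div = 42 / ((int) i);   // backedge input, evaluated before the exit test
            if (i <= 0) {
                break;
            }
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(original(100));
        try {
            afterSplit(100);
        } catch (ArithmeticException e) {
            System.out.println("afterSplit: " + e.getMessage());
        }
    }
}
```

`original(100)` returns 168, the sum of 42 / i for i = 1..100 in integer arithmetic. `afterSplit(100)` throws `ArithmeticException` on its last iteration.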
So, pushing through seems wrong in this case since the backedge type includes zero. But it could be detected and prevented. However, if the phi has type `[5..100]`, for example, then it should be safe. We probably then just need to update the type of the pushed-through `ConvL2I` to whatever the type of the input is. This type checking approach could work in the general case. But I'm not sure though, if it's beneficial to split these `Conv` nodes through phis in general. But it seems the bailouts have only been introduced due to correctness bugs and not due to performance reasons. Anyway, this should be investigated separately, including benchmarking. ------------- PR Review: https://git.openjdk.org/jdk/pull/19086#pullrequestreview-2040163524 PR Review: https://git.openjdk.org/jdk/pull/19086#pullrequestreview-2040170877 PR Review Comment: https://git.openjdk.org/jdk/pull/19086#discussion_r1590639677 From chagedorn at openjdk.org Mon May 6 07:51:00 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 6 May 2024 07:51:00 GMT Subject: RFR: 8305638: Renaming and small clean-ups around predicates [v7] In-Reply-To: <-UU0jrN33Dxbp9EJ9u1FSJ2RDYC02JMK84gnzZLUhSg=.0e20361b-81a2-4ae9-a320-70f3cd9804c6@github.com> References: <-UU0jrN33Dxbp9EJ9u1FSJ2RDYC02JMK84gnzZLUhSg=.0e20361b-81a2-4ae9-a320-70f3cd9804c6@github.com> Message-ID: On Thu, 2 May 2024 10:40:08 GMT, Christian Hagedorn wrote: >> **Update: April 22** >> >> After splitting off and integrating the following PRs from this PR: >> https://github.com/openjdk/jdk/pull/18080 >> https://github.com/openjdk/jdk/pull/18293 >> https://github.com/openjdk/jdk/pull/18628 >> https://github.com/openjdk/jdk/pull/18723 >> >> we are only left with a few renaming and clean-ups from this PR. Directly merging the master branch in was quite hard. I therefore reverted all commits to get back to a clean master and then applied all remaining code changes manually (required a force push). >> >>
>>
>> >> _------------ Original PR description --------------_ >> >> This patch is intended for JDK 23. >> >> While preparing the patch for the full fix for Assertion Predicates [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981), I still noticed that some changes are not required for the actual fix and could be split off and reviewed separately in this PR. >> >> The patch applies the following cleanup changes: >> - The complete fix had to add slightly different cloning cases in `PhaseIdealLoop::create_bool_from_template_assertion_predicate()` which already has quite some logic to switch between different cases. Additionally, the algorithm in the method itself was already hard to understand and difficult to adapt. I therefore re-implemented it in a separate class `CloneTemplateAssertionPredicateBool` together with some helper classes like `DFSNodeStack`. To use it, I've added a `TemplateAssertionPredicateBool` class that offers three cloning possibilities: >> - `clone()`: Clone without modification >> - `clone_and_replace_opaque_loop_nodes()`: Clone and replace the `OpaqueLoop*Nodes` with a new init and stride node. >> - `clone_and_replace_init()`: Special case of `clone_and_replace_opaque_loop_nodes()` which only replaces `OpaqueLoopInitNode` and clones `OpaqueLoopStrideNode`. >> >> This refactoring could be extracted from the complete fix. >> - The Split If code to detect (`subgraph_has_opaque()`) and clone Template Assertion Predicate Bools was extracted to a separate class `CloneTemplateAssertionPredicateBoolDown` and uses the new `TemplateAssertionPredicateBool` class to do the actual cloning. >> - In the process of coding the complete fix, I've refactored the Loop Unswitching code quite a bit. This change could also be extracted into a separate RFE. Changes include: >> - Renaming >> - Extracting code to separate classes/methods >> - Adding comments >> - Some small refactoring including: >> - Removi... 
> > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into JDK-8305638 > - Merge branch 'refs/heads/master' into JDK-8305638 > > # Conflicts: > # src/hotspot/share/opto/loopPredicate.cpp > - Fix useful Template Assertion Predicate marking > - Fix useful Parse Predicate marking > - Remaining renaming and small clean-ups Thanks Roland for re-reviewing it! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16877#issuecomment-2095381632 From chagedorn at openjdk.org Mon May 6 07:51:00 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 6 May 2024 07:51:00 GMT Subject: Integrated: 8305638: Renaming and small clean-ups around predicates In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 08:42:41 GMT, Christian Hagedorn wrote: > **Update: April 22** > > After splitting off and integrating the following PRs from this PR: > https://github.com/openjdk/jdk/pull/18080 > https://github.com/openjdk/jdk/pull/18293 > https://github.com/openjdk/jdk/pull/18628 > https://github.com/openjdk/jdk/pull/18723 > > we are only left with a few renaming and clean-ups from this PR. Directly merging the master branch in was quite hard. I therefore reverted all commits to get back to a clean master and then applied all remaining code changes manually (required a force push). > >
>
> > _------------ Original PR description --------------_ > > This patch is intended for JDK 23. > > While preparing the patch for the full fix for Assertion Predicates [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981), I still noticed that some changes are not required for the actual fix and could be split off and reviewed separately in this PR. > > The patch applies the following cleanup changes: > - The complete fix had to add slightly different cloning cases in `PhaseIdealLoop::create_bool_from_template_assertion_predicate()` which already has quite some logic to switch between different cases. Additionally, the algorithm in the method itself was already hard to understand and difficult to adapt. I therefore re-implemented it in a separate class `CloneTemplateAssertionPredicateBool` together with some helper classes like `DFSNodeStack`. To use it, I've added a `TemplateAssertionPredicateBool` class that offers three cloning possibilities: > - `clone()`: Clone without modification > - `clone_and_replace_opaque_loop_nodes()`: Clone and replace the `OpaqueLoop*Nodes` with a new init and stride node. > - `clone_and_replace_init()`: Special case of `clone_and_replace_opaque_loop_nodes()` which only replaces `OpaqueLoopInitNode` and clones `OpaqueLoopStrideNode`. > > This refactoring could be extracted from the complete fix. > - The Split If code to detect (`subgraph_has_opaque()`) and clone Template Assertion Predicate Bools was extracted to a separate class `CloneTemplateAssertionPredicateBoolDown` and uses the new `TemplateAssertionPredicateBool` class to do the actual cloning. > - In the process of coding the complete fix, I've refactored the Loop Unswitching code quite a bit. This change could also be extracted into a separate RFE. Changes include: > - Renaming > - Extracting code to separate classes/methods > - Adding comments > - Some small refactoring including: > - Removing unused parameters > - Renaming variables/parameters/methods > > Th... 
This pull request has now been integrated. Changeset: 4bbd972c Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/4bbd972cbb114b99e856aa7904c0240049052b6a Stats: 77 lines in 5 files changed: 17 ins; 7 del; 53 mod 8305638: Renaming and small clean-ups around predicates Reviewed-by: roland, epeter ------------- PR: https://git.openjdk.org/jdk/pull/16877 From eosterlund at openjdk.org Mon May 6 07:54:52 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 6 May 2024 07:54:52 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v3] In-Reply-To: <74Np8LGxo8PiyoLAUI7tUlAq7ySVgmGzblZio5Tlhx8=.c0fdf457-bc2f-4f88-a070-325521b469f9@github.com> References: <74Np8LGxo8PiyoLAUI7tUlAq7ySVgmGzblZio5Tlhx8=.c0fdf457-bc2f-4f88-a070-325521b469f9@github.com> Message-ID: On Fri, 3 May 2024 06:47:10 GMT, Roberto Castañeda Lozano wrote: >> This changeset generalizes the logic to analyze, declare, and communicate which registers are live at a C2 barrier stub so that it can be used by other collectors than ZGC adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). >> >> The main changes are: >> >> - Make it possible to compute register liveness information before (live-in) or after (live-out) each barrier, and let the collector choose by implementing `BarrierSetC2State::needs_livein_data()`. >> >> - Generalize the interface with which collectors declare which registers must be additionally preserved across barrier runtime calls, adding the methods `BarrierStubC2::preserve(Register r)` and `BarrierStubC2::dont_preserve(Register r)`. >> >> - Simplify the interface with which platform-specific logic computes which registers to preserve across barrier runtime calls, replacing the calls to `BarrierStubC2::result()` and `BarrierStubC2::live()` with a single call to `BarrierStubC2::preserve_set()`.
>> >> #### Testing >> >> - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). >> - tier1-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/4d4e743d8f4cddd5288cee1d69c70fe2b9bea066) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. >> - Build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug). @RealFYang, @TheRealMDoerr: could you please test and review the riscv and ppc changes? Thanks! > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Apply code style suggestions from Axel > > Co-authored-by: Axel Boldt-Christmas Looks good! ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19026#pullrequestreview-2040195709 From rcastanedalo at openjdk.org Mon May 6 08:07:57 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 6 May 2024 08:07:57 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v3] In-Reply-To: References: <74Np8LGxo8PiyoLAUI7tUlAq7ySVgmGzblZio5Tlhx8=.c0fdf457-bc2f-4f88-a070-325521b469f9@github.com> Message-ID: On Mon, 6 May 2024 07:52:31 GMT, Erik Österlund wrote: > Looks good! Thanks for reviewing, Erik!
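The Java model below is a hypothetical sketch of the preserve-set bookkeeping described in this changeset. The set starts from the registers the liveness analysis reports (live-in or live-out, depending on `needs_livein_data()`). `preserve()` adds a register, `dont_preserve()` removes one, and platform code then spills and restores exactly `preserve_set()` around the runtime call. HotSpot implements this in C++ on register masks; the class, the method names and the register numbers here are illustrative only. Seeding the set from the live registers is an assumption, not something stated in the thread.

```java
import java.util.BitSet;

// Hypothetical Java model of the preserve-set bookkeeping; not HotSpot's C++ API.
// Register numbers are arbitrary small integers.
final class BarrierStubModel {
    private final BitSet preserve;

    // Seed the set with the registers reported live at the barrier.
    BarrierStubModel(BitSet live) {
        this.preserve = (BitSet) live.clone();
    }

    // Collector requests that an extra register be saved across the runtime call.
    void preserve(int reg) {
        preserve.set(reg);
    }

    // Collector declares that a register need not be saved, for example one
    // the stub overwrites with its result anyway.
    void dontPreserve(int reg) {
        preserve.clear(reg);
    }

    // The registers platform code would spill before and restore after the call.
    BitSet preserveSet() {
        return (BitSet) preserve.clone();
    }

    public static void main(String[] args) {
        BitSet live = new BitSet();
        live.set(1);
        live.set(2);
        live.set(5);
        BarrierStubModel stub = new BarrierStubModel(live);
        stub.preserve(7);     // e.g. a register the barrier needs kept intact
        stub.dontPreserve(5); // e.g. the barrier's result register
        System.out.println(stub.preserveSet());
    }
}
```

Because of the single preserve set, platform code no longer combines the separate `result()` and `live()` queries itself, which is the simplification the changeset describes.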
------------- PR Comment: https://git.openjdk.org/jdk/pull/19026#issuecomment-2095408712 From dfenacci at openjdk.org Mon May 6 08:11:06 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 6 May 2024 08:11:06 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v5] In-Reply-To: References: Message-ID: <2JMRO8HiRuX9_LGSUTOFeg71lygs5EHrw9AkCmC4zsg=.fad94b04-a8cd-464f-8404-005f45deb173@github.com> > # Issue > When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0)` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. > > # Causes > On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. > The same is true for `StoreVector`s. > When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store of the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`). Here we call > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 > > where we do an extra check for types if we deal with vectors, but we don't check that masks or offsets do not interfere. > Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > but we don't check that there are no masks or offsets. > A few lines below, we check if there are 2 stores for the same value in a row.
We need to check for masks and offsets here too, but in this case the stores can still be folded if the masks and offsets of the vector stores are equivalent. > > # Solution > To avoid folding `Load`- and `StoreVector`s with masks and offsets we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns its own (to avoid ever being the same as a store node). In this way, the checks in `MemNode::can_see_stored_value` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 > > and `StoreNode::Identity` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > will fail if masks or offsets are used. > For 2 stores of the same value we instead check for mask and offset equality. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been add...
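A minimal C++ model of the folding guard in the 8325520 description, assuming a much-simplified node representation: a load may take a store's value only if the load's `store_Opcode()` equals the store's opcode, so having gather/masked loads return `-1` (which, according to the review discussion, is what `MemNode::store_Opcode()` returns to mean "unknown") makes the match impossible. `ToyLoad`, `ToyStore`, the opcode values, and both helper functions are hypothetical; HotSpot's real checks in `memnode.cpp` look at much more (addresses, memory state, types).

```cpp
#include <cassert>

// Toy opcodes; HotSpot's real Op_* constants are different.
enum ToyOp { Op_StoreVector = 1, Op_StoreVectorScatterMasked, Op_LoadVector,
             Op_LoadVectorGather, Op_LoadVectorMasked };

struct ToyLoad {
  ToyOp op;
  // Opcode of a store whose value this load may be replaced with,
  // or -1 ("unknown") so that no store opcode can ever match.
  int store_Opcode() const { return op == Op_LoadVector ? Op_StoreVector : -1; }
};

struct ToyStore {
  ToyOp op;
  const void* offsets;  // nullptr when the store has no index map
  const void* mask;     // nullptr when the store is unmasked
};

// Simplified version of the opcode check guarding load-from-store folding.
bool load_can_use_stored_value(const ToyLoad& ld, const ToyStore& st) {
  return st.op == ld.store_Opcode();
}

// Assuming address and value were already found equal: a second
// scatter/masked store is redundant only with the same offsets and mask.
bool second_store_is_redundant(const ToyStore& first, const ToyStore& second) {
  return first.op == second.op && first.offsets == second.offsets &&
         first.mask == second.mask;
}
```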
Damon Fenacci has updated the pull request incrementally with three additional commits since the last revision: - JDK-8325520: check for same vector type - JDK-8325520: remove useless checks for second store type - JDK-8325520: use -1 as unknown opcode in store_Opcode ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18347/files - new: https://git.openjdk.org/jdk/pull/18347/files/d25bcacf..524ff888 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=03-04 Stats: 43 lines in 2 files changed: 14 ins; 3 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/18347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18347/head:pull/18347 PR: https://git.openjdk.org/jdk/pull/18347 From dfenacci at openjdk.org Mon May 6 08:16:57 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 6 May 2024 08:16:57 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v4] In-Reply-To: References: Message-ID: On Thu, 25 Apr 2024 12:39:32 GMT, Emanuel Peter wrote: > // Load a float vector from the memory segment (internally, it does checkIndex and unsafe load from the byte array) > FloatVector floatVector = FloatVector.fromMemorySegment(ms, offset, ByteOrder.nativeOrder()) > ``` > > I did not test this, but I think something like this should work. Right! I totally missed the `fromMemorySegment`/`intoMemorySegment` methods! I guess that's probably one reason why vector nodes are not typed. I've added checks to `StoreNode::Identity` as well. > src/hotspot/share/opto/memnode.cpp line 3533: > >> 3531: const Node* offsets = stv->in(StoreVectorScatterMaskedNode::Offsets); >> 3532: const Node* mask = stv->in(StoreVectorScatterMaskedNode::Mask); >> 3533: if (mem->is_StoreVectorScatterMasked()) { > > This `if` will always be true, since we already check `mem->Opcode() == Opcode()`. The code would be simpler if you extracted the offsets and masks in parallel.
Yep, I removed this useless `if` and 2 other terms in the ifs before that. I'm just not sure what you mean by > if you extracted the offsets and masks in parallel. > src/hotspot/share/opto/vectornode.hpp line 916: > >> 914: virtual int store_Opcode() const { >> 915: // Ensure it is different from any store opcode >> 916: return Op_LoadVectorGather; > > I think you should take `-1`, which is what `MemNode::store_Opcode()` returns. It means "unknown". OK. I was just a bit concerned that a check for equality between two `store_Opcode` results could be true because they are both -1, but this shouldn't happen. Changed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2095424211 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1590680121 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1590679021 From epeter at openjdk.org Mon May 6 08:28:03 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 6 May 2024 08:28:03 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v9] In-Reply-To: References: <5H6XV7Agl6ZNfGWT-bCbIPsimFTYM0pyIGiAHDQUUyA=.168e21cc-6cd8-42d8-ab59-d5e02e241ea2@github.com> <0RKnLUgc6UBtyxSyezCMWsSbP50hu6fQ6UJPHpGlgSU=.9fafa10f-62ee-4ec8-9093-4e204fcbe504@github.com> <5QbsVmYi0tYGlOvDL4LjJb1SjChIZtaWSMthFM9grMI=.0900e1c3-90b3-4726-a7c6-c2aff49d07ce@github.com> Message-ID: On Thu, 2 May 2024 18:44:25 GMT, Roland Westrelin wrote: >> I'm waiting for @rwestrel to respond to my last list of comments/questions. > > @eme64 change is ready for another review @rwestrel I feel like I am heavily stepping on your toes now.... Can you please do refactoring in a separate prior PR?
I'm thinking in particular about your most recent changes with: - `class Invariance` - `estimate_if_peeling_possible` Don't get me wrong: I like those refactorings, but they should be done separately. If you can find anything else that could be done separately, that would help greatly. I have been painstakingly separating my SuperWord PR's into more reviewable patches, and I do get quicker reviews that way. My concern: I think the code is now in a state that can be understood (if one spends a day reading it all), but it is hard for me to say that it is correct. If I now approve this patch, then a subsequent reviewer will pay less attention, hence, I feel like I cannot just approve it too quickly now. If I am too annoying, feel free to ask someone else to review and I will just step back. Maybe @theRealAph wants to review for a while? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2095441298 From epeter at openjdk.org Mon May 6 09:11:04 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 6 May 2024 09:11:04 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v9] In-Reply-To: References: <5H6XV7Agl6ZNfGWT-bCbIPsimFTYM0pyIGiAHDQUUyA=.168e21cc-6cd8-42d8-ab59-d5e02e241ea2@github.com> <0RKnLUgc6UBtyxSyezCMWsSbP50hu6fQ6UJPHpGlgSU=.9fafa10f-62ee-4ec8-9093-4e204fcbe504@github.com> <5QbsVmYi0tYGlOvDL4LjJb1SjChIZtaWSMthFM9grMI=.0900e1c3-90b3-4726-a7c6-c2aff49d07ce@github.com> Message-ID: On Thu, 2 May 2024 18:44:25 GMT, Roland Westrelin wrote: >> I'm waiting for @rwestrel to respond to my last list of comments/questions. > > @eme64 change is ready for another review @rwestrel one idea to split things here: - Early inline of ScopedValue methods - Parsing to IR nodes and expansion back. - Optimization - Tests This way, I can spend only a few hours on one at a time, and we can get this done. Of course, you cannot really integrate an individual one, maybe there is a way to use skara for dependent PR's? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2095515126 From rcastanedalo at openjdk.org Mon May 6 09:29:58 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 6 May 2024 09:29:58 GMT Subject: Integrated: 8331418: ZGC: generalize barrier liveness logic In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 18:43:03 GMT, Roberto Castañeda Lozano wrote: > This changeset generalizes the logic to analyze, declare, and communicate which registers are live at a C2 barrier stub so that it can be used by collectors other than ZGC that adopt the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). > > The main changes are: > > - Make it possible to compute register liveness information before (live-in) or after (live-out) each barrier, and let the collector choose by implementing `BarrierSetC2State::needs_livein_data()`. > > - Generalize the interface with which collectors declare which registers must be additionally preserved across barrier runtime calls, adding the methods `BarrierStubC2::preserve(Register r)` and `BarrierStubC2::dont_preserve(Register r)`. > > - Simplify the interface with which platform-specific logic computes which registers to preserve across barrier runtime calls, replacing the calls to `BarrierStubC2::result()` and `BarrierStubC2::live()` with a single call to `BarrierStubC2::preserve_set()`. > > #### Testing > > - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). > - tier1-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/4d4e743d8f4cddd5288cee1d69c70fe2b9bea066) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call.
> - Build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug). @RealFYang, @TheRealMDoerr: could you please test and review the riscv and ppc changes? Thanks! This pull request has now been integrated. Changeset: 6c776411 Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/6c7764118ef1a684edddb302a4eaff36d80c783f Stats: 112 lines in 9 files changed: 60 ins; 37 del; 15 mod 8331418: ZGC: generalize barrier liveness logic Reviewed-by: mdoerr, aboldtch, fyang, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/19026 From rcastanedalo at openjdk.org Mon May 6 09:44:53 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 6 May 2024 09:44:53 GMT Subject: RFR: 8330016: Stress seed should be initialized for runtime stub compilation In-Reply-To: <-Pj7vK6Wpgv3FnP2KZXB2so2wWraG3LWEKds9XDI3pY=.d6595586-aa83-47bc-84ed-ff8c4a1a5550@github.com> References: <-Pj7vK6Wpgv3FnP2KZXB2so2wWraG3LWEKds9XDI3pY=.d6595586-aa83-47bc-84ed-ff8c4a1a5550@github.com> Message-ID: On Mon, 6 May 2024 06:31:47 GMT, Daniel Skantz wrote: > We can initialize the stress seed for runtime stub compilation as we already do for method compilation. This found the bug described in JDK-8329258. It would apply if StressGCM or StressLCM vm flags are set. > > Testing: T1-5 default options. T1-5 with -XX:+StressLCM and -XX:+StressGCM. Manually tested that the stress seed is set and printed to compilation log if either stress option is set. src/hotspot/share/opto/compile.cpp line 5066: > 5064: // Auxiliary methods to support randomized stressing/fuzzing. > 5065: > 5066: void Compile::initialize_stress_seed(DirectiveSet* directive) { Suggestion: void Compile::initialize_stress_seed(const DirectiveSet* directive) { src/hotspot/share/opto/compile.hpp line 1282: > 1280: > 1281: // seed random number generation and log the seed for repeatability.
> 1282: void initialize_stress_seed(DirectiveSet* directive); Suggestion: void initialize_stress_seed(const DirectiveSet* directive); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19095#discussion_r1590770840 PR Review Comment: https://git.openjdk.org/jdk/pull/19095#discussion_r1590771133 From rcastanedalo at openjdk.org Mon May 6 09:50:53 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 6 May 2024 09:50:53 GMT Subject: RFR: 8330016: Stress seed should be initialized for runtime stub compilation In-Reply-To: <-Pj7vK6Wpgv3FnP2KZXB2so2wWraG3LWEKds9XDI3pY=.d6595586-aa83-47bc-84ed-ff8c4a1a5550@github.com> References: <-Pj7vK6Wpgv3FnP2KZXB2so2wWraG3LWEKds9XDI3pY=.d6595586-aa83-47bc-84ed-ff8c4a1a5550@github.com> Message-ID: On Mon, 6 May 2024 06:31:47 GMT, Daniel Skantz wrote: > We can initialize the stress seed for runtime stub compilation as we already do for method compilation. This found the bug described in JDK-8329258. It would apply if StressGCM or StressLCM vm flags are set. > > Testing: T1-5 default options. T1-5 with -XX:+StressLCM and -XX:+StressGCM. Manually tested that the stress seed is set and printed to compilation log if either stress option is set. Looks good otherwise! ------------- Marked as reviewed by rcastanedalo (Reviewer).
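The point of seeding stress runs is reproducibility: all randomized decisions are drawn from one generator, and the seed is logged so a failing run can be replayed (HotSpot exposes a `-XX:StressSeed` flag for this). Below is a hedged C++ sketch of that general pattern; `choose_and_log_stress_seed` and `ToyStressRng` are hypothetical and do not reflect how `Compile::initialize_stress_seed` is actually written.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <random>

// Hypothetical helper: use the user-supplied seed if there is one,
// otherwise draw a fresh one; always print it so the run can be replayed.
uint32_t choose_and_log_stress_seed(bool seed_given, uint32_t given_seed) {
  uint32_t seed = seed_given ? given_seed : std::random_device{}();
  std::printf("stress seed: %" PRIu32 "\n", seed);
  return seed;
}

// All randomized decisions (e.g. scheduling tie-breaks) come from one
// generator, so the same seed reproduces the same sequence of decisions.
class ToyStressRng {
  std::mt19937 gen_;

 public:
  explicit ToyStressRng(uint32_t seed) : gen_(seed) {}
  bool coin_flip() { return (gen_() & 1u) != 0; }
};
```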
PR Review: https://git.openjdk.org/jdk/pull/19095#pullrequestreview-2040390290 From chagedorn at openjdk.org Mon May 6 10:40:52 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 6 May 2024 10:40:52 GMT Subject: RFR: 8330016: Stress seed should be initialized for runtime stub compilation In-Reply-To: <-Pj7vK6Wpgv3FnP2KZXB2so2wWraG3LWEKds9XDI3pY=.d6595586-aa83-47bc-84ed-ff8c4a1a5550@github.com> References: <-Pj7vK6Wpgv3FnP2KZXB2so2wWraG3LWEKds9XDI3pY=.d6595586-aa83-47bc-84ed-ff8c4a1a5550@github.com> Message-ID: On Mon, 6 May 2024 06:31:47 GMT, Daniel Skantz wrote: > We can initialize the stress seed for runtime stub compilation as we already do for method compilation. This found the bug described in JDK-8329258. It would apply if StressGCM or StressLCM vm flags are set. > > Testing: T1-5 default options. T1-5 with -XX:+StressLCM and -XX:+StressGCM. Manually tested that the stress seed is set and printed to compilation log if either stress option is set. Looks good to me, too. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19095#pullrequestreview-2040466871 From roland at openjdk.org Mon May 6 10:52:53 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 6 May 2024 10:52:53 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: On Fri, 3 May 2024 10:18:44 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/compile.cpp line 3906: >> >>> 3904: for (DUIterator_Fast imax, i = m->fast_outs(imax); i < imax; i++) { >>> 3905: Node* use = m->fast_out(i); >>> 3906: if (use->is_Mem() || use->Opcode() == Op_DivI || use->Opcode() == Op_DivL) { >> >> `Op_ModI` and `Op_ModL` are missing here. And isn't this too strong in cases where we can prove that the operand is non-zero? Could you re-use `PhaseIterGVN::no_dependent_zero_check`? Please also add corresponding tests. 
> >> Looking at `PhaseIterGVN::no_dependent_zero_check`, I noticed that `UDiv[I/L]Node` and `UMod[I/L]Node` are not handled but I think they should be. I think this was missed when these nodes were added by [JDK-8282221](https://bugs.openjdk.org/browse/JDK-8282221). One can probably extend @chhagedorn's test from [JDK-8259227](https://bugs.openjdk.org/browse/JDK-8259227) to trigger the same issue. > >> `Op_ModI` and `Op_ModL` are missing here. > > Good catch! I added test cases for `Op_ModI` and `Op_ModL`, the unsigned variants, and also the DivMod variants. I also fixed the patch so it handles all of them. > >> And isn't this too strong in cases where we can prove that the operand is non-zero? > > I don't think it's too strong. The operand can be non-zero because of a range check `CastII` somewhere along the subgraph that starts at the node's second input. In that case, `PhaseIterGVN::no_dependent_zero_check` would return true but removing the range `CastII` would cause the bugs that are triggered by the test case. > >> Looking at `PhaseIterGVN::no_dependent_zero_check`, I noticed that `UDiv[I/L]Node` and `UMod[I/L]Node` are not handled but I think they should be. I think this was missed when these nodes were added by [JDK-8282221](https://bugs.openjdk.org/browse/JDK-8282221). One can probably extend @chhagedorn's test from [JDK-8259227](https://bugs.openjdk.org/browse/JDK-8259227) to trigger the same issue. > > That seems like a different problem that is out of the scope of this particular issue. I realized that I didn't understand your comment when I replied. What you're saying, I think, is that if we have, say, a `CastII` that's input to a `DivI` node, if the input to that cast is non-zero, then we don't need to add the `CastII` control as dependency to the `DivI`. The problem, I think, is that the `CastII` could be input to, say, an `AddI` node which would then be input to the `DivI`.
What we would then need to know is whether the `AddI` would still be non-zero once the `CastII` is removed. That doesn't seem straightforward because this happens at a point where we no longer have an IGVN instance to propagate types. So, while I agree this is conservative, it still seems like the most reasonable fix. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18377#discussion_r1590836071 From roland at openjdk.org Mon May 6 10:52:57 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 6 May 2024 10:52:57 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 12:42:12 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - review >> - Merge branch 'master' into JDK-8324517 >> - test and fix > > test/hotspot/jtreg/compiler/rangechecks/TestArrayAccessAboveRCAfterRCCastIIEliminated.java line 37: > >> 35: * @run main/othervm -XX:-TieredCompilation -XX:-UseOnStackReplacement -XX:-BackgroundCompilation >> 36: * -XX:CompileCommand=dontinline,TestArrayAccessAboveRCAfterRCCastIIEliminated::notInlined >> 37: * -XX:+StressIGVN -XX:StressSeed=94546681 TestArrayAccessAboveRCAfterRCCastIIEliminated > > `Error: VM option 'StressIGVN' is diagnostic and must be enabled via -XX:+UnlockDiagnosticVMOptions.` Fixed in new commit.
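The conservative rule Roland defends for 8324517 can be pictured with a toy graph walk: starting from a range-check cast, follow uses transitively (through arithmetic such as `AddI`) and keep the cast's control dependency if any reached node is a memory access or any division/modulo variant, regardless of whether the divisor's type looks non-zero. Everything here (`GraphNode`, `NodeKind`, `needs_control_dependency`) is hypothetical and much simpler than the actual code in `compile.cpp`; it only illustrates the direction of the rule.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Toy node kinds, including the unsigned and DivMod variants mentioned in the review.
enum class NodeKind { CastII, AddI, DivI, ModI, UDivI, UModI, DivModI, Load, Other };

struct GraphNode {
  NodeKind kind;
  std::vector<GraphNode*> uses;  // def -> use edges
};

static bool is_div_or_mod(NodeKind k) {
  return k == NodeKind::DivI || k == NodeKind::ModI || k == NodeKind::UDivI ||
         k == NodeKind::UModI || k == NodeKind::DivModI;
}

// Conservative: if any transitive use of the cast is a memory access or a
// division/modulo, assume it may rely on the cast's control and keep it.
bool needs_control_dependency(GraphNode* cast) {
  std::vector<GraphNode*> worklist{cast};
  std::unordered_set<GraphNode*> visited{cast};
  while (!worklist.empty()) {
    GraphNode* n = worklist.back();
    worklist.pop_back();
    for (GraphNode* u : n->uses) {
      if (u->kind == NodeKind::Load || is_div_or_mod(u->kind)) return true;
      if (visited.insert(u).second) worklist.push_back(u);
    }
  }
  return false;
}
```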
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18377#discussion_r1590836230 From redestad at openjdk.org Mon May 6 11:18:55 2024 From: redestad at openjdk.org (Claes Redestad) Date: Mon, 6 May 2024 11:18:55 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v4] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 11:08:16 GMT, Adam Sotona wrote: >> Hi, >> During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. >> One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. >> >> I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. >> >> Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. >> >> Thank you, >> Adam > > Adam Sotona has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into JDK-8331291-attributes > - changed order in allowed modules attributes check > - added bug number > - added impl comment > - removed list of predefined attributes > standard attributes mapping hard-coded and moved to BoundAttribute > added AttributesTest::testAttributesMapping > - move mappers implementations to AbstractAttributeMapper > - 8331291: java.lang.classfile.Attributes class performs a lot of static initializations FWIW, the code changes look good to me.
There seems to be a number of tests that still need to be updated to use the new methods instead of the old constants. ------------- Marked as reviewed by redestad (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19006#pullrequestreview-2040558054 From roland at openjdk.org Mon May 6 11:52:53 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 6 May 2024 11:52:53 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop In-Reply-To: References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> Message-ID: On Mon, 6 May 2024 07:31:22 GMT, Christian Hagedorn wrote: > But I'm still wondering though, if these bailouts are really needed in the general case. It seems like this problem is mainly for loop phis. Couldn't we check the types of loop phi inputs and bail out if one includes zero? Are we sure divisions are the only cause of bugs? My understanding of this issue is that once pushed thru phi, the type of the `ConvL2I` is simply not correct and that's the root cause. I wonder if we could get other failures because of this: maybe a node becoming top because of the incorrect type or an out of bound array access. 
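Background for the ConvL2I discussion: narrowing a long to an int keeps only the low 32 bits, so a value range that is correct for the long inputs can be wrong for the converted ints; in particular, a non-zero long can become int 0, which matters for divisors and array indices. The C++ sketch reproduces Java's `(int)` conversion without relying on implementation-defined casts; `l2i` is a hypothetical helper, and this illustrates the arithmetic only, not C2's type computation for `ConvL2I`.

```cpp
#include <cassert>
#include <cstdint>

// Java-style long-to-int conversion: keep the low 32 bits, two's complement.
// Avoids implementation-defined narrowing of out-of-range signed values.
int32_t l2i(int64_t v) {
  uint32_t low = static_cast<uint32_t>(v);  // well-defined: value modulo 2^32
  if (low <= static_cast<uint32_t>(INT32_MAX)) return static_cast<int32_t>(low);
  // Upper half of the uint32 range maps to negative ints: subtract 2^32.
  return static_cast<int32_t>(static_cast<int64_t>(low) - (int64_t{1} << 32));
}
```

For example, every long in the range [1, 2^32] is non-zero, yet `l2i(int64_t{1} << 32)` is 0, so a divisor derived from such a range cannot be assumed non-zero after the conversion.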
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19086#discussion_r1590910489 From roland at openjdk.org Mon May 6 12:00:00 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 6 May 2024 12:00:00 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v9] In-Reply-To: References: <5H6XV7Agl6ZNfGWT-bCbIPsimFTYM0pyIGiAHDQUUyA=.168e21cc-6cd8-42d8-ab59-d5e02e241ea2@github.com> <0RKnLUgc6UBtyxSyezCMWsSbP50hu6fQ6UJPHpGlgSU=.9fafa10f-62ee-4ec8-9093-4e204fcbe504@github.com> <5QbsVmYi0tYGlOvDL4LjJb1SjChIZtaWSMthFM9grMI=.0900e1c3-90b3-4726-a7c6-c2aff49d07ce@github.com> Message-ID: On Mon, 6 May 2024 08:25:10 GMT, Emanuel Peter wrote: > I'm thinking in particular about your most recent changes with: > > * `class Invariance` > > * `estimate_if_peeling_possible` > > > Don't get me wrong: I like those refactorings, but they should be done separately. The problem I see is that they have little value unless this patch is integrated as it is. What if another reviewer thinks it's better to keep everything related to loop predication together? There's no need to change the class `Invariance` then. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2095842139 From roland at openjdk.org Mon May 6 12:06:02 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 6 May 2024 12:06:02 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v9] In-Reply-To: References: <5H6XV7Agl6ZNfGWT-bCbIPsimFTYM0pyIGiAHDQUUyA=.168e21cc-6cd8-42d8-ab59-d5e02e241ea2@github.com> <0RKnLUgc6UBtyxSyezCMWsSbP50hu6fQ6UJPHpGlgSU=.9fafa10f-62ee-4ec8-9093-4e204fcbe504@github.com> <5QbsVmYi0tYGlOvDL4LjJb1SjChIZtaWSMthFM9grMI=.0900e1c3-90b3-4726-a7c6-c2aff49d07ce@github.com> Message-ID: On Mon, 6 May 2024 11:56:52 GMT, Roland Westrelin wrote: >> @rwestrel I feel like I am heavily stepping on your toes now.... >> Can you please do refactoring in a separate prior PR? 
This change is now 3K+ lines, and even reading through it all takes me more than a day; I simply cannot commit this many hours at a time. >> >> I'm thinking in particular about your most recent changes with: >> - `class Invariance` >> - `estimate_if_peeling_possible` >> >> Don't get me wrong: I like those refactorings, but they should be done separately. >> >> If you can find anything else that could be done separately, that would help greatly. >> >> I have been painstakingly separating my SuperWord PRs into more reviewable patches, and I do get quicker reviews that way. >> >> My concern: I think the code is now in a state that can be understood (if one spends a day reading it all), but it is hard for me to say that it is correct. If I now approve this patch, then a subsequent reviewer will pay less attention, so I feel like I cannot just approve it too quickly now. >> >> If I am too annoying, feel free to ask someone else to review and I will just step back. Maybe @theRealAph wants to review for a while? > >> I'm thinking in particular about your most recent changes with: >> >> * `class Invariance` >> >> * `estimate_if_peeling_possible` >> >> >> Don't get me wrong: I like those refactorings, but they should be done separately. > > The problem I see is that they have little value unless this patch is integrated as it is. What if another reviewer thinks it's better to keep everything related to loop predication together? There's no need to change the class `Invariance` then. > @rwestrel one idea to split things here: > > * Early inline of ScopedValue methods > > * Parsing to IR nodes and expansion back. > > * Optimization > > * Tests > > > This way, I can spend only a few hours on one at a time, and we can get this done. Of course, you cannot really integrate an individual one; maybe there is a way to use Skara for dependent PRs? Would one commit per line above work? Or do you think it needs to be different PRs?
------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2095855714 From epeter at openjdk.org Mon May 6 12:21:01 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 6 May 2024 12:21:01 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v9] In-Reply-To: References: <5H6XV7Agl6ZNfGWT-bCbIPsimFTYM0pyIGiAHDQUUyA=.168e21cc-6cd8-42d8-ab59-d5e02e241ea2@github.com> <0RKnLUgc6UBtyxSyezCMWsSbP50hu6fQ6UJPHpGlgSU=.9fafa10f-62ee-4ec8-9093-4e204fcbe504@github.com> <5QbsVmYi0tYGlOvDL4LjJb1SjChIZtaWSMthFM9grMI=.0900e1c3-90b3-4726-a7c6-c2aff49d07ce@github.com> Message-ID: <99M7Sb0E8z_DOcJ54d5LiJbIeWX0AQj7Ypmg_TNsQZ0=.0b35ec61-ba5a-456f-b597-d43a71ad8095@github.com> On Mon, 6 May 2024 12:03:12 GMT, Roland Westrelin wrote: >>> I'm thinking in particular about your most recent changes with: >>> >>> * `class Invariance` >>> >>> * `estimate_if_peeling_possible` >>> >>> >>> Don't get me wrong: I like those refactorings, but they should be done separately. >> >> The problem I see is that they have little value unless this patch is integrated as it is. What if another reviewer thinks it's better to keep everything related to loop predication together? There's no need to change the class `Invariance` then. > >> @rwestrel one idea to split things here: >> >> * Early inline of ScopedValue methods >> >> * Parsing to IR nodes and expansion back. >> >> * Optimization >> >> * Tests >> >> >> This way, I can spend only a few hours on one at a time, and we can get this done. Of course, you cannot really integrate an individual one; maybe there is a way to use Skara for dependent PRs? > > Would one commit per line above work? Or do you think it needs to be different PRs? @rwestrel Just using commits is probably not really helpful. What would you do if there needs to be an update to commit 1, requested by a reviewer? Honestly, I would like to take a break from this for now. I leave it up to you how to present it in a way that is easier to review.
Once you get someone to review and accept it, I can see if I find time to review again. I think the code is significantly better and more readable than when we first started. So if someone like @vnkozlov simply scans and approves it, and as such takes the responsibility of "first reviewer", then I'm totally fine with that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2095886779 From vlivanov at openjdk.org Mon May 6 12:23:58 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 6 May 2024 12:23:58 GMT Subject: RFR: 8322726: C2: Unloaded signature class kills argument value In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 11:35:25 GMT, Vladimir Ivanov wrote: > For MethodHandle linkers all arguments are cast to signature classes when the target method is known. > > It causes problems when the target method signature contains unloaded classes: when a loaded class meets an unloaded class, it turns into a TOP. It effectively kills argument values which correspond to unloaded signature types. > > Proposed fix avoids casts when the signature class is unloaded. > > Testing: hs-tier1 - hs-tier4 Thanks for the reviews, Vladimir K, Dean, and Tobias. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18973#issuecomment-2095890793 From vlivanov at openjdk.org Mon May 6 12:23:59 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 6 May 2024 12:23:59 GMT Subject: Integrated: 8322726: C2: Unloaded signature class kills argument value In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 11:35:25 GMT, Vladimir Ivanov wrote: > For MethodHandle linkers all arguments are cast to signature classes when the target method is known. > > It causes problems when the target method signature contains unloaded classes: when a loaded class meets an unloaded class, it turns into a TOP. It effectively kills argument values which correspond to unloaded signature types. > > Proposed fix avoids casts when the signature class is unloaded.
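A toy illustration of the failure mode described for 8322726: casting an argument to a class the compiler knows nothing about (because it is unloaded) can make the combined type empty, which C2 represents as TOP, so the value is effectively killed; skipping the cast keeps the argument's own type. The lattice here (class names, `kSuper`, `join`, `cast_argument`) is entirely hypothetical and far simpler than C2's `TypeOopPtr` machinery.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy lattice: a type is a class name; "TOP" means no value can have this type.
const std::string kTop = "TOP";

// Hypothetical hierarchy of loaded classes: class -> superclass.
// An unloaded class simply does not appear here.
const std::map<std::string, std::string> kSuper = {
    {"String", "Object"}, {"Integer", "Number"}, {"Number", "Object"}};

bool is_subclass(const std::string& sub, const std::string& sup) {
  for (std::string c = sub;;) {
    if (c == sup) return true;
    auto it = kSuper.find(c);
    if (it == kSuper.end()) return false;
    c = it->second;
  }
}

// Join (filter) of two class types: the more specific one if they are
// related, otherwise TOP. With no known relation, an unloaded class yields TOP.
std::string join(const std::string& a, const std::string& b) {
  if (is_subclass(a, b)) return a;
  if (is_subclass(b, a)) return b;
  return kTop;
}

// Cast an argument to its signature class; like the fix described above,
// skip the cast when the signature class is not loaded.
std::string cast_argument(const std::string& arg, const std::string& sig, bool sig_loaded) {
  return sig_loaded ? join(arg, sig) : arg;
}
```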
> > Testing: hs-tier1 - hs-tier4 This pull request has now been integrated. Changeset: fa02667d Author: Vladimir Ivanov URL: https://git.openjdk.org/jdk/commit/fa02667d838f08cac7d41dfb4c3e8056ae6165cc Stats: 181 lines in 5 files changed: 168 ins; 0 del; 13 mod 8322726: C2: Unloaded signature class kills argument value Reviewed-by: kvn, dlong, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/18973 From tholenstein at openjdk.org Mon May 6 13:15:16 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 6 May 2024 13:15:16 GMT Subject: RFR: 8330584: IGV: XML does not save all node properties Message-ID: When C2 sends graphs over the network to IGV, each graph is sent separately. The same applies if C2 saves graphs to XML: each graph is saved with all its nodes as a separate `...` in the XML. To save space, graphs that are saved from IGV only contain the incremental difference for each graph. This saves a lot of space (~5-10x). The logic happens in Printer.java -> `exportInputGraph(.., difference=true, ...)` Unfortunately, there is a bug in this logic: the properties of the nodes are not saved correctly. [graphs.zip](https://github.com/openjdk/jdk/files/15220940/graphs.zip) contains 4 graphs: `graph_c2.xml` (230KB) - an XML file saved from C2 `graph_igv_bug.xml` (73KB) - opened `graph_c2.xml` in IGV (without this fix) and saved as `graph_igv_bug.xml`. `graph_igv_fixed.xml` (123KB) - opened `graph_c2.xml` in IGV (with this fix) and saved as `graph_igv_fixed.xml`. As you can see `graph_igv_fixed.xml` is twice as large as `graph_igv_bug.xml` because it contains the missing properties. But now the space saving compared to the original `graph_c2.xml` is only ~2x. Therefore a new format for saving is added: graphs can now be saved and opened from IGV as `.igv`. This uses a compressed (ZIP) format. `graph.igv` (10KB) is the same graph as `graph_c2.xml` (230KB).
But it uses both difference graph compression and ZIP compression and is 23x smaller in total. For example, the root in the last graph of difference_true.xml has far fewer properties than in difference_false.xml. ------------- Commit messages: - Update InputNode.java - compressed graphs as .igv files - JDK-8330584 IGV: XML does not save all node properties Changes: https://git.openjdk.org/jdk/pull/19104/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19104&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330584 Stats: 147 lines in 3 files changed: 79 ins; 16 del; 52 mod Patch: https://git.openjdk.org/jdk/pull/19104.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19104/head:pull/19104 PR: https://git.openjdk.org/jdk/pull/19104 From asotona at openjdk.org Mon May 6 13:59:19 2024 From: asotona at openjdk.org (Adam Sotona) Date: Mon, 6 May 2024 13:59:19 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v5] In-Reply-To: References: Message-ID: > Hi, > During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. > One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. > > I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. > > Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. > > Thank you, > Adam Adam Sotona has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains ten commits: - updated LimitsTest - Merge branch 'master' into JDK-8331291-attributes # Conflicts: # test/jdk/jdk/classfile/SignaturesTest.java - Merge branch 'master' into JDK-8331291-attributes - changed order in allowed modules attributes check - added bug number - added impl comment - removed list of predefined attributes standard attributes mapping hard-coded and moved to BoundAttribute added AttributesTest::testAttributesMapping - move mappers implementations to AbstractAttributeMapper - 8331291: java.lang.classfile.Attributes class performs a lot of static initializations ------------- Changes: https://git.openjdk.org/jdk/pull/19006/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=04 Stats: 2032 lines in 48 files changed: 905 ins; 619 del; 508 mod Patch: https://git.openjdk.org/jdk/pull/19006.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19006/head:pull/19006 PR: https://git.openjdk.org/jdk/pull/19006 From chagedorn at openjdk.org Mon May 6 14:24:27 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 6 May 2024 14:24:27 GMT Subject: RFR: 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode [v2] In-Reply-To: References: Message-ID: > This patch replaces the `Opaque4Node` of the `If` for Initialized Assertion Predicates with a new `OpaqueInitializedAssertionPredicateNode`. This helps to simplify pattern matching for predicate code and to distinguish it from the two other uses of `Opaque4` nodes: > 1. Template Assertion Predicate: The goal is to get rid of its `Opaque4Node` as well by using a dedicated `TemplateAssertionPredicateNode` for the `IfNode`. > 2. Non-null checks with intrinsics and unsafe accesses: This will eventually be the only use left. Once we get there, we should rename the node accordingly to `OpaqueNonNullCheck` or something like that.
> > I went through all the uses of `Opaque4` nodes and did the following: > - Could the `Opaque4` node be part of an Initialized Assertion Predicate? > - No: Added an assert that we are not dealing with an Initialized Assertion Predicate. > - Yes: > - Yes **and only** for Initialized Assertion Predicates? Added an assert that we are only expecting an `OpaqueInitializedAssertionPredicateNode` if appropriate. > - Yes but could also be something else: Added case for `OpaqueInitializedAssertionPredicateNode` next to the `Opaque4` case. > - Is this `Opaque4` node only used for Template Assertion Predicates? > - Yes: Added assert with call to `assertion_predicate_has_loop_opaque_node()` to check that we find its `OpaqueLoop*Nodes`. > - I've added test cases where I was not sure whether an `Opaque4` node could be part of a Template, an Initialized Assertion Predicate or a non-null-check. This was a little tricky but I think it was still worth it to prevent future bugs (even though most of these special cases are quite rare). > > This is another patch split off from the full fix for Assertion Predicates. > > Thanks, > Christian Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains four additional commits since the last revision: - Merge branch 'master' into JDK-8330386 - Add more comments and asserts - Add more tests - 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18951/files - new: https://git.openjdk.org/jdk/pull/18951/files/089a4e65..fe3feb8b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18951&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18951&range=00-01 Stats: 22254 lines in 1611 files changed: 8283 ins; 8470 del; 5501 mod Patch: https://git.openjdk.org/jdk/pull/18951.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18951/head:pull/18951 PR: https://git.openjdk.org/jdk/pull/18951 From dfenacci at openjdk.org Mon May 6 15:20:32 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 6 May 2024 15:20:32 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v6] In-Reply-To: References: Message-ID: > # Issue > When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0)` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that causes different vectors to be treated as equal even though they are not. > > # Causes > On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without masks or offsets are mapped as `LoadVector` nodes instead. > The same is true for `StoreVector`s. > When running GVN loops we can end up in a situation where we check if a Load node is preceded by a Store of the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`).
Here we call > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 > > where we do an extra type check for vectors, but we don't make sure that neither masks nor offsets interfere. > Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > but we don't make sure that there are no masks or offsets. > A few lines below, we check if there are 2 stores for the same value in a row. We need to check for masks and offsets here too, but in this case we can still fold if the masks and offsets of the two vector stores are equivalent. > > # Solution > To avoid folding `Load`- and `StoreVector`s with masks or offsets, we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns its own (to avoid ever being the same as a store node). In this way, the checks in `MemNode::can_see_stored_value` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 > > and `StoreNode::Identity` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > will fail if masks or offsets are used. > For 2 stores of the same value we instead check for mask and offset equality. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been add...
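The reasoning in this description, that a load may only be replaced by a preceding store's value when both touch exactly the same lanes, can be checked with a scalar emulation. This sketch uses plain `long[]` arrays rather than the incubating Vector API; `VectorFoldSketch` and its methods are hypothetical, and the lane semantics are assumed to follow `fromArray` with an index map (lane i reads `base + offsets[i]`) and with a mask (unset lanes read as zero).

```java
import java.util.Arrays;

/** Scalar emulation of vector loads/stores (hypothetical, for illustration only). */
final class VectorFoldSketch {
    private VectorFoldSketch() {}

    /** Contiguous store: lane i writes mem[base + i]. */
    static void store(long[] mem, int base, long[] v) {
        System.arraycopy(v, 0, mem, base, v.length);
    }

    /** Contiguous load: lane i reads mem[base + i]. */
    static long[] load(long[] mem, int base, int lanes) {
        return Arrays.copyOfRange(mem, base, base + lanes);
    }

    /** Gather: lane i reads mem[base + offsets[i]]. */
    static long[] gather(long[] mem, int base, int[] offsets) {
        long[] result = new long[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            result[i] = mem[base + offsets[i]];
        }
        return result;
    }

    /** Masked load: active lanes read mem[base + i], inactive lanes read 0. */
    static long[] maskedLoad(long[] mem, int base, boolean[] mask) {
        long[] result = new long[mask.length];
        for (int i = 0; i < mask.length; i++) {
            result[i] = mask[i] ? mem[base + i] : 0L;
        }
        return result;
    }
}
```

After `store(mem, 0, v)`, only the plain `load` returns `v`. A gather with offsets `{1, 0, 3, 2}` returns a permutation of `v`, and a masked load returns zeros in its inactive lanes, so replacing either of them with the stored vector would be wrong.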
Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: - JDK-8325520: add store/load masked vector tests - JDK-8325520: add store/load tests with duplicate offsets ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18347/files - new: https://git.openjdk.org/jdk/pull/18347/files/524ff888..85bb4bef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=04-05 Stats: 95 lines in 1 file changed: 94 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18347/head:pull/18347 PR: https://git.openjdk.org/jdk/pull/18347 From dfenacci at openjdk.org Mon May 6 15:23:00 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 6 May 2024 15:23:00 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v4] In-Reply-To: References: Message-ID: On Thu, 25 Apr 2024 12:21:43 GMT, Emanuel Peter wrote: > It would be great if you had tests that exactly exercise these "bad" examples, where it looks like we might optimize, but it would be wrong. Yep, good idea. I've added a few tests to check for those cases (load-store with duplicate offsets and store-load with masks). Thanks @eme64! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2096287127 From asotona at openjdk.org Mon May 6 15:59:08 2024 From: asotona at openjdk.org (Adam Sotona) Date: Mon, 6 May 2024 15:59:08 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v6] In-Reply-To: References: Message-ID: <8b638nkCvzhpf1xUCK-KGXVXqeYPwzFkVOJPOFDtyd4=.50d86a2b-a695-49d5-8de6-924b41f507f5@github.com> > Hi, > During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. 
> One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. > > I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. > > Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. > > Thank you, > Adam Adam Sotona has updated the pull request incrementally with one additional commit since the last revision: fixed tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19006/files - new: https://git.openjdk.org/jdk/pull/19006/files/497dd533..a1a55d71 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=04-05 Stats: 180 lines in 94 files changed: 0 ins; 0 del; 180 mod Patch: https://git.openjdk.org/jdk/pull/19006.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19006/head:pull/19006 PR: https://git.openjdk.org/jdk/pull/19006 From asotona at openjdk.org Mon May 6 15:59:08 2024 From: asotona at openjdk.org (Adam Sotona) Date: Mon, 6 May 2024 15:59:08 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v4] In-Reply-To: References: Message-ID: <5agtRoM-ozF1_jEnCOI4j9tvcEJEhul2FSDxHX8hEAE=.d2c1fe74-e84f-4007-9d39-57901b1788e2@github.com> On Thu, 2 May 2024 14:40:16 GMT, Chen Liang wrote: > On a side note, will we update JEP 466 to include this patch? 
I hope so, if we get it into 23 ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19006#issuecomment-2096378934 From asotona at openjdk.org Mon May 6 15:59:08 2024 From: asotona at openjdk.org (Adam Sotona) Date: Mon, 6 May 2024 15:59:08 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v4] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 11:16:16 GMT, Claes Redestad wrote: > FWIW, the code changes look good to me. There seem to be a number of tests that still need to be updated to use the new methods instead of the old constants. Thank you! Yes, I'm cleaning the tests right now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19006#issuecomment-2096380853 From asotona at openjdk.org Mon May 6 16:07:26 2024 From: asotona at openjdk.org (Adam Sotona) Date: Mon, 6 May 2024 16:07:26 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v7] In-Reply-To: <_5Ike3ZDfok-lU5AItq7mDu80Gme4vvRrmvpovOOXHg=.763c4a63-7dff-49f5-b826-93d727e9f5b9@github.com> References: Message-ID: <_5Ike3ZDfok-lU5AItq7mDu80Gme4vvRrmvpovOOXHg=.763c4a63-7dff-49f5-b826-93d727e9f5b9@github.com> > Hi, > During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. > One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. > > I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. > > Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23.
> > Thank you, > Adam Adam Sotona has updated the pull request incrementally with one additional commit since the last revision: fixed tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19006/files - new: https://git.openjdk.org/jdk/pull/19006/files/a1a55d71..dcbaae85 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=05-06 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19006.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19006/head:pull/19006 PR: https://git.openjdk.org/jdk/pull/19006 From kvn at openjdk.org Mon May 6 17:02:58 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 6 May 2024 17:02:58 GMT Subject: RFR: 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode [v2] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 14:24:27 GMT, Christian Hagedorn wrote: >> This patch replaces the `Opaque4Node` of the `If` for Initialized Assertion Predicates with a new `OpaqueInitializedAssertionPredicateNode`. This helps to simplify pattern matching for predicate code and to distinguish it from the two other uses of `Opaque4` nodes: >> 1. Template Assertion Predicate: The goal is to get rid of its `Opaque4Node` as well by using a dedicated `TemplateAssertionPredicateNode` for the `IfNode`. >> 2. Non-null checks with intrinsics and unsafe accesses: This will eventually be the only use left. Once we get there, we should rename the node accordingly to `OpaqueNonNullCheck` or something like that. >> >> I went through all the uses of `Opaque4` nodes and did the following: >> - Could the `Opaque4` node be part of an Initialized Assertion Predicate? >> - No: Added an assert that we are not dealing with an Initialized Assertion Predicate. >> - Yes: >> - Yes **and only** for Initialized Assertion Predicates?
Added an assert that we are only expecting an `OpaqueInitializedAssertionPredicateNode` if appropriate. >> - Yes but could also be something else: Added case for `OpaqueInitializedAssertionPredicateNode` next to the `Opaque4` case. >> - Is this `Opaque4` node only used for Template Assertion Predicates? >> - Yes: Added assert with call to `assertion_predicate_has_loop_opaque_node()` to check that we find its `OpaqueLoop*Nodes`. >> - I've added test cases where I was not sure whether an `Opaque4` node could be part of a Template, an Initialized Assertion Predicate or a non-null-check. This was a little tricky but I think it was still worth it to prevent future bugs (even though most of these special cases are quite rare). >> >> This is another patch split off from the full fix for Assertion Predicates. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into JDK-8330386 > - Add more comments and asserts > - Add more tests > - 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode Looks reasonable. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18951#pullrequestreview-2041258643 From kvn at openjdk.org Mon May 6 17:17:52 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 6 May 2024 17:17:52 GMT Subject: RFR: 8329273: C2 SuperWord: Some basic MemorySegment IR tests In-Reply-To: References: Message-ID: On Thu, 28 Mar 2024 16:34:38 GMT, Emanuel Peter wrote: > I could not find any IR vectorization tests for `MemorySegment` yet.
> > I make sure to exercise different backing types: > - arrays > - buffers > - native memory > > I filed a follow-up RFE, to eventually make all cases where I have "FAILS" vectorize: > > [JDK-8331659](https://bugs.openjdk.org/browse/JDK-8331659): C2 SuperWord: investicate failed vectorization in compiler/loopopts/superword/TestMemorySegment.java Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18535#pullrequestreview-2041284796 From asotona at openjdk.org Mon May 6 18:24:25 2024 From: asotona at openjdk.org (Adam Sotona) Date: Mon, 6 May 2024 18:24:25 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v8] In-Reply-To: References: Message-ID: > Hi, > During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. > One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. > > I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. > > Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. 
> > Thank you, > Adam Adam Sotona has updated the pull request incrementally with one additional commit since the last revision: fixed tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19006/files - new: https://git.openjdk.org/jdk/pull/19006/files/dcbaae85..b4203cfd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=06-07 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19006.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19006/head:pull/19006 PR: https://git.openjdk.org/jdk/pull/19006 From vromero at openjdk.org Mon May 6 18:35:57 2024 From: vromero at openjdk.org (Vicente Romero) Date: Mon, 6 May 2024 18:35:57 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v7] In-Reply-To: <_5Ike3ZDfok-lU5AItq7mDu80Gme4vvRrmvpovOOXHg=.763c4a63-7dff-49f5-b826-93d727e9f5b9@github.com> References: <_5Ike3ZDfok-lU5AItq7mDu80Gme4vvRrmvpovOOXHg=.763c4a63-7dff-49f5-b826-93d727e9f5b9@github.com> Message-ID: On Mon, 6 May 2024 16:07:26 GMT, Adam Sotona wrote: >> Hi, >> During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. >> One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. >> >> I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. >> >> Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. 
>> >> Thank you, >> Adam > > Adam Sotona has updated the pull request incrementally with one additional commit since the last revision: > > fixed tests lgtm src/java.base/share/classes/java/lang/classfile/Attributes.java line 28: > 26: > 27: import java.lang.classfile.attribute.*; > 28: import jdk.internal.classfile.impl.AbstractAttributeMapper.*; the second star import is probably unnecessary ------------- Marked as reviewed by vromero (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19006#pullrequestreview-2041378994 PR Review Comment: https://git.openjdk.org/jdk/pull/19006#discussion_r1591377928 From asotona at openjdk.org Mon May 6 18:46:54 2024 From: asotona at openjdk.org (Adam Sotona) Date: Mon, 6 May 2024 18:46:54 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v7] In-Reply-To: References: <_5Ike3ZDfok-lU5AItq7mDu80Gme4vvRrmvpovOOXHg=.763c4a63-7dff-49f5-b826-93d727e9f5b9@github.com> Message-ID: <2mvx1CG4RndVRqr8H_uypWn0S97bZ1qXXTWvVFsESz0=.23f7434a-d6ea-4208-9d49-d03f07c9e9b3@github.com> On Mon, 6 May 2024 18:07:06 GMT, Vicente Romero wrote: >> Adam Sotona has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed tests > > src/java.base/share/classes/java/lang/classfile/Attributes.java line 28: > >> 26: >> 27: import java.lang.classfile.attribute.*; >> 28: import jdk.internal.classfile.impl.AbstractAttributeMapper.*; > > the second star import is probably unnecessary Thank you for the review! All the holders/mappers implementations are AbstractAttributeMapper inner classes. 
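A common way to get "lazy-initialized static methods" in Java is the initialization-on-demand holder idiom: each value lives in its own nested class, which the JVM initializes only when that class's field is first accessed. The sketch below assumes that pattern; `LazyMappers`, `Mapper` and the holder classes are hypothetical stand-ins, not the actual `java.lang.classfile.Attributes` or `AbstractAttributeMapper` code.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class LazyMappers {
    // Counts constructions so the laziness can be observed (illustration only).
    static final AtomicInteger CREATED = new AtomicInteger();

    /** Hypothetical stand-in for an attribute mapper. */
    record Mapper(String name) {
        Mapper {
            CREATED.incrementAndGet();
        }
    }

    private LazyMappers() {}

    // Each holder is initialized by the JVM only on first access to its field,
    // so initializing LazyMappers itself creates no mappers.
    private static final class CodeHolder {
        static final Mapper INSTANCE = new Mapper("Code");
    }

    private static final class SignatureHolder {
        static final Mapper INSTANCE = new Mapper("Signature");
    }

    static Mapper code() {
        return CodeHolder.INSTANCE;
    }

    static Mapper signature() {
        return SignatureHolder.INSTANCE;
    }
}
```

Class initialization is thread-safe by JVM guarantee, so no explicit locking or volatile field is needed, and callers that never touch a given mapper never pay for creating it.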
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19006#discussion_r1591416140 From cslucas at openjdk.org Mon May 6 21:08:01 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 6 May 2024 21:08:01 GMT Subject: RFR: JDK-8330565 - C2: Multiple crashes with CTW after JDK-8316991 Message-ID: Please consider this patch, which fixes the issues described in JDK-8330565 and overlaps a little with JDK-8330795. The `# assert(false) failed: Bad graph detected in build_loop_late` failure was caused because a string concatenation optimization using [this method](https://github.com/openjdk/jdk/blob/819f3d6fc70ff6fe54ac5f9033c17c3dd4326aa5/src/hotspot/share/opto/graphKit.cpp#L4115) adds AddP and LoadN nodes to the IR graph as NotNull, _and_ because RAM was not marking Phis that merge nullable pointers as nullable. I was only able to reproduce this problem using a classfile/jar compiled with an "old" JDK version, because newer versions use invokedynamic for string concatenation. The `assert(adr_t->is_known_instance_field()) failed: instance required` failure was caused because RAM uses `PhaseMacroExpand::can_eliminate_allocation` to check if an allocation can be eliminated, and that method wasn't checking whether the allocation uses an exact type. The `assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type` failure was caused by the fact that we didn't have a "zero value" for the type T_METADATA. The RAM patch uses that value when it creates a Phi node merging Klass loads while UseCompressedClassPointers is disabled. Tested with JTREG tier1-4 on Linux x86_64 & ARM64. ------------- Commit messages: - Fix bad type when UseCompressedPointers is disabled. - Phi merging nullable inputs needs to be nullable. - SR allocate needs to be of exact type.
Changes: https://git.openjdk.org/jdk/pull/19111/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19111&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330565 Stats: 20 lines in 3 files changed: 20 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19111.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19111/head:pull/19111 PR: https://git.openjdk.org/jdk/pull/19111 From kvn at openjdk.org Mon May 6 21:53:53 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 6 May 2024 21:53:53 GMT Subject: RFR: JDK-8330565 - C2: Multiple crashes with CTW after JDK-8316991 In-Reply-To: References: Message-ID: On Mon, 6 May 2024 21:02:07 GMT, Cesar Soares Lucas wrote: > Please consider this patch, which fixes the issues described in JDK-8330565 and overlaps a little with JDK-8330795. > > The `# assert(false) failed: Bad graph detected in build_loop_late` failure was caused because a string concatenation optimization using [this method](https://github.com/openjdk/jdk/blob/819f3d6fc70ff6fe54ac5f9033c17c3dd4326aa5/src/hotspot/share/opto/graphKit.cpp#L4115) adds AddP and LoadN nodes to the IR graph as NotNull, _and_ because RAM was not marking Phis that merge nullable pointers as nullable. I was only able to reproduce this problem using a classfile/jar compiled with an "old" JDK version, because newer versions use invokedynamic for string concatenation. > > The `assert(adr_t->is_known_instance_field()) failed: instance required` failure was caused because RAM uses `PhaseMacroExpand::can_eliminate_allocation` to check if an allocation can be eliminated, and that method wasn't checking whether the allocation uses an exact type. > > The `assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type` failure was caused by the fact that we didn't have a "zero value" for the type T_METADATA. The RAM patch uses that value when it creates a Phi node merging Klass loads while UseCompressedClassPointers is disabled.
> > > > > Tested with JTREG tier1-4 on Linux x86_64 & ARM64. src/hotspot/share/opto/macro.cpp line 578: > 576: } else if (!res_type->klass_is_exact()) { > 577: NOT_PRODUCT(fail_eliminate = "Not an exact type.";) > 578: can_eliminate = false; You already fixed this: [#18851](https://github.com/openjdk/jdk/pull/18851) Please merge the latest changes into this PR. You also need new regression tests. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19111#discussion_r1591593729 From cslucas at openjdk.org Mon May 6 22:19:26 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 6 May 2024 22:19:26 GMT Subject: RFR: JDK-8330565 - C2: Multiple crashes with CTW after JDK-8316991 [v2] In-Reply-To: References: Message-ID: > Please consider this patch, which fixes the issues described in JDK-8330565 and overlaps a little with JDK-8330795. > > The `# assert(false) failed: Bad graph detected in build_loop_late` failure was caused because a string concatenation optimization using [this method](https://github.com/openjdk/jdk/blob/819f3d6fc70ff6fe54ac5f9033c17c3dd4326aa5/src/hotspot/share/opto/graphKit.cpp#L4115) adds AddP and LoadN nodes to the IR graph as NotNull, _and_ because RAM was not marking Phis that merge nullable pointers as nullable. I was only able to reproduce this problem using a classfile/jar compiled with an "old" JDK version, because newer versions use invokedynamic for string concatenation. > > The `assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type` failure was caused by the fact that we didn't have a "zero value" for the type T_METADATA. The RAM patch uses that value when it creates a Phi node merging Klass loads while UseCompressedClassPointers is disabled. > > > > > Tested with JTREG tier1-4 on Linux x86_64 & ARM64. Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains seven additional commits since the last revision: - Updating branch Merge branch 'fix_bad_bad_graph' of https://github.com/JohnTortugo/jdk into fix_bad_bad_graph - Fix bad type when UseCompressedPointers is disabled. - Phi merging nullable inputs needs to be nullable. - SR allocate needs to be of exact type. - Fix bad type when UseCompressedPointers is disabled. - Phi merging nullable inputs needs to be nullable. - SR allocate needs to be of exact type. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19111/files - new: https://git.openjdk.org/jdk/pull/19111/files/31829d60..c8ce1502 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19111&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19111&range=00-01 Stats: 48711 lines in 1737 files changed: 22525 ins; 21107 del; 5079 mod Patch: https://git.openjdk.org/jdk/pull/19111.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19111/head:pull/19111 PR: https://git.openjdk.org/jdk/pull/19111 From cslucas at openjdk.org Mon May 6 22:19:26 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 6 May 2024 22:19:26 GMT Subject: Withdrawn: JDK-8330565 - C2: Multiple crashes with CTW after JDK-8316991 In-Reply-To: References: Message-ID: On Mon, 6 May 2024 21:02:07 GMT, Cesar Soares Lucas wrote: > Please consider this patch, which fixes the issues described in JDK-8330565 and overlaps a little with JDK-8330795. > > The `# assert(false) failed: Bad graph detected in build_loop_late` failure was caused because a string concatenation optimization using [this method](https://github.com/openjdk/jdk/blob/819f3d6fc70ff6fe54ac5f9033c17c3dd4326aa5/src/hotspot/share/opto/graphKit.cpp#L4115) adds AddP and LoadN nodes to the IR graph as NotNull, _and_ because RAM was not marking Phis that merge nullable pointers as nullable. I was only able to reproduce this problem using a classfile/jar compiled with an "old" JDK version, because newer versions use invokedynamic for string concatenation.
> > The `assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type` failure was caused by the fact that we didn't have a "zero value" for the type T_METADATA. The RAM patch uses that value when it creates a Phi node merging Klass loads while UseCompressedClassPointers is disabled. > > > > > Tested with JTREG tier1-4 on Linux x86_64 & ARM64. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/19111 From sviswanathan at openjdk.org Mon May 6 22:43:57 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 6 May 2024 22:43:57 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: <8Y-nIHc8vfB1X_hp3tpqqqgpCzu6dAt6BBIP_zc4Q70=.c9a48c68-8c14-4af9-8357-ab50e62a5fd3@github.com> On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
The benchmark numbers:
>>
>> Benchmark                               Score      Latest     Latest/Score
>> StringIndexOf.advancedWithMediumSub     343.573    317.934    0.925375393x
>> StringIndexOf.advancedWithShortSub1     1039.081   1053.96    1.014319384x
>> StringIndexOf.advancedWithShortSub2     55.828     110.541    1.980027943x
>> StringIndexOf.constantPattern           9.361      11.906     1.271872663x
>> StringIndexOf.searchCharLongSuccess     4.216      4.218      1.000474383x
>> StringIndexOf.searchCharMediumSuccess   3.133      3.216      1.02649218x
>> StringIndexOf.searchCharShortSuccess    3.76       3.761      1.000265957x
>> StringIndexOf.success                   9.186      9.713      1.057369911x
>> StringIndexOf.successBig                14.341     46.343     3.231504079x
>> StringIndexOfChar.latin1_AVX2_String    6220.918   12154.52   1.953814533x
>> StringIndexOfChar.latin1_AVX2_char      5503.556   5540.044   1.006629895x
>> StringIndexOfChar.latin1_SSE4_String    6978.854   6818.689   0.977049957x
>> StringIndexOfChar.latin1_SSE4_char      5657.499   5474.624   0.967675646x
>> StringIndexOfChar.latin1_Short_String   7132.541   6863.359   0.962260014x
>> StringIndexOfChar.latin1_Short_char     16013.389  16162.437  1.009307711x
>> StringIndexOfChar.latin1_mixed_String   7386.123   14771.622  1.999915517x
>> StringIndexOfChar.latin1_mixed_char     9901.671   9782.245   0.987938803x
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Rearrange; add lambdas for clarity src/hotspot/cpu/x86/macroAssembler_x86.cpp line 1174: > 1172: // Alignment specifying the maximum number of allowed bytes to pad. > 1173: // If padding > max, no padding is inserted. > 1174: void MacroAssembler::p2align(int modulus, int maxbytes) { We could pass offset() as an argument to p2align. Basically have three arguments to p2align(modulus, target, maxbytes). Also maybe rename p2align as align then?
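For reference, the padding rule in the quoted comment (pad up to a multiple of `modulus`, but insert nothing if that would need more than `maxbytes`) can be written as a function of an explicit target offset, which is what a three-argument variant would compute. `AlignSketch.padding` is a hypothetical helper, not HotSpot's `MacroAssembler` API, and it assumes `modulus` is a power of two, as the `p2` prefix suggests.

```java
final class AlignSketch {
    private AlignSketch() {}

    /**
     * Bytes of padding needed so that `target` becomes a multiple of `modulus`,
     * or 0 if that padding would exceed `maxBytes` (hypothetical helper).
     */
    static int padding(int modulus, int target, int maxBytes) {
        if (Integer.bitCount(modulus) != 1) {
            throw new IllegalArgumentException("modulus must be a power of two");
        }
        // Distance to the next multiple of modulus; 0 if already aligned.
        int pad = (modulus - (target & (modulus - 1))) & (modulus - 1);
        return pad > maxBytes ? 0 : pad;
    }
}
```

For example, with `modulus = 16` and `maxBytes = 3`, offset 13 gets 3 bytes of padding, while offset 1 would need 15 bytes and therefore gets none.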
src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 208: > 206: //////////////////////////////////////////////////////////////////////////////////////// > 207: //////////////////////////////////////////////////////////////////////////////////////// > 208: if (VM_Version::supports_avx2()) { // AVX2 version Instead of the if check here, it would be better to use an assert: assert(VM_Version::supports_avx2(), "Needs AVX2 support"); src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 233: > 231: //////////////////////////////////////////////////////////////////////////////////////// > 232: //////////////////////////////////////////////////////////////////////////////////////// > 233: This comment can go right before the start of the method. It would also be good to document the native function's parameters in that comment. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 238: > 236: const Register needle = rdx; > 237: const Register needle_len = rcx; > 238: This is the calling convention on Linux. How is the Windows platform handled? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 260: > 258: // const XMMRegister save_rcx = xmm11; > 259: // const XMMRegister save_r8 = xmm12; > 260: Could this be removed? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 279: > 277: fnptrs[isLL ? StrIntrinsicNode::LL > 278: : isUU ? StrIntrinsicNode::UU > 279: : StrIntrinsicNode::UL] = __ pc(); Could this not be simplified as: fnptrs[ae] = __ pc(); src/hotspot/share/opto/library_call.cpp line 1263: > 1261: if (result != nullptr) { > 1262: // The result is index relative to from_index if substring was found, -1 otherwise. > 1263: // Generate code which will fold into cmove. Any reason to remove this comment?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591547667 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591612417 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591613215 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591617528 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591607921 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591618222 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591554296 From bkilambi at openjdk.org Mon May 6 23:01:56 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 6 May 2024 23:01:56 GMT Subject: Integrated: 8331400: AArch64: Sync aarch64_vector.ad with aarch64_vector_ad.m4 In-Reply-To: References: Message-ID: On Fri, 3 May 2024 09:07:25 GMT, Bhavana Kilambi wrote: > This commit - [1] modified the aarch64_vector.ad directly. This patch includes that change in the aarch64_vector_ad.m4 file as well and generates the aarch64_vector.ad file from it. > > [1] https://github.com/openjdk/jdk/commit/185e711bfe4c4d013b56e867f85cfb4177b3a2cf This pull request has now been integrated. Changeset: f308e107 Author: Bhavana Kilambi Committer: Eric Liu URL: https://git.openjdk.org/jdk/commit/f308e107ce8b993641ee3d0a0d5d52bf5cd3b94e Stats: 10 lines in 2 files changed: 6 ins; 2 del; 2 mod 8331400: AArch64: Sync aarch64_vector.ad with aarch64_vector_ad.m4 Reviewed-by: aph, kvn, eliu ------------- PR: https://git.openjdk.org/jdk/pull/19077 From sviswanathan at openjdk.org Mon May 6 23:21:57 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 6 May 2024 23:21:57 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. 
This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Rearrange; add lambdas for clarity src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 314: > 312: > 313: // needle_len is in elements, not bytes, for UTF-16 > 314: __ cmpq(needle_len, isUU ? OPT_NEEDLE_SIZE_MAX / 2 : OPT_NEEDLE_SIZE_MAX); OPT_NEEDLE_SIZE_MAX is an odd number (set to 5), should that have been an even number? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 329: > 327: //////////////////////////////////////////////////////////////////////////////////////// > 328: > 329: __ bind(L_begin); So far we have handled haystack <= 32 and needle_size <= 5 (?) in bytes. 
A high level algorithm description here is needed in comments to follow the code below. A description of what are the various paths in terms of haystack and needle sizes and how to reason the assembly code below and make sure that all the paths are taken care of. Also the abstraction level suddenly changes here to detailed code below instead of methods for the various paths. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591640551 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591646095 From stuefe at openjdk.org Tue May 7 04:27:12 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 7 May 2024 04:27:12 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v11] In-Reply-To: References: Message-ID: <0anrYmEFTzUaEynG83xqh3DlAygkKXw9BTxO982PkR4=.7a8d0d3d-168e-47eb-8385-79d4a9c46df3@github.com> > See [1] for previous discussions. > > We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. > > The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. > > Examples: > > This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` > > This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` > > > --- > > The patch: > > 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. > 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. 
> 3) Adapted and extended tests > > I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds on. > > > Tested: > > - manually on Mac m1 (debug and release) > - GHAs are running > - but Oracle will do more testing before this goes in > > [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: - remove debug output - Merge branch 'master' into compiler-default-limit - fix compiler.c2.TestFindNode again - merge master and fix conflicts - Remove unused variable - Remove accidental change to TestDeadPhiMergeMemLoop.java - fix copyrights - fix copyrights - another fix - fix accidental slip in of another test name - ... and 9 more: https://git.openjdk.org/jdk/compare/f308e107...61dc5952 ------------- Changes: https://git.openjdk.org/jdk/pull/18969/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=10 Stats: 166 lines in 7 files changed: 115 ins; 12 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/18969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18969/head:pull/18969 PR: https://git.openjdk.org/jdk/pull/18969 From stuefe at openjdk.org Tue May 7 04:27:12 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 7 May 2024 04:27:12 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v10] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 18:29:20 GMT, Vladimir Kozlov wrote: > `-XX:CompileCommand=memstat,compiler.c2.TestFindNode::*,print` - leftover from debugging? I tend to leave debug output in, if it's not too large, to speed up any follow-up fixes I need to do later. But then, I did not do it consistently anyway, so I removed the output.
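The precedence rules described in the PR can be summarized as a small decision function. These are a debug-only 1 GB default, overridden by any matching `-XX:CompileCommand=MemLimit,...`, where 0 means no limit. This is a hypothetical model for illustration, not the code in the patch, and the function and parameter names are invented.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model, not HotSpot code. A result of 0 means "no limit".
const size_t kDebugDefaultMemLimit = size_t(1) << 30; // 1 GB, per the PR description

size_t effective_mem_limit(bool has_command_line_limit, size_t command_line_limit,
                           bool debug_build) {
  if (has_command_line_limit) {
    // A matching CompileCommand MemLimit always wins, including 0 (disabled).
    return command_line_limit;
  }
  // Otherwise only debug builds get the default limit.
  return debug_build ? kDebugDefaultMemLimit : 0;
}
```

With `-XX:CompileCommand=MemLimit,java.lang.String::*,100m`, methods of `java.lang.String` would get 100 MB, while all other methods in a debug JVM keep the 1 GB default. With `MemLimit,*.*,0`, every method ends up with 0, meaning no limit.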
------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2097419843 From epeter at openjdk.org Tue May 7 05:43:07 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 May 2024 05:43:07 GMT Subject: RFR: 8331085: Crash in MergePrimitiveArrayStores::is_compatible_store() Message-ID: In the `MergeStore` logic, I check the `adr_type()`. But in some rare cases this can be a `nullptr`, I did not expect that. Example: during IGVN, the address is dying, with TOP somewhere in the inputs. 1 Con === 0 [[ ]] #top 1022 AddP === _ 1 1 41 [[ 1019 1021 ]] !orig=539,[572] !jvms: Test::dMeth @ bci:223 (line 35) 1019 StoreI === 1128 827 1022 1020 [[ 1075 541 1073 574 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; Memory: @null !orig=574,1068 !jvms: Test::dMeth @ bci:227 (line 35) I now check for `nullptr`. ------------- Commit messages: - 8331085 Changes: https://git.openjdk.org/jdk/pull/19103/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19103&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331085 Stats: 64 lines in 2 files changed: 63 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19103/head:pull/19103 PR: https://git.openjdk.org/jdk/pull/19103 From thartmann at openjdk.org Tue May 7 05:59:52 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 7 May 2024 05:59:52 GMT Subject: RFR: 8331085: Crash in MergePrimitiveArrayStores::is_compatible_store() In-Reply-To: References: Message-ID: On Mon, 6 May 2024 11:25:22 GMT, Emanuel Peter wrote: > In the `MergeStore` logic, I check the `adr_type()`. But in some rare cases this can be a `nullptr`, I did not expect that. > > Example: during IGVN, the address is dying, with TOP somewhere in the inputs.
> > 1 Con === 0 [[ ]] #top > 1022 AddP === _ 1 1 41 [[ 1019 1021 ]] !orig=539,[572] !jvms: Test::dMeth @ bci:223 (line 35) > 1019 StoreI === 1128 827 1022 1020 [[ 1075 541 1073 574 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; Memory: @null !orig=574,1068 !jvms: Test::dMeth @ bci:227 (line 35) > > I now check for `nullptr`. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19103#pullrequestreview-2042128740 From jbhateja at openjdk.org Tue May 7 06:12:58 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 7 May 2024 06:12:58 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v14] In-Reply-To: <727FyZHyBbtRilYRtbP2E4dbZYqj9a-QgXAuicQ2iZQ=.01035706-6591-4df5-bf7d-d7a2f6209015@github.com> References: <727FyZHyBbtRilYRtbP2E4dbZYqj9a-QgXAuicQ2iZQ=.01035706-6591-4df5-bf7d-d7a2f6209015@github.com> Message-ID: On Fri, 3 May 2024 19:14:07 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
> > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > revert unneeded legacy flag change for kmovwl(K,K) and kmovql(K,K) src/hotspot/cpu/x86/assembler_x86.cpp line 11754: > 11752: > 11753: // This is a 4 byte encoding > 11754: void Assembler::evex_prefix(bool vex_r, bool vex_b, bool vex_x, bool evex_r, bool evex_b, bool evex_v, Suggestion: void Assembler::evex_prefix(bool vex_r, bool vex_b, bool vex_x, bool evex_r, bool eevex_b, bool evex_v, src/hotspot/cpu/x86/assembler_x86.cpp line 11766: > 11764: // P0: byte 2, initialized to RXBR`00mm > 11765: // instead of not'd > 11766: int byte2 = (vex_r ? VEX_R : 0) | (vex_x ? VEX_X : 0) | (vex_b ? VEX_B : 0) | (evex_r ? EVEX_Rb : 0); Comment at [L#11765 ](https://github.com/openjdk/jdk/pull/18476/files#diff-e3576e9c22db89236cdb906f032ff00748ff6d1c21b05277d991d80af75daf3aL11686) `// P0: byte 2, initialized to RXBR'00mm => // P0: byte 2, initialized to RXBR'0mmm` src/hotspot/cpu/x86/assembler_x86.cpp line 11768: > 11766: int byte2 = (vex_r ? VEX_R : 0) | (vex_x ? VEX_X : 0) | (vex_b ? VEX_B : 0) | (evex_r ? EVEX_Rb : 0); > 11767: byte2 = (~byte2) & 0xF0; > 11768: byte2 |= evex_b ? EEVEX_B : 0; Suggestion: byte2 |= eevex_b ? EEVEX_B : 0; This corresponds to B4 bit which is specific to EEVEX encoding. 
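The sketch below assembles EVEX payload byte P0 to show why the B4 bit is OR-ed in after the inversion, which is also why the reviewer asks for the `eevex_b` name. The constant names and values are assumptions that follow the `RXBR'0mmm` layout discussed above. R3, X3, B3 and R4 are stored inverted. B4 occupies bit 3, which legacy EVEX required to be 0, so this sketch stores it non-inverted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants for this sketch; bit positions follow the
// R3 X3 B3 R4 | B4 m m m layout of EVEX payload byte P0.
const int kVexR   = 0x80; // R3, stored inverted
const int kVexX   = 0x40; // X3, stored inverted
const int kVexB   = 0x20; // B3, stored inverted
const int kEvexRb = 0x10; // R4 (R'), stored inverted
const int kEevexB = 0x08; // B4, stored non-inverted (bit was 0 in legacy EVEX)

uint8_t evex_p0(bool r3, bool x3, bool b3, bool r4, bool b4, int mmm) {
  int bits = (r3 ? kVexR : 0) | (x3 ? kVexX : 0) | (b3 ? kVexB : 0) | (r4 ? kEvexRb : 0);
  int p0 = (~bits) & 0xF0;   // invert only the upper nibble
  p0 |= b4 ? kEevexB : 0;    // B4 is added after the inversion
  p0 |= mmm & 0x07;          // opcode map select
  return static_cast<uint8_t>(p0);
}
```

If B4 were folded into `bits` before the `~`, the `& 0xF0` mask would discard it, and a naive fix that widened the mask would store it inverted.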
src/hotspot/cpu/x86/assembler_x86.cpp line 11846: > 11844: } > 11845: bool eevex_x = adr.index_needs_rex2(); > 11846: bool evex_b = adr.base_needs_rex2(); Suggestion: bool eevex_b = adr.base_needs_rex2(); src/hotspot/cpu/x86/assembler_x86.cpp line 11848: > 11846: bool evex_b = adr.base_needs_rex2(); > 11847: attributes->set_is_evex_instruction(); > 11848: evex_prefix(vex_r, vex_b, vex_x, evex_r, evex_b, evex_v, eevex_x, nds_enc, pre, opc); Suggestion: evex_prefix(vex_r, vex_b, vex_x, evex_r, eevex_b, evex_v, eevex_x, nds_enc, pre, opc); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1591847091 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1591858904 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1591846721 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1591848768 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1591848945 From chagedorn at openjdk.org Tue May 7 06:25:52 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 7 May 2024 06:25:52 GMT Subject: RFR: 8331085: Crash in MergePrimitiveArrayStores::is_compatible_store() In-Reply-To: References: Message-ID: On Mon, 6 May 2024 11:25:22 GMT, Emanuel Peter wrote: > In the `MergeStore` logic, I check the `adr_type()`. But in some rare cases this can be a `nullptr`, I did not expect that. > > Example: during IGVN, the address is dying, with TOP somewhere in the inputs. > > 1 Con === 0 [[ ]] #top > 1022 AddP === _ 1 1 41 [[ 1019 1021 ]] !orig=539,[572] !jvms: Test::dMeth @ bci:223 (line 35) > 1019 StoreI === 1128 827 1022 1020 [[ 1075 541 1073 574 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; Memory: @null !orig=574,1068 !jvms: Test::dMeth @ bci:227 (line 35) > > I now check for `nullptr`. Looks good!
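The fix for 8331085 amounts to treating a missing address type as "not compatible" instead of dereferencing it. Below is a minimal model of that guard. It uses hypothetical stand-in types rather than C2's `TypePtr` and `StoreNode`, and the alias-index comparison is only a placeholder for the real compatibility checks.

```cpp
#include <cassert>

// Hypothetical stand-ins for C2 types; only the null-guard pattern matters here.
struct AddressType { int alias_index; };
struct StoreModel {
  const AddressType* adr_type; // may be nullptr while the address is dying (TOP input)
};

bool is_compatible_store(const StoreModel& a, const StoreModel& b) {
  const AddressType* ta = a.adr_type;
  const AddressType* tb = b.adr_type;
  if (ta == nullptr || tb == nullptr) {
    return false; // bail out instead of merging; do not dereference
  }
  return ta->alias_index == tb->alias_index;
}
```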
test/hotspot/jtreg/compiler/c2/TestMergeStoresNullAdrType.java line 33: > 31: * -XX:-TieredCompilation -Xcomp > 32: * -XX:+UnlockDiagnosticVMOptions -XX:+StressIGVN -XX:+StressCCP > 33: * -XX:RepeatCompilation=1000 Is it really worth having such a high count? Eventually, it would trigger the bug if the test is executed enough times. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19103#pullrequestreview-2042163459 PR Review Comment: https://git.openjdk.org/jdk/pull/19103#discussion_r1591872327 From duke at openjdk.org Tue May 7 06:26:04 2024 From: duke at openjdk.org (Daniel Skantz) Date: Tue, 7 May 2024 06:26:04 GMT Subject: RFR: 8330016: Stress seed should be initialized for runtime stub compilation [v2] In-Reply-To: <-Pj7vK6Wpgv3FnP2KZXB2so2wWraG3LWEKds9XDI3pY=.d6595586-aa83-47bc-84ed-ff8c4a1a5550@github.com> References: <-Pj7vK6Wpgv3FnP2KZXB2so2wWraG3LWEKds9XDI3pY=.d6595586-aa83-47bc-84ed-ff8c4a1a5550@github.com> Message-ID: On Tue, 7 May 2024 06:26:04 GMT, Daniel Skantz wrote: >> We can initialize the stress seed for runtime stub compilation as we already do for method compilation.
Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: const ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19095/files - new: https://git.openjdk.org/jdk/pull/19095/files/6ccef597..a018fde0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19095&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19095&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19095.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19095/head:pull/19095 PR: https://git.openjdk.org/jdk/pull/19095 From epeter at openjdk.org Tue May 7 07:10:59 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 May 2024 07:10:59 GMT Subject: RFR: 8331085: Crash in MergePrimitiveArrayStores::is_compatible_store() In-Reply-To: References: Message-ID: On Tue, 7 May 2024 06:23:41 GMT, Christian Hagedorn wrote: >> In the `MergeStore` logic, I check the `adr_type()`. But in some rare cases this can be a `nullptr`, I did not expect that. >> >> Exampe: during IGVN, the address is dying, with TOP somewhere in the inputs. >> >> 1 Con === 0 [[ ]] #top >> 1022 AddP === _ 1 1 41 [[ 1019 1021 ]] !orig=539,[572] !jvms: Test::dMeth @ bci:223 (line 35) >> 1019 StoreI === 1128 827 1022 1020 [[ 1075 541 1073 574 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; Memory: @null !orig=574,1068 !jvms: Test::dMeth @ bci:227 (line 35) >> >> I now check for `nullptr`. > > Looks good! Thanks @chhagedorn @TobiHartmann for the reviews! Since this is rather a simple fix and creates a bit of noise in the testing pipeline, I'm already integrating now. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19103#issuecomment-2097598560 From epeter at openjdk.org Tue May 7 07:11:01 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 May 2024 07:11:01 GMT Subject: Integrated: 8331085: Crash in MergePrimitiveArrayStores::is_compatible_store() In-Reply-To: References: Message-ID: <_6IPKwm8rYrQGoHgEVMl1dHlguKkHKXiFAWH_g3ZukU=.c2d01173-0f7a-4f5d-90bf-3ffac849e07d@github.com> On Mon, 6 May 2024 11:25:22 GMT, Emanuel Peter wrote: > In the `MergeStore` logic, I check the `adr_type()`. But in some rare cases this can be a `nullptr`, I did not expect that. > > Example: during IGVN, the address is dying, with TOP somewhere in the inputs. > > 1 Con === 0 [[ ]] #top > 1022 AddP === _ 1 1 41 [[ 1019 1021 ]] !orig=539,[572] !jvms: Test::dMeth @ bci:223 (line 35) > 1019 StoreI === 1128 827 1022 1020 [[ 1075 541 1073 574 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; Memory: @null !orig=574,1068 !jvms: Test::dMeth @ bci:227 (line 35) > > I now check for `nullptr`. This pull request has now been integrated. Changeset: df1ff056 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/df1ff056f19ce569e05b0b87584e289840fc5d5c Stats: 64 lines in 2 files changed: 63 ins; 0 del; 1 mod 8331085: Crash in MergePrimitiveArrayStores::is_compatible_store() Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/19103 From rcastanedalo at openjdk.org Tue May 7 07:27:52 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 7 May 2024 07:27:52 GMT Subject: RFR: 8330016: Stress seed should be initialized for runtime stub compilation [v2] In-Reply-To: References: <-Pj7vK6Wpgv3FnP2KZXB2so2wWraG3LWEKds9XDI3pY=.d6595586-aa83-47bc-84ed-ff8c4a1a5550@github.com> Message-ID: On Tue, 7 May 2024 06:26:04 GMT, Daniel Skantz wrote: >> We can initialize the stress seed for runtime stub compilation as we already do for method compilation.
This found the bug described in JDK-8329258. It would apply if StressGCM or StressLCM vm flags are set. >> >> Testing: T1-5 default options. T1-5 with -XX:+StressLCM and -XX:+StressGCM. Manually tested that the stress seed is set and printed to compilation log if either stress option is set. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > const Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19095#pullrequestreview-2042291021 From duke at openjdk.org Tue May 7 09:08:55 2024 From: duke at openjdk.org (Daniel Skantz) Date: Tue, 7 May 2024 09:08:55 GMT Subject: RFR: 8330016: Stress seed should be initialized for runtime stub compilation [v2] In-Reply-To: References: <-Pj7vK6Wpgv3FnP2KZXB2so2wWraG3LWEKds9XDI3pY=.d6595586-aa83-47bc-84ed-ff8c4a1a5550@github.com> Message-ID: <1bLGo6lP0M20Q86lj4d3EsYy0SSlpgqeRiivad8PRfo=.4ba1d788-5e47-4692-96cd-cec61faec6df@github.com> On Tue, 7 May 2024 06:26:04 GMT, Daniel Skantz wrote: >> We can initialize the stress seed for runtime stub compilation as we already do for method compilation. This found the bug described in JDK-8329258. It would apply if StressGCM or StressLCM vm flags are set. >> >> Testing: T1-5 default options. T1-5 with -XX:+StressLCM and -XX:+StressGCM. Manually tested that the stress seed is set and printed to compilation log if either stress option is set. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > const Thanks for the reviews! 
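Initializing the seed matters because stress options such as StressLCM and StressGCM make their random choices from it. Printing the seed then lets a failing stub compilation be replayed with the same choices. The determinism this relies on can be shown with a tiny seeded generator. The xorshift32 below is purely illustrative and is not claimed to be the generator HotSpot uses.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative xorshift32 generator (not HotSpot's PRNG); the seed must be non-zero.
struct XorShift32 {
  uint32_t state;
  explicit XorShift32(uint32_t seed) : state(seed) {}
  uint32_t next() {
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
  }
};
```

Two generators created from the same seed produce identical sequences (seed 1 yields 0x42021 first). A different seed, such as 2, yields a different sequence (0x84042 first).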
------------- PR Comment: https://git.openjdk.org/jdk/pull/19095#issuecomment-2097810711 From snazarki at openjdk.org Tue May 7 10:00:52 2024 From: snazarki at openjdk.org (Sergey Nazarkin) Date: Tue, 7 May 2024 10:00:52 GMT Subject: RFR: 8330806: test/hotspot/jtreg/compiler/c1/TestLargeMonitorOffset.java fails on ARM32 In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 14:21:09 GMT, Aleksei Voitylov wrote: > TestLargeMonitorOffset was introduced by 8310844 with a fix for the AArch64 platform. The same issue needs to be fixed for ARM32. With this change, we add the large slot_offset handling to the ARM32 version of IR_Assembler::osr_entry(). > > Testing: jtreg hotspot, jtreg jdk tier1-3. I've checked the patch (one may need to use a [workaround](https://bugs.openjdk.org/browse/JDK-8316395) ). The JDK crashes without the patch, and passes with the patch. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18891#issuecomment-2097916569 From epeter at openjdk.org Tue May 7 11:15:14 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 May 2024 11:15:14 GMT Subject: RFR: 8331764: C2 SuperWord: refactor _align_to_ref/_mem_ref_for_main_loop_alignment Message-ID: This PR accomplishes these things: - Rename `_align_to_ref` -> `_mem_ref_for_main_loop_alignment`. - Move the `mem_ref` finding for alignment out of `SuperWord::find_adjacent_refs`. This is too early, and we don't even know if the relevant `mem_ref` is going to be vectorized. It makes more sense to pick a `mem_ref` directly in `SuperWord::adjust_pre_loop_limit_to_align_main_loop_vectors`, where we already know what packs are going to be vectorized. - For the alignment width (aw), we can use the `vector_width` of the pack to which the `mem_ref` belongs, rather than the potentially much larger `vector_width_in_bytes`. I track this with `_aw_for_main_loop_alignment` now. I need this for https://github.com/openjdk/jdk/pull/18822, and decided to split it out into an independent change. 
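Using the pack's own vector width as the alignment width `aw`, rather than the potentially much larger `vector_width_in_bytes`, can reduce the number of pre-loop iterations. The model below is simplified. It assumes a forward loop with stride 1, a start address that is a multiple of the element size, and a power-of-two `aw`. The function name is hypothetical, and C2's actual pre-loop limit computation is more general.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model (assumptions: stride +1, start_address is a multiple of
// elem_size, aw is a power of two). Returns how many scalar pre-loop
// iterations are needed before the main loop's accesses are aw-aligned.
int pre_loop_iterations(uintptr_t start_address, int elem_size, int aw) {
  assert(aw > 0 && (aw & (aw - 1)) == 0);
  assert(start_address % elem_size == 0);
  uintptr_t misalignment = start_address & (aw - 1);
  uintptr_t bytes_to_skip = (aw - misalignment) & (aw - 1);
  return static_cast<int>(bytes_to_skip / elem_size);
}
```

For 4-byte elements starting at address 0x1008, aligning to a 16-byte pack needs 2 pre-loop iterations. Aligning to 64 bytes needs 14.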
------------- Commit messages: - 8331764 Changes: https://git.openjdk.org/jdk/pull/19115/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19115&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331764 Stats: 67 lines in 2 files changed: 41 ins; 20 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19115/head:pull/19115 PR: https://git.openjdk.org/jdk/pull/19115 From epeter at openjdk.org Tue May 7 12:47:01 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 May 2024 12:47:01 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v6] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 15:20:32 GMT, Damon Fenacci wrote: >> # Issue >> When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0)` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. >> >> # Causes >> On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. >> The same is true for `StoreVector`s. >> When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store of the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`). Here we call >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 >> >> where we do an extra check for types if we deal with vectors but we don't make sure that neither masks nor offsets interfere.
>> Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 >> >> but we don't make sure that there are no masks or offsets. >> A few lines below, we check if there are 2 stores for the same value in a row. We need to check for masks and offsets here too but in this case we can include these cases if the masks and offsets of the vector stores are equivalent. >> >> # Solution >> To avoid folding `Load`- and `StoreVector`s with masks and offsets we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns its own (to avoid ever being the same as a store node). In this way, the checks in `MemNode::can_see_stored_value` >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 >> >> and `StoreNode::Identity` >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 >> >> will fail if masks or offsets are used. >> For 2 stores of the same value we instead check for mask and offset equality. >> >> Regression tests for... > > Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: > > - JDK-8325520: add store/load masked vector tests > - JDK-8325520: add store/load tests with duplicate offsets Nice, looks much better, I think the VM code is now correct. A few suggestions for code style. src/hotspot/share/opto/memnode.cpp line 1169: > 1167: // LoadVector/StoreVector need additional checks > 1168: if (st->is_StoreVector()) { > 1169: // Ensure that types match To reduce noise, you could revert these comment changes, up to you.
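The `store_Opcode` approach under "# Solution" works because the load-side check only folds when the preceding store's opcode equals the store opcode that the load reports. The model below uses invented opcode values and classes, not C2's. It is also simplified, since the real check additionally compares addresses and vector types.

```cpp
#include <cassert>

// Hypothetical opcodes for this model.
enum Opcode { Op_StoreVector, Op_LoadVector, Op_LoadVectorGather, Op_StoreVectorScatter };

struct LoadModel {
  virtual ~LoadModel() = default;
  // Opcode of a store whose value this load may reuse.
  virtual Opcode store_opcode() const { return Op_StoreVector; }
};

struct GatherLoadModel : LoadModel {
  // Report the load's own opcode so it can never equal any store's opcode.
  Opcode store_opcode() const override { return Op_LoadVectorGather; }
};

// Simplified stand-in for the opcode comparison in MemNode::can_see_stored_value.
bool can_fold(const LoadModel& load, Opcode preceding_store) {
  return preceding_store == load.store_opcode();
}
```

A plain vector load may still reuse the value of a plain `StoreVector`. A gather load never matches any store, so the unsafe folding is suppressed without touching the shared check.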
src/hotspot/share/opto/memnode.cpp line 3518: > 3516: mem->in(MemNode::ValueIn)->eqv_uncast(val) && > 3517: mem->Opcode() == Opcode()) { > 3518: // Not a vector Suggestion: Redundant comment, the next line literally says as much ;) src/hotspot/share/opto/memnode.cpp line 3546: > 3544: const StoreVectorScatterMaskedNode* svgm = mem->as_StoreVectorScatterMasked(); > 3545: if (offsets->eqv_uncast(svgm->in(StoreVectorScatterMaskedNode::Offsets)) && > 3546: mask->eqv_uncast(svgm->in(StoreVectorScatterMaskedNode::Mask))) { Suggestion: mask->eqv_uncast(svgm->in(StoreVectorScatterMaskedNode::Mask))) { src/hotspot/share/opto/memnode.cpp line 3551: > 3549: // Regular store (no offsets or mask) > 3550: } else { > 3551: result = mem; Suggestion: assert(Opcode() == Op_StoreVector, "just a plain vector store, no offset or mask"); result = mem; Turning comments into asserts is preferable I would say. src/hotspot/share/opto/memnode.cpp line 3554: > 3552: } > 3553: } > 3554: } I think the code is now correct. But I find the nested if-elseif-elseif-else ... structure a bit hard to read. And there is quite some code duplication (e.g. `result = mem` and all the `eqv_uncast` checks). You could either do something like this: if (!is_StoreVector() || as_StoreVector()->has_same_vect_type_and_offsets_and_mask(mem->as_StoreVector())) { result = mem; } Sketch: has_same_vect_type_and_offsets_and_mask: different vect_type -> return false ... Or maybe it would be better to define virtual functions to get the `mask` and `offsets` from a `StoreVector`? If it has none, just return `nullptr`. Sometimes people worry about virtual methods, but we already use them extensively for the node Value/Ideal anyway.
Then, you can do: if (!is_StoreVector()) { result = mem; } else { const Node* offsets1 = as_StoreVector()->get_offsets(); const Node* offsets2 = mem->as_StoreVector()->get_offsets(); const Node* mask1 = as_StoreVector()->get_mask(); const Node* mask2 = mem->as_StoreVector()->get_mask(); if (offsets1->eqv_uncast(offsets2) && mask1->eqv_uncast(mask2)) { result = mem; } } I think that would be the cleanest and most readable way. What do you think? ------------- PR Review: https://git.openjdk.org/jdk/pull/18347#pullrequestreview-2043033219 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1592390146 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1592392528 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1592399071 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1592396514 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1592417634 From epeter at openjdk.org Tue May 7 12:47:02 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 May 2024 12:47:02 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v6] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 12:28:32 GMT, Emanuel Peter wrote: >> Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: >> >> - JDK-8325520: add store/load masked vector tests >> - JDK-8325520: add store/load tests with duplicate offsets > > src/hotspot/share/opto/memnode.cpp line 3546: > >> 3544: const StoreVectorScatterMaskedNode* svgm = mem->as_StoreVectorScatterMasked(); >> 3545: if (offsets->eqv_uncast(svgm->in(StoreVectorScatterMaskedNode::Offsets)) && >> 3546: mask->eqv_uncast(svgm->in(StoreVectorScatterMaskedNode::Mask))) { > > Suggestion: > > mask->eqv_uncast(svgm->in(StoreVectorScatterMaskedNode::Mask))) { Alignment was off ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1592399246 From epeter at
openjdk.org Tue May 7 12:59:57 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 May 2024 12:59:57 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v4] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 15:20:23 GMT, Damon Fenacci wrote: >> Nice, ah you are right, there can be issues with mask-only cases as well! >> >> It would be great if you had tests that exactly exercise these "bad" examples, where it looks like we might optimize, but it would be wrong. >> >> I'll look at your `store_Opcode` changes now... > >> It would be great if you had tests that exactly exercise these "bad" examples, where it looks like we might optimize, but it would be wrong. > > Yep, good idea. I've added a few tests to check for those cases (load-store with duplicate offsets and store-load with masks). Thanks @eme64! @dafedafe I also scanned quickly over the regression tests. I see at least two aspects missing: - No mixed type test for load-store: Use MemorySegment `from/intoMemorySegment`. Try something like storing an int-vector and loading a float-vector. - Mismatched vector length: store a vector of length 4, and load one of length 8. I think all of these are currently correctly handled by your `vect_type` checks in the VM code, but it would be good to see that they are covered by regression tests, in case someone messes this up in the future. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2098345482 From epeter at openjdk.org Tue May 7 13:03:55 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 May 2024 13:03:55 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v6] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 15:20:32 GMT, Damon Fenacci wrote: >> # Issue >> When loading multiple vectors using offsets or masks (e.g.
`LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0)` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2-compiled code that causes different vectors to be treated as equal even though they are not. >> >> # Causes >> On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. >> The same is true for `StoreVector`s. >> When running GVN loops, we can get into a situation where we check if a Load node is preceded by a Store to the same address, so that the Load can be replaced with the input of the Store (in `LoadNode::Identity`). Here we call >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 >> >> where we do an extra type check when dealing with vectors, but we don't check that masks and offsets do not interfere. >> Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 >> >> but we don't check that there are no masks or offsets. >> A few lines below, we check if there are 2 stores of the same value in a row. We need to check for masks and offsets here too, but in this case we can still fold the stores if the masks and offsets of the two vector stores are equivalent. >> >> # Solution >> To avoid folding `Load`- and `StoreVector`s with masks and offsets, we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns its own opcode (so that it can never match a store node).
In this way, the checks in `MemNode::can_see_stored_value` >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 >> >> and `StoreNode::Identity` >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 >> >> will fail if masks or offsets are used. >> For 2 stores of the same value we instead check for mask and offset equality. >> >> Regression tests for... > > Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: > > - JDK-8325520: add store/load masked vector tests > - JDK-8325520: add store/load tests with duplicate offsets Ah, some more missing cases: - Do some store-store and store-load cases where the first and second are different loads/stores, i.e. one with and one without mask/offsets. E.g. `StoreVectorMasked` and `StoreVectorScatter` in a store-store test. Doing the total cross-product is probably too much, but a few examples would be a good start. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2098354283 From epeter at openjdk.org Tue May 7 13:24:02 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 May 2024 13:24:02 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8] In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 12:52:15 GMT, Bhavana Kilambi wrote: >> Floating-point addition is non-associative, that is, adding floating-point elements in arbitrary order may produce a different value. Specifically, the Vector API intentionally does not define the order of reduction, which allows platforms to generate more efficient code [1]. So C2 needs a node to represent a non-strictly-ordered add-reduction for floating-point types. >> >> To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order.
It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` return a fixed value. >> >> With this patch, the Vector API would always generate non-strictly-ordered `AddReductionVF/D` on SVE machines with vector length <= 16B, as it is more beneficial to generate non-strictly-ordered instructions than strictly ordered ones on such machines. >> >> [AArch64] >> On Neon, non-strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2]. >> >> This patch adds matching rules for non-strictly-ordered `AddReductionVF/D`. >> >> No effects on other platforms. >> >> [Performance] >> FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point types. With this patch, it improves by ~3x on my SVE machine (128-bit). >> >> ADDLanes >> >> Benchmark Before After Unit >> FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms >> >> >> Final code is as below: >> >> Before: >> fadda z17.s, p7/m, z17.s, z16.s >> >> After: >> >> faddp v17.4s, v21.4s, v21.4s >> faddp s18, v17.2s >> fadd s18, s18, s19 >> >> >> >> >> [Test] >> Full jtreg passed on AArch64 and x86. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529 >> [2] https://bugs.openjdk.org/browse/JDK-8275275 >> [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316 > > Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains eight additional commits since the last revision: > > - Merge master > - Adjust format for the backend rules changed in previous commit > - Address some more review comments > - Revert to previous indentation > - Add comments, revert to requires_strict_order and other minor changes > - Naming changes: replace strict/non-strict with more technical terms > - Addressed review comments for changes in backend rules and code style > - 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction > > Floating-point addition is non-associative, that is, adding > floating-point elements in arbitrary order may produce a different value. > Specifically, the Vector API intentionally does not define the order of > reduction, which allows platforms to generate more efficient code > [1]. So C2 needs a node to represent a non-strictly-ordered > add-reduction for floating-point types. > > To avoid introducing new nodes, this patch adds a bool field in > `AddReductionVF/D` to distinguish whether they require strict order. It > also removes `UnorderedReductionNode` and adds a virtual function > `bool requires_strict_order()` in `ReductionNode`. Besides > `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` > return a fixed value. > > With this patch, the Vector API would always generate non-strictly-ordered > `AddReductionVF/D` on SVE machines with vector length <= 16B, as it is > more beneficial to generate non-strictly-ordered instructions than > strictly ordered ones on such machines. > > [AArch64] > On Neon, non-strictly-ordered `AddReductionVF/D` cannot be generated. > Auto-vectorization has already banned these nodes in JDK-8275275 [2]. > > This patch adds matching rules for non-strictly-ordered > `AddReductionVF/D`. > > No effects on other platforms. > > [Performance] > FloatMaxVector.ADDLanes [3] measures the performance of add reduction > for floating-point types.
With this patch, it improves by ~3x on my SVE > machine (128-bit). > > ADDLanes > Benchmark Before After Unit > FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms > > Final code is as below: > > ``` > Before: > fadda z17.s, p7/m, z17.s, z16.s > > After: > faddp v17.4s, v21.4s,... I just realized that there is no regression test, and I think it would be nice to have one. Also, we should add some sort of message to the `dump` showing whether `requires_strict_order` is on or off for the `ReductionNode`. I think that could be done in `dump_spec`. You could do it similarly to: #ifndef PRODUCT void VectorMaskCmpNode::dump_spec(outputStream *st) const { st->print(" %d #", _predicate); _type->dump_on(st); } #endif // PRODUCT This would actually allow you to create an IR test! You would check that the AddReductionVNode is annotated correctly. You would need some Vector API tests, and some SuperWord auto-vectorization tests. How does that sound? That would ensure that nobody can easily destroy your RFE, at least not in the IR. Sorry for the delay, I'm really excited about this one, just had to get some more critical things done recently ;) src/hotspot/cpu/aarch64/aarch64_vector.ad line 2907: > 2905: format %{ "reduce_addF_sve $dst_src1, $dst_src1, $src2" %} > 2906: ins_encode %{ > 2907: assert(UseSVE > 0, "must be sve"); Is there no way we would now run into this assert? static bool use_neon_for_vector(int vector_length_in_bytes) { return vector_length_in_bytes <= 16; } Does `vector_length_in_bytes > 16` imply that we have `UseSVE > 0`?
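Because floating-point addition is not associative, a strictly ordered reduction (what SVE `fadda` performs) and the pairwise Neon sequence from the PR description can return different results for the same lanes, which is why the node has to record whether strict order is required. The following is a minimal C++ sketch of the two orders; the function names are hypothetical, and it assumes IEEE-754 single precision evaluated without excess precision (FLT_EVAL_METHOD == 0, no `-ffast-math`), as on typical x86-64 and AArch64 toolchains. That `s19` holds the incoming scalar accumulator is also an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Strictly ordered reduction, in the spirit of SVE `fadda`:
// acc = (((acc + v[0]) + v[1]) + v[2]) + v[3]
float reduce_add_strict(float acc, const float (&v)[4]) {
  for (std::size_t i = 0; i < 4; i++) {
    acc += v[i];  // each step rounds to float before the next addition
  }
  return acc;
}

// Pairwise reduction, modeled on the Neon sequence in the PR description:
//   faddp v17.4s, v21.4s, v21.4s  -> (v[0] + v[1]), (v[2] + v[3])
//   faddp s18, v17.2s             -> (v[0] + v[1]) + (v[2] + v[3])
//   fadd  s18, s18, s19           -> assumed: add the scalar input last
float reduce_add_pairwise(float acc, const float (&v)[4]) {
  float lo = v[0] + v[1];
  float hi = v[2] + v[3];
  return (lo + hi) + acc;
}
```

With lanes {1e8f, 1.0f, -1e8f, 1.0f} and an accumulator of 0, the strict order yields 1.0f while the pairwise order yields 0.0f: floats near 1e8 are spaced 8 apart, so 1e8f + 1.0f rounds back to 1e8f and -1e8f + 1.0f rounds back to -1e8f. A reduction produced by auto-vectorizing a sequential Java loop must therefore keep the strict order, whereas a Vector API reduction, whose order is unspecified, may use the faster pairwise form.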
------------- PR Review: https://git.openjdk.org/jdk/pull/18034#pullrequestreview-2043144243 PR Comment: https://git.openjdk.org/jdk/pull/18034#issuecomment-2098395131 PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1592455614 From epeter at openjdk.org Tue May 7 13:46:56 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 May 2024 13:46:56 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v13] In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 11:38:27 GMT, Hamlin Li wrote: >> Hi, >> Can you have a look at this patch adding some tests for Math.round intrinsics? >> Thanks! >> >> ### FYI: >> During the development of RoundVF/RoundF, we faced issues which were only spotted by running tests exhaustively against the full 32/64-bit range of int/long. >> It's helpful to add these exhaustive tests to the JDK for possible future use, rather than building them every time they are needed. >> Of course, we need to put it in `manual` mode, so it's not run when the `-automatic` jtreg option is specified, which I guess is the mode CI uses; please correct me if I'm assuming incorrectly. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > fix issues; modify vm options to make sure test the expected behaviors. Thanks for the extra tests! Can you measure how much time each test now takes on your machine? I think we are getting there. Still a little worried about some random bugs in the whole number generation...
But I'd prefer having these tests to not having them for sure ;) test/hotspot/jtreg/compiler/floatingpoint/TestRoundFloatAll.java line 31: > 29: * @library /test/lib / > 30: * @modules java.base/jdk.internal.math > 31: * @run main/othervm -XX:-TieredCompilation -XX:CompileThresholdScaling=0.3 -XX:+PrintIdeal -XX:CompileCommand=compileonly,compiler.floatingpoint.TestRoundFloatAll::test* -XX:-UseSuperWord compiler.floatingpoint.TestRoundFloatAll please break up the line for easier reading test/hotspot/jtreg/compiler/floatingpoint/TestRoundFloatAll.java line 75: > 73: return (int) a; > 74: } > 75: } At first, I was worried about the indentation, then realized the original code had the strange indentation. Would there be a way to put this method in a shared file, so that you do not need to paste it everywhere? test/hotspot/jtreg/compiler/vectorization/TestRoundVectorFloatAll.java line 34: > 32: * @run main/othervm -XX:+PrintIdeal -XX:-TieredCompilation -XX:CompileThresholdScaling=0.3 -XX:MaxVectorSize=8 -XX:+UseSuperWord -XX:CompileCommand=compileonly,compiler.vectorization.TestRoundVectorFloatAll::test* compiler.vectorization.TestRoundVectorFloatAll > 33: * @run main/othervm -XX:+PrintIdeal -XX:-TieredCompilation -XX:CompileThresholdScaling=0.3 -XX:MaxVectorSize=16 -XX:+UseSuperWord -XX:CompileCommand=compileonly,compiler.vectorization.TestRoundVectorFloatAll::test* compiler.vectorization.TestRoundVectorFloatAll > 34: * @run main/othervm -XX:+PrintIdeal -XX:-TieredCompilation -XX:CompileThresholdScaling=0.3 -XX:MaxVectorSize=32 -XX:+UseSuperWord -XX:CompileCommand=compileonly,compiler.vectorization.TestRoundVectorFloatAll::test* compiler.vectorization.TestRoundVectorFloatAll Please check which flags you actually need here.... 
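An exhaustive check over all 2^32 float bit patterns is cheap enough to be practical, which is the premise of these tests. As a hedged illustration, here is a C++ sketch that compares a bit-manipulation rounding routine, modeled on the approach used by `java.lang.Math.round(float)` (constants written out as literals; a port for illustration, not the JDK source), against a simple double-precision reference. All function names are hypothetical. `java_f2i`/`java_d2i` reproduce Java's saturating float-to-int conversion, because a plain C++ cast of NaN or an out-of-range value is undefined behavior.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Java's (int) conversion: NaN -> 0, out-of-range values saturate.
int32_t java_f2i(float f) {
  if (std::isnan(f)) return 0;
  if (f >= 2147483648.0f) return std::numeric_limits<int32_t>::max();
  if (f <= -2147483648.0f) return std::numeric_limits<int32_t>::min();
  return static_cast<int32_t>(f);
}

int32_t java_d2i(double d) {
  if (std::isnan(d)) return 0;
  if (d >= 2147483648.0) return std::numeric_limits<int32_t>::max();
  if (d <= -2147483648.0) return std::numeric_limits<int32_t>::min();
  return static_cast<int32_t>(d);
}

// Bit-manipulation rounding (ties round toward positive infinity).
int32_t round_bits(float a) {
  int32_t bits;
  std::memcpy(&bits, &a, sizeof bits);
  int32_t biased_exp = (bits & 0x7F800000) >> 23;
  int32_t shift = (24 - 2 + 127) - biased_exp;
  if ((shift & -32) == 0) {                         // 0 <= shift < 32
    int32_t r = (bits & 0x007FFFFF) | 0x00800000;   // |a| / ulp(a)
    if (bits < 0) r = -r;
    // Arithmetic right shift of a negative value (guaranteed since C++20).
    return ((r >> shift) + 1) >> 1;                 // floor(a + 1/2)
  }
  return java_f2i(a);  // tiny, already integral, infinite, or NaN
}

// Reference: floor(a + 0.5) evaluated in double, which has enough precision
// for every float input (in float precision, 0.49999997f + 0.5f rounds to 1).
int32_t round_reference(float a) {
  return java_d2i(std::floor(static_cast<double>(a) + 0.5));
}

// Compare both on every `stride`-th bit pattern; stride 1 is the full sweep.
uint64_t count_mismatches(uint32_t stride) {
  if (stride == 0) stride = 1;
  uint64_t mismatches = 0;
  for (uint64_t b = 0; b <= 0xFFFFFFFFull; b += stride) {
    uint32_t bits = static_cast<uint32_t>(b);
    float x;
    std::memcpy(&x, &bits, sizeof x);
    if (round_bits(x) != round_reference(x)) mismatches++;
  }
  return mismatches;
}
```

`count_mismatches(1)` is the exhaustive sweep and takes on the order of seconds; a larger stride gives a quick smoke test. Keeping a straightforward reference next to the bit-level routine is what lets such a sweep catch subtle errors, such as the classic float-precision `floor(a + 0.5f)` returning 1 for 0.49999997f.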
test/hotspot/jtreg/compiler/vectorization/TestRoundVectorFloatAll.java line 43: > 41: public class TestRoundVectorFloatAll { > 42: private static final int ITERS = 11000; > 43: private static final int ARRLEN = 997; Could you randomize this value ever so slightly? That way, the boundaries of the array are at different places. I think also that the size should be a little larger, just to ensure that we get maximum vector lengths. test/hotspot/jtreg/compiler/vectorization/TestRoundVectorFloatRandom.java line 202: > 200: } > 201: > 202: // test cases for NaN, Inf, subnormal, and so on just for completeness: +0.0 and -0.0 ------------- PR Review: https://git.openjdk.org/jdk/pull/17753#pullrequestreview-2043182218 PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1592477207 PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1592487797 PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1592499343 PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1592508616 PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1592481581 From epeter at openjdk.org Tue May 7 13:46:57 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 May 2024 13:46:57 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v13] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 13:23:48 GMT, Emanuel Peter wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> fix issues; modify vm options to make sure test the expected behaviors. 
> > test/hotspot/jtreg/compiler/floatingpoint/TestRoundFloatAll.java line 31: > >> 29: * @library /test/lib / >> 30: * @modules java.base/jdk.internal.math >> 31: * @run main/othervm -XX:-TieredCompilation -XX:CompileThresholdScaling=0.3 -XX:+PrintIdeal -XX:CompileCommand=compileonly,compiler.floatingpoint.TestRoundFloatAll::test* -XX:-UseSuperWord compiler.floatingpoint.TestRoundFloatAll > > please break up the line for easier reading Why these flags: `-XX:-TieredCompilation -XX:CompileThresholdScaling=0.3 -XX:+PrintIdeal -XX:-UseSuperWord` ? I also suggest that you use `-Xbatch`, just to make sure we have compiled all relevant methods after the warmup. If things get too slow, then maybe you want to consider using explicit compile exclusion / forbidding inlining for the `test*` method, rather than the compileonly, which prevents everything else from compiling. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1592498081 From snazarki at openjdk.org Tue May 7 14:02:00 2024 From: snazarki at openjdk.org (Sergey Nazarkin) Date: Tue, 7 May 2024 14:02:00 GMT Subject: RFR: 8330806: test/hotspot/jtreg/compiler/c1/TestLargeMonitorOffset.java fails on ARM32 In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 14:21:09 GMT, Aleksei Voitylov wrote: > TestLargeMonitorOffset was introduced by 8310844 with a fix for the AArch64 platform. The same issue needs to be fixed for ARM32. With this change, we add the large slot_offset handling to the ARM32 version of IR_Assembler::osr_entry(). > > Testing: jtreg hotspot, jtreg jdk tier1-3. Marked as reviewed by snazarki (no project role). 
------------- PR Review: https://git.openjdk.org/jdk/pull/18891#pullrequestreview-2043280405 From chagedorn at openjdk.org Tue May 7 14:44:54 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 7 May 2024 14:44:54 GMT Subject: RFR: 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode [v2] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 14:24:27 GMT, Christian Hagedorn wrote: >> This patch replaces the `Opaque4Node` of the `If` for Initialized Assertion Predicates with a new `OpaqueInitializedAssertionPredicateNode`. This helps to simplify pattern matching for predicate code and to distinguish it from the two other uses of `Opaque4` nodes: >> 1. Template Assertion Predicate: The goal is to get rid of its `Opaque4Node` as well by using a dedicated `TemplateAssertionPredicateNode` for the `IfNode`. >> 2. Non-null-checks with intrinsics and unsafe accesses: This will eventually be the only use left. Once we get there, we should rename the node accordingly to `OpaqueNonNullCheck` or something like that. >> >> I went through all the uses of `Opaque4` nodes and did the following: >> - Could the `Opaque4` node be part of an Initialized Assertion Predicate? >> - No: Added an assert that we are not dealing with an Initialized Assertion Predicate. >> - Yes: >> - Yes **and only** for Initialized Assertion Predicates? Added an assert that we are only expecting an `OpaqueInitializedAssertionPredicateNode` if appropriate. >> - Yes but could also be something else: Added a case for `OpaqueInitializedAssertionPredicateNode` next to the `Opaque4` case. >> - Is this `Opaque4` node only used for Template Assertion Predicates? >> - Yes: Added an assert with a call to `assertion_predicate_has_loop_opaque_node()` to check that we find its `OpaqueLoop*Nodes`.
>> - I've added test cases where I was not sure whether an `Opaque4` node could be part of a Template Assertion Predicate, an Initialized Assertion Predicate or a non-null-check. This was a little tricky, but I think it was still worth it to prevent future bugs (even though most of these special cases are quite rare). >> >> This is another patch split off from the full fix for Assertion Predicates. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into JDK-8330386 > - Add more comments and asserts > - Add more tests > - 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode Thanks Vladimir for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18951#issuecomment-2098573489 From dfenacci at openjdk.org Tue May 7 14:55:32 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 7 May 2024 14:55:32 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v7] In-Reply-To: References: Message-ID: > # Issue > When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0)` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2-compiled code that causes different vectors to be treated as equal even though they are not. > > # Causes > On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. > The same is true for `StoreVector`s.
> When running GVN loops, we can get into a situation where we check if a Load node is preceded by a Store to the same address, so that the Load can be replaced with the input of the Store (in `LoadNode::Identity`). Here we call > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 > > where we do an extra type check when dealing with vectors, but we don't check that masks and offsets do not interfere. > Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > but we don't check that there are no masks or offsets. > A few lines below, we check if there are 2 stores of the same value in a row. We need to check for masks and offsets here too, but in this case we can still fold the stores if the masks and offsets of the two vector stores are equivalent. > > # Solution > To avoid folding `Load`- and `StoreVector`s with masks and offsets, we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns its own opcode (so that it can never match a store node). In this way, the checks in `MemNode::can_see_stored_value` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 > > and `StoreNode::Identity` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > will fail if masks or offsets are used. > For 2 stores of the same value we instead check for mask and offset equality. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been add...
Damon Fenacci has updated the pull request incrementally with four additional commits since the last revision: - Update src/hotspot/share/opto/memnode.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/memnode.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/memnode.cpp Co-authored-by: Emanuel Peter - JDK-8325520: remove leftover comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18347/files - new: https://git.openjdk.org/jdk/pull/18347/files/85bb4bef..72bf6ca3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=05-06 Stats: 5 lines in 1 file changed: 1 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18347/head:pull/18347 PR: https://git.openjdk.org/jdk/pull/18347 From dfenacci at openjdk.org Tue May 7 14:55:32 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 7 May 2024 14:55:32 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v6] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 12:21:30 GMT, Emanuel Peter wrote: >> Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: >> >> - JDK-8325520: add store/load masked vector tests >> - JDK-8325520: add store/load tests with duplicate offsets > > src/hotspot/share/opto/memnode.cpp line 1169: > >> 1167: // LoadVector/StoreVector need additional checks >> 1168: if (st->is_StoreVector()) { >> 1169: // Ensure that types match > > To reduce noise, you could revert these comment changes, up to you. Right, it was a leftover. Removed. 
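The reason a `LoadVectorGather` or `LoadVectorMasked` must not take the value of a preceding plain `StoreVector` to the same address can be shown with a toy model of four-lane vectors over a flat array. This C++ sketch is not HotSpot code: all types and function names are hypothetical, and it models only lane semantics (a masked load fills unset lanes with zero, as the Vector API documents for masked `fromArray`). It also sketches a null-tolerant "same offsets and same mask" check for the store-store case. If getters return `nullptr` for nodes that have no offsets or mask, as suggested in the review, the comparison has to handle a null on either side instead of calling a member function such as `eqv_uncast()` through a null pointer.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model of 4-lane long vectors over flat memory (not HotSpot code).
using Vec = std::array<int64_t, 4>;
using Offsets = std::array<int, 4>;
using Mask = std::array<bool, 4>;

// Plain vector store: lane i goes to mem[base + i].
void store_vector(int64_t* mem, int base, const Vec& v) {
  for (int i = 0; i < 4; i++) mem[base + i] = v[i];
}

// Gather load: lane i reads mem[base + offsets[i]].
Vec load_gather(const int64_t* mem, int base, const Offsets& offsets) {
  Vec r{};
  for (int i = 0; i < 4; i++) r[i] = mem[base + offsets[i]];
  return r;
}

// Masked load: unset lanes get the default value 0.
Vec load_masked(const int64_t* mem, int base, const Mask& mask) {
  Vec r{};
  for (int i = 0; i < 4; i++) r[i] = mask[i] ? mem[base + i] : 0;
  return r;
}

// Both inputs absent, or both present and equal.
template <typename T>
bool same_optional_input(const T* a, const T* b) {
  if (a == nullptr || b == nullptr) return a == b;
  return *a == *b;
}

// Mirrors the condition described in the PR: a second store of the same value
// to the same address is folded only if offsets and mask also match.
bool second_store_is_redundant(const Offsets* offsets1, const Mask* mask1,
                               const Offsets* offsets2, const Mask* mask2) {
  return same_optional_input(offsets1, offsets2) &&
         same_optional_input(mask1, mask2);
}
```

After storing {10, 20, 30, 40} at base 0, a gather with offsets {1, 0, 3, 2} reads {20, 10, 40, 30} and a masked load with mask {true, false, true, false} reads {10, 0, 30, 0}. Folding either load to the stored vector would be wrong, which is what having `store_Opcode` return the load's own opcode prevents.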
> src/hotspot/share/opto/memnode.cpp line 3551: > >> 3549: // Regular store (no offsets or mask) >> 3550: } else { >> 3551: result = mem; > > Suggestion: > > assert(Opcode() == Op_StoreVector, "just a plain vector store, no offset or mask"); > result = mem; > > Turning comments into asserts is preferable, I would say. Good idea! Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1592624106 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1592629045 From dfenacci at openjdk.org Tue May 7 15:31:56 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 7 May 2024 15:31:56 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v6] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 12:42:38 GMT, Emanuel Peter wrote: >> Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: >> >> - JDK-8325520: add store/load masked vector tests >> - JDK-8325520: add store/load tests with duplicate offsets > > src/hotspot/share/opto/memnode.cpp line 3554: > >> 3552: } >> 3553: } >> 3554: } > > I think the code is now correct. > But I find the nested if-elseif-elseif-else ... structure a bit hard to read. And there is quite some code duplication (e.g. `result = mem` and all the `eqv_uncast` checks). > > You could either do something like this: > > if (!is_StoreVector() || > as_StoreVector()->has_same_vect_type_and_offsets_and_mask(mem->as_StoreVector())) { > result = mem; > } > > > Sketch: > > has_same_vect_type_and_offsets_and_mask: > > different vect_type -> return false > ... > > > Or maybe it would be better to define virtual functions to get the `mask` and `offsets` from a `StoreVector`? If it has none, just return `nullptr`. Sometimes people worry about virtual methods, but we already use them extensively for the node Value/Ideal anyway.
> > Then, you can do: > > if (!is_StoreVector()) { > result = mem; > } else { > const Node* offsets1 = as_StoreVector()->get_offsets(); > const Node* offsets2 = mem->as_StoreVector()->get_offsets(); > const Node* mask1 = as_StoreVector()->get_mask(); > const Node* mask2 = mem->as_StoreVector()->get_mask(); > if (offsets1->eqv_uncast(offsets2) && mask1->eqv_uncast(mask2)) { > result = mem; > } > } > > I think that would be the cleanest and most readable way. > > What do you think? I agree that it is quite convoluted, probably also because I've put `if (!is_StoreVector())` (which is redundant) at the beginning to get the most common case out of the way, but still... At first I thought that multiple inheritance would be a good solution (masks and offsets could be inherited by the corresponding nodes), but the "HotSpot Coding Style" clearly says to avoid it... So, I think in the end your second suggestion is the cleanest. Changing it... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1592686165 From bkilambi at openjdk.org Tue May 7 15:36:57 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 7 May 2024 15:36:57 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 13:21:35 GMT, Emanuel Peter wrote: > Sorry for the delay, I'm really excited about this one, just had to get some more critical things done recently ;) Thanks for the review. I will update my patch soon with your suggestions. Apologies for not making changes in the test* directory regarding the UnorderedReduction node: the node is now deleted, but some tests that refer to it still exist. > src/hotspot/cpu/aarch64/aarch64_vector.ad line 2907: > >> 2905: format %{ "reduce_addF_sve $dst_src1, $dst_src1, $src2" %} >> 2906: ins_encode %{ >> 2907: assert(UseSVE > 0, "must be sve"); > > Is there no way we would now run into this assert?
> > static bool use_neon_for_vector(int vector_length_in_bytes) { > return vector_length_in_bytes <= 16; > } > > Does `vector_length_in_bytes > 16` imply that we have `UseSVE > 0`? Yes, if `vector_length_in_bytes > 16`, it does imply `UseSVE > 0`, as there are no machines with a vector length > 16 bytes that support only Neon (i.e. UseSVE == 0). ------------- PR Comment: https://git.openjdk.org/jdk/pull/18034#issuecomment-2098737087 PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1592690587 From roland at openjdk.org Tue May 7 15:43:56 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 7 May 2024 15:43:56 GMT Subject: RFR: 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode [v2] In-Reply-To: References: Message-ID: <_8csQpQVHlNpwenIT4H7OFkMSOaU6Fz-ZmJ0Yi6ArLU=.0b84b78d-4637-49ab-b43f-4c457498b0ce@github.com> On Mon, 6 May 2024 14:24:27 GMT, Christian Hagedorn wrote: >> This patch replaces the `Opaque4Node` of the `If` for Initialized Assertion Predicates with a new `OpaqueInitializedAssertionPredicateNode`. This helps to simplify pattern matching for predicate code and to distinguish it from the two other uses of `Opaque4` nodes: >> 1. Template Assertion Predicate: The goal is to get rid of its `Opaque4Node` as well by using a dedicated `TemplateAssertionPredicateNode` for the `IfNode`. >> 2. Non-null-checks with intrinsics and unsafe accesses: This will eventually be the only use left. Once we get there, we should rename the node accordingly to `OpaqueNonNullCheck` or something like that. >> >> I went through all the uses of `Opaque4` nodes and did the following: >> - Could the `Opaque4` node be part of an Initialized Assertion Predicate? >> - No: Added an assert that we are not dealing with an Initialized Assertion Predicate. >> - Yes: >> - Yes **and only** for Initialized Assertion Predicates?
Added an assert that we are only expecting an `OpaqueInitializedAssertionPredicateNode` if appropriate. >> - Yes but could also be something else: Added a case for `OpaqueInitializedAssertionPredicateNode` next to the `Opaque4` case. >> - Is this `Opaque4` node only used for Template Assertion Predicates? >> - Yes: Added an assert with a call to `assertion_predicate_has_loop_opaque_node()` to check that we find its `OpaqueLoop*Nodes`. >> - I've added test cases where I was not sure whether an `Opaque4` node could be part of a Template Assertion Predicate, an Initialized Assertion Predicate or a non-null-check. This was a little tricky, but I think it was still worth it to prevent future bugs (even though most of these special cases are quite rare). >> >> This is another patch split off from the full fix for Assertion Predicates. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into JDK-8330386 > - Add more comments and asserts > - Add more tests > - 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode src/hotspot/share/opto/opaquenode.cpp line 110: > 108: } > 109: > 110: Node* OpaqueInitializedAssertionPredicateNode::Identity(PhaseGVN* phase) { Opaque4 is removed by macro expansion, right? But the new one is removed after loop opts. So there's a change in behavior. What's the rationale for making that change?
> 138: class OpaqueInitializedAssertionPredicateNode : public Node { Shouldn't the new OpaqueInitializedAssertionPredicateNode be a subclass of Opaque4 or shouldn't both be a subclass of a common super type? Don't they share at least some logic or behavior? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18951#discussion_r1592701811 PR Review Comment: https://git.openjdk.org/jdk/pull/18951#discussion_r1592702908 From duke at openjdk.org Tue May 7 15:52:57 2024 From: duke at openjdk.org (Daniel Skantz) Date: Tue, 7 May 2024 15:52:57 GMT Subject: Integrated: 8330016: Stress seed should be initialized for runtime stub compilation In-Reply-To: <-Pj7vK6Wpgv3FnP2KZXB2so2wWraG3LWEKds9XDI3pY=.d6595586-aa83-47bc-84ed-ff8c4a1a5550@github.com> References: <-Pj7vK6Wpgv3FnP2KZXB2so2wWraG3LWEKds9XDI3pY=.d6595586-aa83-47bc-84ed-ff8c4a1a5550@github.com> Message-ID: On Mon, 6 May 2024 06:31:47 GMT, Daniel Skantz wrote: > We can initialize the stress seed for runtime stub compilation as we already do for method compilation. This found the bug described in JDK-8329258. It would apply if StressGCM or StressLCM vm flags are set. > > Testing: T1-5 default options. T1-5 with -XX:+StressLCM and -XX:+StressGCM. Manually tested that the stress seed is set and printed to compilation log if either stress option is set. This pull request has now been integrated. 
Changeset: 95d2f807 Author: Daniel Skantz Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/95d2f8072e91e8df80e49e341f4fdb4464a2616e Stats: 31 lines in 2 files changed: 20 ins; 10 del; 1 mod 8330016: Stress seed should be initialized for runtime stub compilation Reviewed-by: rcastanedalo, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/19095 From szaldana at openjdk.org Tue May 7 16:02:57 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Tue, 7 May 2024 16:02:57 GMT Subject: Integrated: 8319957: PhaseOutput::code_size is unused and should be removed In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 17:31:45 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR removes the unused ```PhaseOutput::code_size / method_size```. > > These were moved over from ```src/hotspot/share/opto/compile.hpp``` in the refactor from [8240363](https://bugs.openjdk.org/browse/JDK-8240363). Here's the git link for reference https://github.com/openjdk/jdk/commit/21cd75cb98f658639df14632680e9c5e58f11faa. > > I also checked whether there were any usages prior to the refactor and couldn't find anything so I think it's safe to remove it. > > Thanks, > Sonia This pull request has now been integrated.
Changeset: 524aaad9 Author: Sonia Zaldana Calles Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/524aaad98317b1a50453e5a9a44922f481fb3b1e Stats: 3 lines in 2 files changed: 0 ins; 3 del; 0 mod 8319957: PhaseOutput::code_size is unused and should be removed Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/18981 From aboldtch at openjdk.org Tue May 7 16:18:00 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 7 May 2024 16:18:00 GMT Subject: RFR: 8331863: DUIterator_Fast used before it is constructed Message-ID: <5FEXRCspKpxNj3FJW1_2fqvdzuC40gTT8-SAG_pEflU=.69f4b39b-b7ed-4e45-87ac-2245ec75c789@github.com> `SimpleDUIterator` constructs two `DUIterator_Fast` objects but passes a reference to the second when constructing the first. In debug builds, values are read from this not-yet-constructed object. Found when building a debug build with UBSAN:

/src/hotspot/share/opto/node.cpp:124:8: runtime error: load of value 200, which is not a valid value for type 'bool'
    #0 0x14619f4e6476 in DUIterator_Common::reset(DUIterator_Common const&) /src/hotspot/share/opto/node.cpp:124
    #1 0x1461a32556a5 in DUIterator_Fast::operator=(DUIterator_Fast const&) /src/hotspot/share/opto/node.hpp:1486
    #2 0x1461a32556a5 in Node::fast_outs(DUIterator_Fast&) const /src/hotspot/share/opto/node.hpp:1491
    #3 0x1461a32556a5 in SimpleDUIterator::SimpleDUIterator(Node*) /src/hotspot/share/opto/node.hpp:1575
    #4 0x1461a32556a5 in G1BarrierSetC2::has_cas_in_use_chain(Node*) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:855
    #5 0x1461a3256cf1 in G1BarrierSetC2::verify_pre_load(Node*, Unique_Node_List&) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:881
    #6 0x1461a325eec3 in G1BarrierSetC2::verify_gc_barriers(Compile*, BarrierSetC2::CompilePhase) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:1019
    #7 0x1461a325eec3 in G1BarrierSetC2::verify_gc_barriers(Compile*, BarrierSetC2::CompilePhase) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:963
    #8 0x1461a23160ed in Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*) /src/hotspot/share/opto/compile.cpp:875
    #9 0x1461a1845fd0 in C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) /src/hotspot/share/opto/c2compiler.cpp:142
    #10 0x1461a235ac39 in CompileBroker::invoke_compiler_on_method(CompileTask*) /src/hotspot/share/compiler/compileBroker.cpp:2305
    #11 0x1461a235ee4e in CompileBroker::compiler_thread_loop() /src/hotspot/share/compiler/compileBroker.cpp:1963
    #12 0x1461a4076f8d in JavaThread::thread_main_inner() /src/hotspot/share/runtime/javaThread.cpp:760
    #13 0x1461a409da23 in JavaThread::run() /src/hotspot/share/runtime/javaThread.cpp:745
    #14 0x1461a7b6d2bc in Thread::call_run() /src/hotspot/share/runtime/thread.cpp:221
    #15 0x1461a62a8105 in thread_native_entry /src/hotspot/os/linux/os_linux.cpp:846
    #16 0x1461c29801d9 in start_thread (/lib64/libpthread.so.0+0x81d9)
    #17 0x1461c18cae72 in __clone (/lib64/libc.so.6+0x39e72)

------------- Commit messages: - 8331863: DUIterator_Fast used before it is constructed Changes: https://git.openjdk.org/jdk/pull/19125/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19125&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331863 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19125.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19125/head:pull/19125 PR: https://git.openjdk.org/jdk/pull/19125 From kvn at openjdk.org Tue May 7 16:21:01 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 7 May 2024 16:21:01 GMT Subject: RFR: 8331862: Remove split relocation info implementation Message-ID: [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. None of the current OpenJDK platforms use it. Tested tier1.
------------- Commit messages: - 8331862: Remove split relocation info implementation Changes: https://git.openjdk.org/jdk/pull/19126/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19126&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331862 Stats: 114 lines in 11 files changed: 2 ins; 58 del; 54 mod Patch: https://git.openjdk.org/jdk/pull/19126.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19126/head:pull/19126 PR: https://git.openjdk.org/jdk/pull/19126 From dfenacci at openjdk.org Tue May 7 16:30:23 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 7 May 2024 16:30:23 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v8] In-Reply-To: References: Message-ID: <_x_fle5Uwlya8vU73BUtILG3LCIrWZ-_UapBTvmlv6Y=.c6580565-4682-4ca7-a902-e32df5161f68@github.com> > # Issue > When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0)` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2-compiled code that causes different vectors to be treated as equal even though they are not. > > # Causes > On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without masks or offsets are mapped to `LoadVector` nodes instead. > The same is true for `StoreVector`s. > When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store of the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`). Here we call > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 > > where we do an extra check for types if we deal with vectors, but we don't make sure that neither masks nor offsets interfere.
> Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > but we don't make sure that there are no masks or offsets. > A few lines below, we check if there are 2 stores of the same value in a row. We need to check for masks and offsets here too, but in this case the stores can still be folded if the masks and offsets of the two vector stores are equivalent. > > # Solution > To avoid folding `Load`- and `StoreVector`s with masks and offsets, we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns its own (so that it can never match a store node). In this way, the checks in `MemNode::can_see_stored_value` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 > > and `StoreNode::Identity` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > will fail if masks or offsets are used. > For 2 stores of the same value we instead check for mask and offset equality. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been add...
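The opcode trick from the Solution section can be illustrated with a heavily simplified, self-contained C++ model. The class names mirror the C2 nodes discussed in the PR, but the opcode values, class layout, and the `can_fold()` helper are hypothetical stand-ins for the check in `MemNode::can_see_stored_value`, not HotSpot code.

```cpp
#include <cassert>

// Hypothetical opcode values; in HotSpot they come from the generated opcode enum.
enum Opcodes { Op_LoadVector = 1, Op_StoreVector, Op_LoadVectorGather };

struct MemNode {
  virtual ~MemNode() = default;
  virtual int Opcode() const = 0;
};

struct LoadNode : MemNode {
  // Opcode of the store kind whose stored value this load may reuse.
  virtual int store_Opcode() const = 0;
};

struct StoreVectorNode : MemNode {
  int Opcode() const override { return Op_StoreVector; }
};

// A plain vector load may be satisfied by a plain vector store to the same address.
struct LoadVectorNode : LoadNode {
  int Opcode() const override { return Op_LoadVector; }
  int store_Opcode() const override { return Op_StoreVector; }
};

// A gather load returns its own (load) opcode, which no store node ever has,
// so the opcode comparison below can never succeed for it.
struct LoadVectorGatherNode : LoadVectorNode {
  int Opcode() const override { return Op_LoadVectorGather; }
  int store_Opcode() const override { return Opcode(); }
};

// Simplified stand-in for the opcode check that guards store-to-load forwarding.
bool can_fold(const LoadNode& load, const MemNode& store) {
  return store.Opcode() == load.store_Opcode();
}
```

Because the override only exists on the masked/gather load classes, the plain `LoadVector` path keeps its existing folding behavior in this model.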
Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8325520: fix assert condition ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18347/files - new: https://git.openjdk.org/jdk/pull/18347/files/72bf6ca3..a2cb6a58 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18347/head:pull/18347 PR: https://git.openjdk.org/jdk/pull/18347 From kvn at openjdk.org Tue May 7 16:56:06 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 7 May 2024 16:56:06 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v18] In-Reply-To: References: Message-ID: <-Q8XJ3BT26WE6vPUNR7-_Wi7iw7QKTi9O5HsvdeGh4M=.e35dc82b-326f-4207-a3f3-bacfb20032f4@github.com> On Thu, 2 May 2024 14:54:17 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens >> no optimization is performed for the method being compiled). Hoisting >> a `get()` call out of a loop for a loop-invariant `scopedValue` should >> also be legal in most cases. >> >> `ScopedValue.get()` is implemented in Java code as a two-step process. A >> cache is attached to the current thread object. If the `ScopedValue` >> object is in the cache then the result from `get()` is read from >> there. Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache.
On a >> cache probe, both indexes are checked. As a consequence, probing >> the cache is a multi-step process (check if the cache is >> present, check the first index, check the second index if the first index >> failed). If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, the profile reports the slow path >> as never taken and only the read from the cache is compiled. >> >> To perform the optimizations, I added 3 new node types to C2: >> >> - the pair >> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for >> the cache probe >> >> - a cfg node ScopedValueGetResultNode to help locate the result of the >> `get()` call in the IR graph. >> >> In pseudo code, once the nodes are inserted, the code of a `get()` is: >> >> >> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) >> if (hits_in_the_cache) { >> res = ScopedValueGetLoadFromCache(hits_in_the_cache); >> } else { >> res = ..; //slow call possibly inlined. Subgraph can be arbitrarily complex >> } >> res = ScopedValueGetResult(res) >> >> >> In the snippet: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > whitespaces I want to see performance numbers on x64 and aarch64 before I start looking at it. It would be nice to have data for all micros in `test/micro/org/openjdk/bench/java/lang/ScopedValues*.java`. Put the results into JBS and post a short summary here. You can compare by disabling/enabling the new intrinsics.
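The multi-step cache probe that Roland's description relies on (is the cache present, check the first index, check the second index, otherwise take the slow path) can be modeled in a few lines of C++. This is a sketch under stated assumptions: the real cache lives in Java code attached to the thread, and the slot derivation, table size, and names below (`ScopedValueCache`, `ThreadState`, `scoped_value_get`) are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>

// Hypothetical model of a per-thread cache: each key may live in one of two
// slots derived from its hash.
struct Entry {
  const void* key = nullptr;
  int value = 0;
};

class ScopedValueCache {
 public:
  static constexpr std::size_t kSlots = 16;  // assumed power-of-two size

  // Check the first index, then the second index if the first one failed.
  std::optional<int> probe(const void* key, std::uint32_t hash) const {
    if (slots_[first(hash)].key == key) return slots_[first(hash)].value;
    if (slots_[second(hash)].key == key) return slots_[second(hash)].value;
    return std::nullopt;
  }

  void insert(const void* key, std::uint32_t hash, int value) {
    slots_[first(hash)] = Entry{key, value};
  }

 private:
  static std::size_t first(std::uint32_t h) { return h & (kSlots - 1); }
  static std::size_t second(std::uint32_t h) { return (h >> 16) & (kSlots - 1); }
  std::array<Entry, kSlots> slots_{};
};

struct ThreadState {
  std::unique_ptr<ScopedValueCache> cache;  // lazily allocated
  int slow_calls = 0;
};

// Models get(): probe the cache if it exists, otherwise (or on a miss) take
// the slow path, which also populates the cache.
int scoped_value_get(ThreadState& t, const void* key, std::uint32_t hash, int bound) {
  if (t.cache) {
    if (std::optional<int> hit = t.cache->probe(key, hash)) return *hit;
  }
  ++t.slow_calls;  // slow path: look up the binding, then fill the cache
  if (!t.cache) t.cache = std::make_unique<ScopedValueCache>();
  t.cache->insert(key, hash, bound);
  return bound;
}
```

In this model a second `get()` for the same key is served from the cache without a slow call, which corresponds to the case where a dominating result can be reused.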
------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2098897865 From duke at openjdk.org Tue May 7 16:58:09 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Tue, 7 May 2024 16:58:09 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v15] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: parameter and local renames, update comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/d93e9893..2a63a159 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=13-14 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From duke at openjdk.org Tue May 7 16:58:10 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Tue, 7 May 2024 16:58:10 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v14] In-Reply-To: References: <727FyZHyBbtRilYRtbP2E4dbZYqj9a-QgXAuicQ2iZQ=.01035706-6591-4df5-bf7d-d7a2f6209015@github.com> Message-ID: On Tue, 7 May 2024 05:51:22 GMT, Jatin Bhateja wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> revert unneeded legacy flag change for kmovwl(K,K) and kmovql(K,K) > > src/hotspot/cpu/x86/assembler_x86.cpp line 11754: > >> 11752: >> 11753: // This is a 4 byte encoding >> 11754: void Assembler::evex_prefix(bool vex_r, bool vex_b, bool vex_x, bool evex_r, bool evex_b, bool evex_v, > > Suggestion: > > void Assembler::evex_prefix(bool vex_r, bool vex_b, bool vex_x, bool evex_r, bool eevex_b, bool evex_v, Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 11766: > >> 11764: // P0: byte 2, initialized to RXBR`00mm >> 11765: // instead of not'd >> 11766: int byte2 = (vex_r ? VEX_R : 0) | (vex_x ? VEX_X : 0) | (vex_b ? VEX_B : 0) | (evex_r ? 
EVEX_Rb : 0); > > Comment at [L#11765 > ](https://github.com/openjdk/jdk/pull/18476/files#diff-e3576e9c22db89236cdb906f032ff00748ff6d1c21b05277d991d80af75daf3aL11686) > `// P0: byte 2, initialized to RXBR'00mm => // P0: byte 2, initialized to RXBR'0mmm` Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 11768: > >> 11766: int byte2 = (vex_r ? VEX_R : 0) | (vex_x ? VEX_X : 0) | (vex_b ? VEX_B : 0) | (evex_r ? EVEX_Rb : 0); >> 11767: byte2 = (~byte2) & 0xF0; >> 11768: byte2 |= evex_b ? EEVEX_B : 0; > > Suggestion: > > byte2 |= eevex_b ? EEVEX_B : 0; > > > This corresponds to B4 bit which is specific to EEVEX encoding. Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 11846: > >> 11844: } >> 11845: bool eevex_x = adr.index_needs_rex2(); >> 11846: bool evex_b = adr.base_needs_rex2(); > > Suggestion: > > bool eevex_b = adr.base_needs_rex2(); Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 11848: > >> 11846: bool evex_b = adr.base_needs_rex2(); >> 11847: attributes->set_is_evex_instruction(); >> 11848: evex_prefix(vex_r, vex_b, vex_x, evex_r, evex_b, evex_v, eevex_x, nds_enc, pre, opc); > > Suggestion: > > evex_prefix(vex_r, vex_b, vex_x, evex_r, eevex_b, evex_v, eevex_x, nds_enc, pre, opc); Thanks, done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1592799351 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1592800035 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1592799078 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1592799595 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1592799825 From epeter at openjdk.org Tue May 7 17:08:52 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 May 2024 17:08:52 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop In-Reply-To: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> Message-ID: <2lreoMy7UKtgM_m8RCU68rp3FFkoU8zj3ckuTKzXqf0=.dc02a0d4-2671-4c70-a470-a64f28e38f2d@github.com> On Fri, 3 May 2024 12:33:43 GMT, Roland Westrelin wrote: > In the test case: > > > long i; > for (; i > 0; i--) { > res += 42 / ((int) i); > > > The long counted loop phi has type `[1..100]`. As a consequence, the > `ConvL2I` also has type `[1..100]`. The `DivI` node that follows can't > fault: it is not guarded by a zero check and has no control set. > > The `ConvL2I` is split through phi and so is the `DivI` node: > `PhaseIdealLoop::cannot_split_division()` does not prevent the split because the > value coming from the backedge into the `DivI` (when it is about to be > split thru phi) is the result of the `ConvL2I`, which has type > `[1..100]` and so is not zero as far as the compiler can tell. > > On the last iteration of the loop, i is 1. Because the DivI was split > thru Phi, it computes the value for the following iteration, so for i = 0. This causes a crash when the compiled code runs.
> The same problem can't happen with an int counted loop because logic > in `PhaseIdealLoop::split_thru_phi()` prevents a `ConvI2L` from being > split thru phi. I propose to fix this the same way: in the test case, > it's not true that once the `ConvL2I` is split thru phi it keeps type > `[1..100]`. The fix is fairly conservative because it's based on the > existing logic for `ConvI2L`: ideally we would avoid splitting a `ConvL2I` > only at a counted loop, but I suppose the same is true for the `ConvI2L` > and I thought it would be best to revisit both together. Looks reasonable. ------------ I guess the issue is that ConvL2I and ConvI2L are also type nodes, which can restrict their type, just like CastII nodes. And that restricting of the type is only true under a certain if-branch. But if the ConvI2L were not a type node, then it would not restrict the type, and you could simply push it through phis. Right? Why do we have type restriction mixed into ConvI2L? Could that not be separated out into a CastII / CastLL? Maybe we could generally separate ConvI2L, type restriction, and pinning? CastII also does multiple things, and it has hurt us many times in the past. Would this sort of maximal separation and specialization not be more "sea of nodes" style? Anyway, this would be interesting to look into for a future RFE. test/hotspot/jtreg/compiler/splitif/TestLongCountedLoopConvL2I.java line 31: > 29: * -XX:+StressGCM -XX:StressSeed=92643864 TestLongCountedLoopConvL2I > 30: * @run main/othervm -XX:-BackgroundCompilation -XX:-TieredCompilation -XX:-UseOnStackReplacement > 31: * -XX:+StressGCM TestLongCountedLoopConvL2I Would it make sense to have a run that allows OSR? ------------- Marked as reviewed by epeter (Reviewer).
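The failure mode described in this thread can be modeled outside the compiler with a small C++ program based on the loop from the test case. It is only an illustration, resting on the assumption that after the split the division for the next iteration may be evaluated before the loop-exit test (which a control-free `DivI` permits). The function names are hypothetical and nothing here is C2 code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// The loop from the test case: the divisor (int)i stays in [1..start],
// so 42 / (int)i never divides by zero.
int64_t original_loop(int64_t start) {
  int64_t res = 0;
  for (int64_t i = start; i > 0; i--) {
    res += 42 / static_cast<int32_t>(i);
  }
  return res;
}

// After ConvL2I and DivI are split through the iv phi, one copy of the
// division uses the loop-entry value and the other uses the backedge value
// (i - 1). If the backedge copy is scheduled before the exit test, it also
// runs on the last iteration. Instead of dividing (which would trap), this
// records every divisor the transformed loop would use.
std::vector<int32_t> divisors_after_split(int64_t start) {
  std::vector<int32_t> divisors;
  if (start <= 0) return divisors;
  divisors.push_back(static_cast<int32_t>(start));  // entry value of the phi
  for (int64_t i = start; i > 0; i--) {
    divisors.push_back(static_cast<int32_t>(i - 1));  // value for the next iteration
  }
  return divisors;
}
```

The last recorded divisor is 0, precisely the value that the `[1..100]` type of the split `ConvL2I` claimed could not occur.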
PR Review: https://git.openjdk.org/jdk/pull/19086#pullrequestreview-2043711442 PR Review Comment: https://git.openjdk.org/jdk/pull/19086#discussion_r1592792340 From galder at openjdk.org Tue May 7 17:14:23 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 7 May 2024 17:14:23 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v14] In-Reply-To: References: Message-ID: > Adding a C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures. > > The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64:
>
> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
> Benchmark                  (size)  Mode  Cnt    Score    Error  Units
> ArrayClone.byteArraycopy        0  avgt   15    3.476 ±  0.018  ns/op
> ArrayClone.byteArraycopy       10  avgt   15    3.740 ±  0.017  ns/op
> ArrayClone.byteArraycopy      100  avgt   15    7.124 ±  0.010  ns/op
> ArrayClone.byteArraycopy     1000  avgt   15   39.301 ±  0.106  ns/op
> ArrayClone.byteClone            0  avgt   15    3.478 ±  0.008  ns/op
> ArrayClone.byteClone           10  avgt   15    3.562 ±  0.007  ns/op
> ArrayClone.byteClone          100  avgt   15    5.888 ±  0.206  ns/op
> ArrayClone.byteClone         1000  avgt   15   25.762 ±  0.203  ns/op
> ArrayClone.intArraycopy         0  avgt   15    3.199 ±  0.016  ns/op
> ArrayClone.intArraycopy        10  avgt   15    4.521 ±  0.008  ns/op
> ArrayClone.intArraycopy       100  avgt   15   17.429 ±  0.039  ns/op
> ArrayClone.intArraycopy      1000  avgt   15  178.432 ±  0.777  ns/op
> ArrayClone.intClone             0  avgt   15    3.406 ±  0.016  ns/op
> ArrayClone.intClone            10  avgt   15    4.272 ±  0.006  ns/op
> ArrayClone.intClone           100  avgt   15   13.110 ±  0.122  ns/op
> ArrayClone.intClone          1000  avgt   15  113.196 ± 13.400  ns/op
>
> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this. > > I ran the hotspot compiler tests successfully, limiting them to C1 compilation, on darwin/aarch64, linux/x86_64 and linux/686. E.g.
>
> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
> ...
> TEST                                        TOTAL  PASS  FAIL  ERROR
> jtreg:test/hotspot/jtreg:hotspot_compiler    1234  1234     0      0
>
> One question I had is what to do about non-primitive object arrays; see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts? > > Thanks @rwestrel for his help shaping this up :) Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: Assert type is not interface ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17667/files - new: https://git.openjdk.org/jdk/pull/17667/files/9376e9ec..306db745 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=12-13 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17667.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17667/head:pull/17667 PR: https://git.openjdk.org/jdk/pull/17667 From galder at openjdk.org Tue May 7 17:14:23 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 7 May 2024 17:14:23 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v13] In-Reply-To: <_x-OSownzQQZ8fmlsbvQ42MLf9BGZskECTNncOE0s4E=.8381a076-0cc4-4339-924f-fa22ca780573@github.com> References: <9Eoh8hOSSVvAtf9iVQ6hflQyceUtt4dpZdqm61zg5XI=.358a4d79-70d9-4b54-85d5-37c6817f0fae@github.com> <_x-OSownzQQZ8fmlsbvQ42MLf9BGZskECTNncOE0s4E=.8381a076-0cc4-4339-924f-fa22ca780573@github.com> Message-ID: On Thu, 2 May 2024 08:24:34 GMT, Dean Long wrote: >> Then, I think we should add an assert that `!type->as_instance_klass()->is_interface()` and also that it's not an array of interfaces (using `base_element_klass()`) > > An array of interfaces can be exact: > > new Interface[20].getClass(); > > and it seems like it would be safe to allow this, so I think we only need one assert for `!type->as_instance_klass()->is_interface()` if we don't trust the result of exact_type(). @dean-long @rwestrel I've added the assert. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1592820103 From chagedorn at openjdk.org Tue May 7 17:28:57 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 7 May 2024 17:28:57 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop In-Reply-To: References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> Message-ID: On Mon, 6 May 2024 11:50:40 GMT, Roland Westrelin wrote: > Are we sure divisions are the only cause of bugs? Not 100% sure. But the only cases I've observed so far are with division/mod where they float above and end up being executed too early (the result is never actually observed, though). > that once pushed thru phi, the type of the ConvL2I is simply not correct and that's the root cause. Yes, that's my understanding, too. But since the `AddL` input into the loop iv phi contains zero, it raised the question whether we could actually detect that and base our decision on whether the input contains zero, instead of simply disabling pushing `ConvL2I` (and `ConvI2L`) nodes through phis entirely. It also seems that it's only a problem with loop iv phis because we improve the iv type in such a way that some of the possible values of the backedge are excluded. So, maybe a first step could be to allow splitting the `Conv*` nodes through non-loop-iv phi nodes.
However, there might also be other non-loop-iv phi problems I'm currently not aware of. Nevertheless, it might be worth investigating further in a separate RFE. > I wonder if we could get other failures because of this: maybe a node becoming top because of the incorrect type or an out-of-bounds array access. Could very well be. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19086#discussion_r1592835265 From mli at openjdk.org Tue May 7 17:32:19 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 7 May 2024 17:32:19 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v14] In-Reply-To: References: Message-ID: > Hi, > Can you have a look at this patch adding some tests for Math.round intrinsics? > Thanks! > > ### FYI: > During the development of RoundVF/RoundF, we ran into issues that were only spotted by running tests exhaustively over the 32/64-bit range of int/long. > It's helpful to add these exhaustive tests to the JDK for possible future use, rather than building them every time they are needed. > Of course, we need to put them in `manual` mode, so they are not run when the `-automatic` jtreg option is specified, which I assume is the mode CI uses; please correct me if I'm wrong.
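To illustrate why exhaustive checks pay off for `Math.round`, the following C++ sketch compares a reference for Java's `Math.round(float)` semantics (closest `int`, ties rounding toward positive infinity, NaN to 0, saturating at the `int` range) against a plausible-looking variant that adds 0.5 in `float` precision. The function names are hypothetical, and this is not the code of the jtreg tests in the PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Reference for Java's Math.round(float): closest int, ties toward +infinity,
// NaN -> 0, saturating at the int range. In double, x + 0.5 is exact for every
// float whose result can be nonzero, so floor() sees the true sum.
int32_t java_round_ref(float x) {
  if (std::isnan(x)) return 0;
  double r = std::floor(static_cast<double>(x) + 0.5);
  if (r >= 2147483647.0) return std::numeric_limits<int32_t>::max();
  if (r <= -2147483648.0) return std::numeric_limits<int32_t>::min();
  return static_cast<int32_t>(r);
}

// Plausible but wrong: the float addition itself rounds, so for the largest
// float below 0.5 the sum becomes exactly 1.0f.
int32_t naive_round(float x) {
  if (std::isnan(x)) return 0;
  volatile float sum = x + 0.5f;  // force the sum to be rounded to float
  const float s = sum;
  double r = std::floor(s);
  if (r >= 2147483647.0) return std::numeric_limits<int32_t>::max();
  if (r <= -2147483648.0) return std::numeric_limits<int32_t>::min();
  return static_cast<int32_t>(r);
}

// Exhaustively compares both versions over an inclusive range of float bit patterns.
uint64_t count_mismatches(uint32_t first_bits, uint32_t last_bits) {
  uint64_t mismatches = 0;
  for (uint64_t b = first_bits; b <= last_bits; ++b) {
    uint32_t bits = static_cast<uint32_t>(b);
    float x;
    std::memcpy(&x, &bits, sizeof x);
    if (java_round_ref(x) != naive_round(x)) ++mismatches;
  }
  return mismatches;
}
```

Calling `count_mismatches(0u, 0xFFFFFFFFu)` walks all 2^32 float bit patterns, which is the kind of full-range run that is practical as a manual test but too slow for routine CI.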
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: misc fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17753/files - new: https://git.openjdk.org/jdk/pull/17753/files/b5207436..7c2ef4fb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17753&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17753&range=12-13 Stats: 251 lines in 5 files changed: 107 ins; 131 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/17753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17753/head:pull/17753 PR: https://git.openjdk.org/jdk/pull/17753 From mli at openjdk.org Tue May 7 17:32:20 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 7 May 2024 17:32:20 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v13] In-Reply-To: References: Message-ID: <41rVYJ90K1TmX9w8v2eZxPcaxH0YL8D3wrzQiEd7mnU=.a1458bea-4570-40ae-b052-523c413d26bd@github.com> On Tue, 7 May 2024 13:36:55 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/floatingpoint/TestRoundFloatAll.java line 31: >> >>> 29: * @library /test/lib / >>> 30: * @modules java.base/jdk.internal.math >>> 31: * @run main/othervm -XX:-TieredCompilation -XX:CompileThresholdScaling=0.3 -XX:+PrintIdeal -XX:CompileCommand=compileonly,compiler.floatingpoint.TestRoundFloatAll::test* -XX:-UseSuperWord compiler.floatingpoint.TestRoundFloatAll >> >> please break up the line for easier reading > > Why these flags: > `-XX:-TieredCompilation -XX:CompileThresholdScaling=0.3 -XX:+PrintIdeal -XX:-UseSuperWord` ? > > I also suggest that you use `-Xbatch`, just to make sure we have compiled all relevant methods after the warmup. If things get too slow, then maybe you want to consider using explicit compile exclusion / forbidding inlining for the `test*` method, rather than the compileonly, which prevents everything else from compiling. Thanks for the suggestion; added `-Xbatch` and removed `-XX:+PrintIdeal`.
Kept `-XX:-UseSuperWord`, as we are testing the scalar version of the intrinsic in this test. `-XX:-TieredCompilation -XX:CompileThresholdScaling=0.3` are just carried over from previous tests. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1592837993 From mli at openjdk.org Tue May 7 17:32:21 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 7 May 2024 17:32:21 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v13] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 13:30:12 GMT, Emanuel Peter wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> fix issues; modify vm options to make sure test the expected behaviors. > > test/hotspot/jtreg/compiler/floatingpoint/TestRoundFloatAll.java line 75: > >> 73: return (int) a; >> 74: } >> 75: } > > At first, I was worried about the indentation, then realized the original code had the strange indentation. > Would there be a way to put this method in a shared file, so that you do not need to paste it everywhere? Moved to a shared lib file.
> test/hotspot/jtreg/compiler/vectorization/TestRoundVectorFloatAll.java line 34: > >> 32: * @run main/othervm -XX:+PrintIdeal -XX:-TieredCompilation -XX:CompileThresholdScaling=0.3 -XX:MaxVectorSize=8 -XX:+UseSuperWord -XX:CompileCommand=compileonly,compiler.vectorization.TestRoundVectorFloatAll::test* compiler.vectorization.TestRoundVectorFloatAll >> 33: * @run main/othervm -XX:+PrintIdeal -XX:-TieredCompilation -XX:CompileThresholdScaling=0.3 -XX:MaxVectorSize=16 -XX:+UseSuperWord -XX:CompileCommand=compileonly,compiler.vectorization.TestRoundVectorFloatAll::test* compiler.vectorization.TestRoundVectorFloatAll >> 34: * @run main/othervm -XX:+PrintIdeal -XX:-TieredCompilation -XX:CompileThresholdScaling=0.3 -XX:MaxVectorSize=32 -XX:+UseSuperWord -XX:CompileCommand=compileonly,compiler.vectorization.TestRoundVectorFloatAll::test* compiler.vectorization.TestRoundVectorFloatAll > > Please check which flags you actually need here.... Removed `-XX:+PrintIdeal`; the others seem useful to me. > test/hotspot/jtreg/compiler/vectorization/TestRoundVectorFloatAll.java line 43: > >> 41: public class TestRoundVectorFloatAll { >> 42: private static final int ITERS = 11000; >> 43: private static final int ARRLEN = 997; > > Could you randomize this value ever so slightly? That way, the boundaries of the array are at different places. I think also that the size should be a little larger, just to ensure that we get maximum vector lengths. Makes sense, done.
> test/hotspot/jtreg/compiler/vectorization/TestRoundVectorFloatRandom.java line 202: > >> 200: } >> 201: >> 202: // test cases for NaN, Inf, subnormal, and so on > > just for completeness: +0.0 and -0.0 added ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1592838750 PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1592838951 PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1592839461 PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1592838230 From chagedorn at openjdk.org Tue May 7 17:32:59 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 7 May 2024 17:32:59 GMT Subject: RFR: 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode [v2] In-Reply-To: <_8csQpQVHlNpwenIT4H7OFkMSOaU6Fz-ZmJ0Yi6ArLU=.0b84b78d-4637-49ab-b43f-4c457498b0ce@github.com> References: <_8csQpQVHlNpwenIT4H7OFkMSOaU6Fz-ZmJ0Yi6ArLU=.0b84b78d-4637-49ab-b43f-4c457498b0ce@github.com> Message-ID: <7b3qt72dd5rV6nirPQILkqTMleDRMRYuXlKpqVVVpyo=.c2ed3889-cb43-4576-9d63-de133152b7fb@github.com> On Tue, 7 May 2024 15:40:40 GMT, Roland Westrelin wrote: >> Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8330386 >> - Add more comments and asserts >> - Add more tests >> - 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode > > src/hotspot/share/opto/opaquenode.cpp line 110: > >> 108: } >> 109: >> 110: Node* OpaqueInitializedAssertionPredicateNode::Identity(PhaseGVN* phase) { > > Opaque4 is removed by macro expansion, right? But the new one is removed after loop opts.. So there's a change in behavior. 
What's the rationale for making that change? That's correct. I originally had these nodes as macro nodes as well. But conceptually, we want these nodes to be removed and the Initialized Assertion Predicates to be folded once we know that we no longer split loops (i.e. in post loop IGVN). I think it's easier to register them for this post loop IGVN run since we don't really expand the nodes to anything - they are just removed during expansion. I'm not entirely sure, though, what the original reason was to go with a macro expansion removal instead of a post loop IGVN removal for `Opaque4` nodes. Do you remember? > src/hotspot/share/opto/opaquenode.hpp line 138: > >> 136: // to true. Therefore, we get rid of them in product builds as they are useless. In debug builds we keep them as >> 137: // additional verification code (i.e. removing this node and use the BoolNode input instead). >> 138: class OpaqueInitializedAssertionPredicateNode : public Node { > > Shouldn't the new OpaqueInitializedAssertionPredicateNode be a subclass of Opaque4 or shouldn't both be a subclass of a common super type? Don't they share at least some logic or behavior? I first thought about reusing this class in some way. But the second input is actually not needed. We could move forward and just remove the second input for `Opaque4` nodes (it's always a true constant). But I still wanted an easy way to distinguish this node from the other uses of `Opaque4` nodes in non-null checks. Furthermore, I think subclassing the `Opaque4` class can be problematic when doing `is_Opaque4()`, since we sometimes expect an `Opaque4` only, sometimes an `OpaqueInitializedAssertionPredicate` only, and sometimes both are fine. I think it's cleaner to have two separate classes instead of having one subclass the other. What do you think?
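The `is_Opaque4()` concern can be shown with a small Java analogy that uses `instanceof` in place of C2's class-id checks. All class names below are hypothetical stand-ins, not HotSpot types. If the new node is a subclass, a check for the base type silently accepts it too. With separate classes, each check stays exact, and call sites that accept both kinds have to say so explicitly.

```java
// Hypothetical Java analogy of the C2 node classes; none of these are HotSpot types.
class SketchNode {}
class SketchOpaque4 extends SketchNode {}

// Option A: the new node subclasses Opaque4.
class SketchOpaqueIapSub extends SketchOpaque4 {}

// Option B: the new node is a separate class next to Opaque4.
class SketchOpaqueIap extends SketchNode {}

class OpaqueHierarchySketch {
    // Option A: a type test for Opaque4 also accepts the subclass,
    // so "plain Opaque4 only" needs an explicit exclusion.
    static boolean isPlainOpaque4WithSubclass(SketchNode n) {
        return n instanceof SketchOpaque4 && !(n instanceof SketchOpaqueIapSub);
    }

    // Option B: each type test is exact on its own...
    static boolean isPlainOpaque4WithSeparateClass(SketchNode n) {
        return n instanceof SketchOpaque4;
    }

    // ...and call sites that accept either kind say so explicitly.
    static boolean isEitherWithSeparateClass(SketchNode n) {
        return n instanceof SketchOpaque4 || n instanceof SketchOpaqueIap;
    }
}
```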
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18951#discussion_r1592838684 PR Review Comment: https://git.openjdk.org/jdk/pull/18951#discussion_r1592840333 From kxu at openjdk.org Tue May 7 17:33:29 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 7 May 2024 17:33:29 GMT Subject: RFR: 8327381: Refactor type-improving transformations in BoolNode::Ideal to BoolNode::Value [v9] In-Reply-To: References: Message-ID: > This PR resolves [JDK-8327381](https://bugs.openjdk.org/browse/JDK-8327381) > > Currently, the transformation of expressions with patterns `((x & m) u<= m)` or `((m & x) u<= m)` to `true` is done in the `BoolNode::Ideal` function, which creates a new constant node of value `1`. However, this is technically a type-improving (range-reducing) transformation that is better suited to the `BoolNode::Value` function. > > New unit test `test/hotspot/jtreg/compiler/c2/TestBoolNodeGvn.java` asserting on IR nodes and correctness of this transformation is added and passing. Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: - Merge branch 'master' into boolnode-refactor - refactor BoolNode::Value() and extract code to ::Value_cmpu_and_mask - update comments - fix indentation again - apply test only on x64, aarch64 and riscv64 - also renames the class name in @run - update test @run annotation - improve formatting, correct annotation and rename test class - Merge branch 'master' into boolnode-refactor - update the package name for tests - ...
and 6 more: https://git.openjdk.org/jdk/compare/91beff36...278c436a ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18198/files - new: https://git.openjdk.org/jdk/pull/18198/files/53cf5b3b..278c436a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18198&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18198&range=07-08 Stats: 122406 lines in 3144 files changed: 56561 ins; 49745 del; 16100 mod Patch: https://git.openjdk.org/jdk/pull/18198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18198/head:pull/18198 PR: https://git.openjdk.org/jdk/pull/18198 From mli at openjdk.org Tue May 7 17:36:57 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 7 May 2024 17:36:57 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v13] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 13:44:06 GMT, Emanuel Peter wrote: > Thanks for the extra tests! > Thanks for reviewing. > Can you measure how much time each test now takes on your machine? > Only TestRoundVectorFloatAll.java took longer, but it still finished within one minute; the others run considerably faster. > I think we are getting there. Still a little worried about some random bugs in the whole number generation... But I'd prefer having these tests to not having them for sure ;) Agree!
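This note returns to the 8327381 `BoolNode` change described above. `((x & m) u<= m)` and `((m & x) u<= m)` can be folded to `true` because AND can only clear bits of `m`, so the result is never unsigned-greater than `m`. The following is a quick spot check of that identity in Java over edge values and random samples. It is a hypothetical sketch built on `Integer.compareUnsigned`, not C2 code.

```java
import java.util.Random;

// Hypothetical check of the identity, written in Java rather than C2 IR.
class MaskCompareSketch {
    // ((x & m) u<= m) and ((m & x) u<= m): AND can only clear bits of m,
    // so the result is never unsigned-greater than m.
    static boolean foldsToTrue(int x, int m) {
        return Integer.compareUnsigned(x & m, m) <= 0
            && Integer.compareUnsigned(m & x, m) <= 0;
    }

    static boolean holdsForSamples(long seed, int samples) {
        int[] edges = {0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : edges) {
            for (int m : edges) {
                if (!foldsToTrue(x, m)) {
                    return false;
                }
            }
        }
        Random rand = new Random(seed);
        for (int i = 0; i < samples; i++) {
            if (!foldsToTrue(rand.nextInt(), rand.nextInt())) {
                return false;
            }
        }
        return true;
    }
}
```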
------------- PR Comment: https://git.openjdk.org/jdk/pull/17753#issuecomment-2098965761 From kvn at openjdk.org Tue May 7 17:47:54 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 7 May 2024 17:47:54 GMT Subject: RFR: 8331863: DUIterator_Fast used before it is constructed In-Reply-To: <5FEXRCspKpxNj3FJW1_2fqvdzuC40gTT8-SAG_pEflU=.69f4b39b-b7ed-4e45-87ac-2245ec75c789@github.com> References: <5FEXRCspKpxNj3FJW1_2fqvdzuC40gTT8-SAG_pEflU=.69f4b39b-b7ed-4e45-87ac-2245ec75c789@github.com> Message-ID: <0L378sPPSazRYrpx_kfV6172mLp4EMfCWxw65zYoIj0=.36c662de-9bc4-4e8a-9e5d-5f7ae76c7f0b@github.com> On Tue, 7 May 2024 16:13:38 GMT, Axel Boldt-Christmas wrote: > `SimpleDUIterator` constructs two `DUIterator_Fast` but passes a reference to the second when constructing the first. In debug values are read from this not yet constructed object. > > Found when building a debug build with UBSAN > > /src/hotspot/share/opto/node.cpp:124:8: runtime error: load of value 200, which is not a valid value for type 'bool' > #0 0x14619f4e6476 in DUIterator_Common::reset(DUIterator_Common const&) /src/hotspot/share/opto/node.cpp:124 > #1 0x1461a32556a5 in DUIterator_Fast::operator=(DUIterator_Fast const&) /src/hotspot/share/opto/node.hpp:1486 > #2 0x1461a32556a5 in Node::fast_outs(DUIterator_Fast&) const /src/hotspot/share/opto/node.hpp:1491 > #3 0x1461a32556a5 in SimpleDUIterator::SimpleDUIterator(Node*) /src/hotspot/share/opto/node.hpp:1575 > #4 0x1461a32556a5 in G1BarrierSetC2::has_cas_in_use_chain(Node*) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:855 > #5 0x1461a3256cf1 in G1BarrierSetC2::verify_pre_load(Node*, Unique_Node_List&) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:881 > #6 0x1461a325eec3 in G1BarrierSetC2::verify_gc_barriers(Compile*, BarrierSetC2::CompilePhase) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:1019 > #7 0x1461a325eec3 in G1BarrierSetC2::verify_gc_barriers(Compile*, BarrierSetC2::CompilePhase) const 
/src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:963 > #8 0x1461a23160ed in Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*) /src/hotspot/share/opto/compile.cpp:875 > #9 0x1461a1845fd0 in C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) /src/hotspot/share/opto/c2compiler.cpp:142 > #10 0x1461a235ac39 in CompileBroker::invoke_compiler_on_method(CompileTask*) /src/hotspot/share/compiler/compileBroker.cpp:2305 > #11 0x1461a235ee4e in CompileBroker::compiler_thread_loop() /src/hotspot/share/compiler/compileBroker.cpp:1963 > #12 0x1461a4076f8d in JavaThread::thread_main_inner() /src/hotspot/share/runtime/javaThread.cpp:760 > #13 0x1461a409da23 in JavaThread::run() /src/hotspot/share/runtime/javaThread.cpp:745 > #14 0x1461a7b6d2bc in Thread::call_run() /src/hotspot/share/runtime/thread.cpp:221 > #15 0x1461a62a8105 in thread_native_entry /src/hotspot/os/linux/os_linux.cpp:846 > #16 0x1461c29801d9 in start_thread (/lib64/libpthread.so.0+0x81d9) > #17 0x1461c18cae72 in __clone (/lib64/libc.so.6+0x39e72) Good and trivial. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19125#pullrequestreview-2043815671 From kvn at openjdk.org Tue May 7 17:51:57 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 7 May 2024 17:51:57 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v11] In-Reply-To: <0anrYmEFTzUaEynG83xqh3DlAygkKXw9BTxO982PkR4=.7a8d0d3d-168e-47eb-8385-79d4a9c46df3@github.com> References: <0anrYmEFTzUaEynG83xqh3DlAygkKXw9BTxO982PkR4=.7a8d0d3d-168e-47eb-8385-79d4a9c46df3@github.com> Message-ID: On Tue, 7 May 2024 04:27:12 GMT, Thomas Stuefe wrote: >> See [1] for previous discussions. >> >> We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. 
>> >> The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. >> >> Examples: >> >> This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` >> >> This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` >> >> >> --- >> >> The patch: >> >> 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. >> 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. >> 3) Adapted and extended tests >> >> I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. >> >> >> Tested: >> >> - manually on Mac m1 (debug and release) >> - GHAs are running >> - but Oracle will do more testing before this goes in >> >> [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: > > - remove debug output > - Merge branch 'master' into compiler-default-limit > - fix compiler.c2.TestFindNode again > - merge master and fix conflicts > - Remove unused variable > - Remove accidental change to TestDeadPhiMergeMemLoop.java > - fix copyrights > - fix copyrights > - another fix > - fix accidental slip in of another test name > - ... and 9 more: https://git.openjdk.org/jdk/compare/f308e107...61dc5952 Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18969#pullrequestreview-2043822389 From shade at openjdk.org Tue May 7 17:53:52 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 7 May 2024 17:53:52 GMT Subject: RFR: 8331863: DUIterator_Fast used before it is constructed In-Reply-To: <5FEXRCspKpxNj3FJW1_2fqvdzuC40gTT8-SAG_pEflU=.69f4b39b-b7ed-4e45-87ac-2245ec75c789@github.com> References: <5FEXRCspKpxNj3FJW1_2fqvdzuC40gTT8-SAG_pEflU=.69f4b39b-b7ed-4e45-87ac-2245ec75c789@github.com> Message-ID: On Tue, 7 May 2024 16:13:38 GMT, Axel Boldt-Christmas wrote: > `SimpleDUIterator` constructs two `DUIterator_Fast` but passes a reference to the second when constructing the first. In debug values are read from this not yet constructed object. > > Found when building a debug build with UBSAN > > /src/hotspot/share/opto/node.cpp:124:8: runtime error: load of value 200, which is not a valid value for type 'bool' > #0 0x14619f4e6476 in DUIterator_Common::reset(DUIterator_Common const&) /src/hotspot/share/opto/node.cpp:124 > #1 0x1461a32556a5 in DUIterator_Fast::operator=(DUIterator_Fast const&) /src/hotspot/share/opto/node.hpp:1486 > #2 0x1461a32556a5 in Node::fast_outs(DUIterator_Fast&) const /src/hotspot/share/opto/node.hpp:1491 > #3 0x1461a32556a5 in SimpleDUIterator::SimpleDUIterator(Node*) /src/hotspot/share/opto/node.hpp:1575 > #4 0x1461a32556a5 in G1BarrierSetC2::has_cas_in_use_chain(Node*) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:855 > #5 0x1461a3256cf1 in G1BarrierSetC2::verify_pre_load(Node*, Unique_Node_List&) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:881 > #6 0x1461a325eec3 in G1BarrierSetC2::verify_gc_barriers(Compile*, BarrierSetC2::CompilePhase) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:1019 > #7 0x1461a325eec3 in G1BarrierSetC2::verify_gc_barriers(Compile*, BarrierSetC2::CompilePhase) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:963 > #8 0x1461a23160ed in Compile::Compile(ciEnv*, ciMethod*, int, Options, 
DirectiveSet*) /src/hotspot/share/opto/compile.cpp:875 > #9 0x1461a1845fd0 in C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) /src/hotspot/share/opto/c2compiler.cpp:142 > #10 0x1461a235ac39 in CompileBroker::invoke_compiler_on_method(CompileTask*) /src/hotspot/share/compiler/compileBroker.cpp:2305 > #11 0x1461a235ee4e in CompileBroker::compiler_thread_loop() /src/hotspot/share/compiler/compileBroker.cpp:1963 > #12 0x1461a4076f8d in JavaThread::thread_main_inner() /src/hotspot/share/runtime/javaThread.cpp:760 > #13 0x1461a409da23 in JavaThread::run() /src/hotspot/share/runtime/javaThread.cpp:745 > #14 0x1461a7b6d2bc in Thread::call_run() /src/hotspot/share/runtime/thread.cpp:221 > #15 0x1461a62a8105 in thread_native_entry /src/hotspot/os/linux/os_linux.cpp:846 > #16 0x1461c29801d9 in start_thread (/lib64/libpthread.so.0+0x81d9) > #17 0x1461c18cae72 in __clone (/lib64/libc.so.6+0x39e72) Ouch. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19125#pullrequestreview-2043825879 From kvn at openjdk.org Tue May 7 18:02:55 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 7 May 2024 18:02:55 GMT Subject: RFR: 8331764: C2 SuperWord: refactor _align_to_ref/_mem_ref_for_main_loop_alignment In-Reply-To: References: Message-ID: <0MHAO65YiEDNeD0RXunmaHh2sg14Czg16r19fxPm7Os=.77c88012-9d29-4857-8505-4bd6f8516dc4@github.com> On Tue, 7 May 2024 09:26:11 GMT, Emanuel Peter wrote: > This PR accomplishes these things: > - Rename `_align_to_ref` -> `_mem_ref_for_main_loop_alignment`. > - Move the `mem_ref` finding for alignment out of `SuperWord::find_adjacent_refs`. This is too early, and we don't even know if the relevant `mem_ref` is going to be vectorized. It makes more sense to pick a `mem_ref` directly in `SuperWord::adjust_pre_loop_limit_to_align_main_loop_vectors`, where we already know what packs are going to be vectorized. 
> - For the alignment width (aw), we can use the `vector_width` of the pack to which the `mem_ref` belongs, rather than the potentially much larger `vector_width_in_bytes`. I track this with `_aw_for_main_loop_alignment` now. > > I need this for https://github.com/openjdk/jdk/pull/18822, and decided to split it out into an independent change. src/hotspot/share/opto/superword.cpp line 3407: > 3405: if (first == nullptr) { continue; } > 3406: > 3407: int vw = first->memory_size() * pack->size(); I assume `first` is verified already and `first->memory_size()` is reasonable (size of primitive type). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19115#discussion_r1592872102 From sviswanathan at openjdk.org Tue May 7 18:24:03 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 7 May 2024 18:24:03 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
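The following is a simplified model of what the alignment width affects. It assumes an upward-counting loop, a base offset that is a multiple of the element size, and a power-of-two `aw`. Real SuperWord alignment also accounts for strides, invariants, and scaled offsets. `vw` is the pack's width in bytes (element size times lanes). The scalar pre-loop needs at most `aw / elementSize - 1` iterations, so using the pack's width rather than the possibly larger `vector_width_in_bytes` can shorten the pre-loop. The class and method names are hypothetical.

```java
// Hypothetical model of pre-loop alignment; not SuperWord code.
class PreLoopAlignmentSketch {
    // Vector width in bytes of a pack: element size times number of lanes,
    // mirroring first->memory_size() * pack->size().
    static int packVectorWidth(int elementSize, int packSize) {
        return elementSize * packSize;
    }

    // Scalar pre-loop iterations needed before base + i * elementSize is a multiple of aw,
    // for an upward-counting loop. Assumes aw and base are both multiples of elementSize.
    static int preLoopIterations(long base, int elementSize, int aw) {
        long misalignment = Math.floorMod(base, (long) aw);
        if (misalignment == 0) {
            return 0;
        }
        return (int) ((aw - misalignment) / elementSize);
    }
}
```

For example, with 4-byte elements starting at offset 4, aligning to 16 bytes takes 3 scalar iterations, while aligning to 64 bytes takes 15.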
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Rearrange; add lambdas for clarity src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 383: > 381: { > 382: Label L_short; > 383: A comment here: // Broadcast the beginning of needle into a vector register. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 390: > 388: __ vpbroadcastb(byte_0, Address(needle, 0), Assembler::AVX_256bit); > 389: } > 390: A comment here: // Broadcast the end of needle into a vector register. This step is not needed for single element needle. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 418: > 416: __ cmpq(haystack_len, 0x10); > 417: __ ja_b(L_moreThan16); > 418: An assert here to check for header size >= 16 would be good. 
Also a comment here would be good, something like: // Copy 16 or 32 bytes prior to haystack end onto stack // This may include some object header bytes when haystack length is less than 16 or 32 bytes // Set the new haystack address to beginning of copied haystack on stack, adjusting for the extra bytes copied src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 498: > 496: > 497: // big_case_loop_helper will fall through to this point if one or more potential matches are found > 498: // The mask will have a bitmask indicating the position of the potential matches within the haystack If there is no potential match, which label does the big_case_loop_helper jump to? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 517: > 515: __C2 arrays_equals(false, haystackStart, firstNeedleCompare, compLen, retval, rScratch, xmm_tmp3, xmm_tmp4, > 516: false /* char */, knoreg); > 517: __ testl(retval, retval); Since this is a byte compare even for isU, the retval here could be a 64-bit quantity, so the testl should be a testq. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 553: > 551: // Haystack always copied to stack, so 32-byte reads OK > 552: // Haystack length < 32 > 553: // 10 < needle length < 32 The comment below may need to be updated, as we come here for needle_len > OPT_NEEDLE_SIZE_MAX, which is currently set to 5: // 10 < needle length < 32 src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 611: > 609: __C2 arrays_equals(false, rTmp, firstNeedleCompare, compLen, rTmp3, rTmp2, xmm_tmp3, xmm_tmp4, false /* char */, > 610: knoreg); > 611: __ testl(rTmp3, rTmp3); Since this is a byte compare even for isU, the rTmp3 here could be a 64-bit quantity, so the testl should be a testq. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 629: > 627: > 628: __ bind(L_returnError); > 629: __ movq(rbp, -1); This could directly be rax instead of intermediate rbp and then moving from rbp to rax.
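The suggested comment describes copying the tail of a short haystack onto the stack so that fixed-width loads are safe. The following is a hypothetical Java emulation in which a `byte[]` stands in for memory: 16 assumed header bytes followed by the haystack. It shows why a header of at least 16 bytes is enough and how the new start offset is adjusted. The 16/32 split mirrors the `cmpq(haystack_len, 0x10)` branch quoted above. The names and layout are assumptions, not the stub's actual stack layout.

```java
import java.util.Arrays;

// Hypothetical emulation with a byte[] standing in for memory; not the stub's code.
class HaystackTailCopySketch {
    // Assumed header size; the review asks for an assert that it is at least 16.
    static final int HEADER = 16;

    static final class Copied {
        final byte[] buf;  // bytes copied to the "stack"
        final int start;   // offset of the first haystack byte inside buf

        Copied(byte[] buf, int start) {
            this.buf = buf;
            this.start = start;
        }
    }

    // memory holds HEADER bytes followed by hayLen haystack bytes, with 0 < hayLen < 32.
    static Copied copyTail(byte[] memory, int hayLen) {
        if (hayLen <= 0 || hayLen >= 32) {
            throw new IllegalArgumentException("short-haystack path only");
        }
        int width = hayLen <= 16 ? 16 : 32;   // copy 16 or 32 bytes
        int end = HEADER + hayLen;            // one past the last haystack byte
        int from = end - width;               // may reach back into the header bytes
        if (from < 0) {
            throw new IllegalStateException("header smaller than the extra bytes copied");
        }
        byte[] buf = Arrays.copyOfRange(memory, from, end);
        // Skip the extra (header) bytes that were copied in front of the haystack.
        return new Copied(buf, width - hayLen);
    }
}
```

For `hayLen <= 16` the copy reaches back `16 - hayLen` bytes. For `16 < hayLen < 32` it reaches back `32 - hayLen` bytes, which is fewer than 16. In both cases `from` stays non-negative whenever the header has at least 16 bytes.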
src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 633: > 631: > 632: __ bind(L_returnZero); > 633: __ xorl(rbp, rbp); This could directly be rax instead of intermediate rbp and then moving from rbp to rax. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592791718 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592792401 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592774634 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592866631 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592868501 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592880650 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592885514 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592892211 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592892329 From sgibbons at openjdk.org Tue May 7 19:03:28 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 7 May 2024 19:03:28 GMT Subject: RFR: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 [v2] In-Reply-To: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> References: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> Message-ID: > Added a strcmp for unsafe_setmemory in process_call_arguments() so the assert would not trigger. > > I believe this is the correct fix as I do not think the arguments for setMemory need special handling like arraycopy. > > I would like suggestions on how to generate a testcase to catch this type of error in mainline. 
Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Add test for setMemory escape ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19032/files - new: https://git.openjdk.org/jdk/pull/19032/files/d6702fc3..e938e57c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19032&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19032&range=00-01 Stats: 114 lines in 1 file changed: 114 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19032.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19032/head:pull/19032 PR: https://git.openjdk.org/jdk/pull/19032 From sgibbons at openjdk.org Tue May 7 19:06:53 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 7 May 2024 19:06:53 GMT Subject: RFR: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 [v2] In-Reply-To: References: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> Message-ID: On Tue, 7 May 2024 19:03:28 GMT, Scott Gibbons wrote: >> Added a strcmp for unsafe_setmemory in process_call_arguments() so the assert would not trigger. >> >> I believe this is the correct fix as I do not think the arguments for setMemory need special handling like arraycopy. >> >> I would like suggestions on how to generate a testcase to catch this type of error in mainline. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Add test for setMemory escape Added testcase. Thanks @jatin-bhateja for help with the testcase. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19032#issuecomment-2099114433 From sviswanathan at openjdk.org Tue May 7 20:40:59 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 7 May 2024 20:40:59 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Rearrange; add lambdas for clarity src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 576: > 574: broadcast_additional_needles(false, 0 
/* unknown */, NUMBER_OF_NEEDLE_BYTES_TO_COMPARE, needle, needleLen, rTmp3, > 575: isUU, isUL, _masm); > 576: It would be good to pass the output xmm registers to this method. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 587: > 585: // firstNeedleCompare has address of second element of needle > 586: // compLen has length of comparison to do > 587: This is not clear. firstNeedleCompare gets needle + NUMBER_OF_NEEDLE_BYTES_TO_COMPARE - 1, which is not necessarily the second element of needle. If it helps, let us fix NUMBER_OF_NEEDLE_BYTES_TO_COMPARE to 3 and write the comments and code for that value only. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 590: > 588: compare_haystack_to_needle(false, 0, NUMBER_OF_NEEDLE_BYTES_TO_COMPARE, L_returnRBP, haystack, isU, > 589: DO_EARLY_BAILOUT, mask, needleLen, rTmp3, _masm); > 590: It is better to pass the broadcasted xmm registers to compare_haystack_to_needle. Basically, pass the inputs, outputs, and temps to all the methods. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 639: > 637: __ movl(rax, r8); > 638: __ subq(rcx, rbx); > 639: __ addq(rcx, rax); This could be: __ subq(rcx, rbx); __ addq(rcx, r8); src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 647: > 645: __ cmpq(r11, r10); > 646: __ movq(rbp, -1); > 647: __ cmovq(Assembler::belowEqual, rbp, r11); This could be computed directly in rax: __ movq(rax, -1); __ cmovq(Assembler::belowEqual, rax, r11); Also, is it possible to avoid the cmov on some paths? It is an expensive operation. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1010: > 1008: static void broadcast_additional_needles(bool sizeKnown, int size, int bytesToCompare, Register needle, > 1009: Register needleLen, Register rTmp, bool isUU, bool isUL, > 1010: MacroAssembler *_masm) { It would be good to add the output XMM registers to the parameter list.
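The review above outlines the stub's overall shape. It broadcasts the first and last needle elements and compares them against the haystack to build a bitmask of candidate positions. It then confirms each candidate with `arrays_equals`. The following is a scalar Java emulation of that filtering idea for Latin-1 bytes. The 32-lane block size and all names are assumptions. The stub also compares a few extra needle bytes (`NUMBER_OF_NEEDLE_BYTES_TO_COMPARE`), and the sketch leaves those out, so it is not the stub's register-level code.

```java
import java.util.Arrays;

// Hypothetical scalar emulation for Latin-1 bytes; not the stub's actual code.
class FirstLastFilterSketch {
    static final int LANES = 32; // models one 32-byte AVX2 register

    // Bit i is set when hay[pos + i] matches the needle's first byte AND
    // hay[pos + i + n - 1] matches its last byte (two broadcast compares, ANDed).
    static int candidateMask(byte[] hay, int pos, byte[] needle) {
        int n = needle.length;
        int mask = 0;
        for (int i = 0; i < LANES; i++) {
            int start = pos + i;
            if (start + n > hay.length) {
                break; // a match starting here would run past the haystack end
            }
            if (hay[start] == needle[0] && hay[start + n - 1] == needle[n - 1]) {
                mask |= 1 << i;
            }
        }
        return mask;
    }

    static int indexOf(byte[] hay, byte[] needle) {
        int n = needle.length;
        if (n == 0) {
            return 0;
        }
        for (int pos = 0; pos + n <= hay.length; pos += LANES) {
            int mask = candidateMask(hay, pos, needle);
            while (mask != 0) {
                int start = pos + Integer.numberOfTrailingZeros(mask);
                // Confirm the middle bytes, the role arrays_equals plays in the stub.
                if (n <= 2 || Arrays.equals(hay, start + 1, start + n - 1, needle, 1, n - 1)) {
                    return start;
                }
                mask &= mask - 1; // clear the lowest candidate bit
            }
        }
        return -1;
    }
}
```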
src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1040: > 1038: __ vpbroadcastb(byte_1, Address(needle, 1), Assembler::AVX_256bit); > 1039: } > 1040: } It would be good to have a function which broadcasts a needle element from a given offset into a vector register. That function could take (needle address, offset, output vector register, temps). Such a function could then be called twice from here, and from the main function for offset 0. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593046499 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593057834 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593045710 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592989197 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592992225 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593023349 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593006539 From kvn at openjdk.org Tue May 7 21:10:52 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 7 May 2024 21:10:52 GMT Subject: RFR: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 [v2] In-Reply-To: References: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> Message-ID: <48sfm7TOlk9i8A2_WhISeaR0ETfBgCUZGfHalnDJqFY=.600c053a-40af-4b62-bf6f-ae3c8755b8db@github.com> On Tue, 7 May 2024 19:03:28 GMT, Scott Gibbons wrote: >> Added a strcmp for unsafe_setmemory in process_call_arguments() so the assert would not trigger. >> >> I believe this is the correct fix as I do not think the arguments for setMemory need special handling like arraycopy. >> >> I would like suggestions on how to generate a testcase to catch this type of error in mainline.
> > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Add test for setMemory escape A few comments about the test. test/hotspot/jtreg/compiler/escapeAnalysis/Test8331033.java line 2: > 1: /* > 2: * Copyright (c) 2020, Red Hat, Inc. All rights reserved. Suggestion: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. test/hotspot/jtreg/compiler/escapeAnalysis/Test8331033.java line 28: > 26: * @bug 8331033 > 27: * @summary EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 > 28: * Suggestion: * @requires vm.compMode != "Xint" test/hotspot/jtreg/compiler/escapeAnalysis/Test8331033.java line 29: > 27: * @summary EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 > 28: * > 29: * @run main/othervm -XX:+PrintEscapeAnalysis -Xbatch -XX:-TieredCompilation Test8331033 Suggestion: * @run main/othervm -Xbatch -XX:-TieredCompilation Test8331033 test/hotspot/jtreg/compiler/escapeAnalysis/Test8331033.java line 56: > 54: * // "Escape Analysis for Java", Proceedings of ACM SIGPLAN > 55: * // OOPSLA Conference, November 1, 1999 > 56: */ No need for this comment. We have it in HotSpot sources, in `opto/escape.hpp`.
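For reference, the test header with the `@requires` and `@run` suggestions applied would look roughly like the sketch below. The `@test` tag and everything below the header comment are assumptions. The class body is a hypothetical stand-in that keeps a small array local, so that escape analysis sees the `setMemory` call. It is not the test that was actually added in the PR. The body uses `sun.misc.Unsafe.setMemory(Object, long, long, byte)`. Whether C2 turns that call into the `unsafe_setmemory` leaf call depends on the JDK-8329331 intrinsic being present in the JDK under test.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

/*
 * @test
 * @bug 8331033
 * @summary EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331
 * @requires vm.compMode != "Xint"
 * @run main/othervm -Xbatch -XX:-TieredCompilation Test8331033
 */
class Test8331033 {
    static final Unsafe UNSAFE = getUnsafe();

    static Unsafe getUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    // buf never escapes, so C2's escape analysis has to process the setMemory call
    // (assumption: it is intrinsified into an unsafe_setmemory leaf call).
    static int fillAndSum(byte value) {
        byte[] buf = new byte[16];
        UNSAFE.setMemory(buf, Unsafe.ARRAY_BYTE_BASE_OFFSET, buf.length, value);
        int sum = 0;
        for (byte b : buf) {
            sum += b;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Intended to exceed the usual non-tiered compile threshold so C2 compiles fillAndSum.
        for (int i = 0; i < 20_000; i++) {
            if (fillAndSum((byte) 1) != 16) {
                throw new RuntimeException("unexpected sum");
            }
        }
    }
}
```

A real jtreg test would declare the class `public` and carry the full license header. Both are omitted here.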
------------- PR Review: https://git.openjdk.org/jdk/pull/19032#pullrequestreview-2044189508 PR Review Comment: https://git.openjdk.org/jdk/pull/19032#discussion_r1593090598 PR Review Comment: https://git.openjdk.org/jdk/pull/19032#discussion_r1593092676 PR Review Comment: https://git.openjdk.org/jdk/pull/19032#discussion_r1593093151 PR Review Comment: https://git.openjdk.org/jdk/pull/19032#discussion_r1593094736 From sgibbons at openjdk.org Tue May 7 21:17:23 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 7 May 2024 21:17:23 GMT Subject: RFR: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 [v3] In-Reply-To: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> References: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> Message-ID: > Added a strcmp for unsafe_setmemory in process_call_arguments() so the assert would not trigger. > > I believe this is the correct fix as I do not think the arguments for setMemory need special handling like arraycopy. > > I would like suggestions on how to generate a testcase to catch this type of error in mainline. 
Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Review comments - change copyright, add @requires, change @run ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19032/files - new: https://git.openjdk.org/jdk/pull/19032/files/e938e57c..6c1bedf1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19032&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19032&range=01-02 Stats: 12 lines in 1 file changed: 1 ins; 9 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19032.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19032/head:pull/19032 PR: https://git.openjdk.org/jdk/pull/19032 From sgibbons at openjdk.org Tue May 7 21:17:24 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 7 May 2024 21:17:24 GMT Subject: RFR: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 [v3] In-Reply-To: References: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> Message-ID: On Tue, 7 May 2024 21:13:47 GMT, Scott Gibbons wrote: >> Added a strcmp for unsafe_setmemory in process_call_arguments() so the assert would not trigger. >> >> I believe this is the correct fix as I do not think the arguments for setMemory need special handling like arraycopy. >> >> I would like suggestions on how to generate a testcase to catch this type of error in mainline. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Review comments - change copyright, add @requires, change @run Addressed @vnkozlov review comments. 
------------- PR Review: https://git.openjdk.org/jdk/pull/19032#pullrequestreview-2044214623 From sgibbons at openjdk.org Tue May 7 21:17:24 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 7 May 2024 21:17:24 GMT Subject: RFR: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 [v2] In-Reply-To: <48sfm7TOlk9i8A2_WhISeaR0ETfBgCUZGfHalnDJqFY=.600c053a-40af-4b62-bf6f-ae3c8755b8db@github.com> References: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> <48sfm7TOlk9i8A2_WhISeaR0ETfBgCUZGfHalnDJqFY=.600c053a-40af-4b62-bf6f-ae3c8755b8db@github.com> Message-ID: On Tue, 7 May 2024 21:04:45 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Add test for setMemory escape > > test/hotspot/jtreg/compiler/escapeAnalysis/Test8331033.java line 28: > >> 26: * @bug 8331033 >> 27: * @summary EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 >> 28: * > > Suggestion: > > * @requires vm.compMode != "Xint" Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19032#discussion_r1593100622 From sgibbons at openjdk.org Tue May 7 21:29:52 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 7 May 2024 21:29:52 GMT Subject: RFR: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 [v3] In-Reply-To: References: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> Message-ID: On Tue, 7 May 2024 21:17:23 GMT, Scott Gibbons wrote: >> Added a strcmp for unsafe_setmemory in process_call_arguments() so the assert would not trigger. >> >> I believe this is the correct fix as I do not think the arguments for setMemory need special handling like arraycopy. >> >> I would like suggestions on how to generate a testcase to catch this type of error in mainline. 
> > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Review comments - change copyright, add @requires, change @run Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19032#issuecomment-2099338730 From kvn at openjdk.org Tue May 7 21:29:51 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 7 May 2024 21:29:51 GMT Subject: RFR: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 [v3] In-Reply-To: References: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> Message-ID: On Tue, 7 May 2024 21:17:23 GMT, Scott Gibbons wrote: >> Added a strcmp for unsafe_setmemory in process_call_arguments() so the assert would not trigger. >> >> I believe this is the correct fix as I do not think the arguments for setMemory need special handling like arraycopy. >> >> I would like suggestions on how to generate a testcase to catch this type of error in mainline. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Review comments - change copyright, add @requires, change @run Good. I submitted testing to make sure the test passed with different flags combinations. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19032#issuecomment-2099337542 From sviswanathan at openjdk.org Wed May 8 00:26:59 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 8 May 2024 00:26:59 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers:
>>
>> Benchmark                                 Score     Latest  Latest/Score
>> StringIndexOf.advancedWithMediumSub     343.573    317.934  0.925375393x
>> StringIndexOf.advancedWithShortSub1    1039.081    1053.96  1.014319384x
>> StringIndexOf.advancedWithShortSub2      55.828    110.541  1.980027943x
>> StringIndexOf.constantPattern             9.361     11.906  1.271872663x
>> StringIndexOf.searchCharLongSuccess       4.216      4.218  1.000474383x
>> StringIndexOf.searchCharMediumSuccess     3.133      3.216  1.02649218x
>> StringIndexOf.searchCharShortSuccess       3.76      3.761  1.000265957x
>> StringIndexOf.success                     9.186      9.713  1.057369911x
>> StringIndexOf.successBig                 14.341     46.343  3.231504079x
>> StringIndexOfChar.latin1_AVX2_String   6220.918   12154.52  1.953814533x
>> StringIndexOfChar.latin1_AVX2_char     5503.556   5540.044  1.006629895x
>> StringIndexOfChar.latin1_SSE4_String   6978.854   6818.689  0.977049957x
>> StringIndexOfChar.latin1_SSE4_char     5657.499   5474.624  0.967675646x
>> StringIndexOfChar.latin1_Short_String  7132.541   6863.359  0.962260014x
>> StringIndexOfChar.latin1_Short_char   16013.389  16162.437  1.009307711x
>> StringIndexOfChar.latin1_mixed_String  7386.123  14771.622  1.999915517x
>> StringIndexOfChar.latin1_mixed_char    9901.671   9782.245  0.987938803x
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
> Rearrange; add lambdas for clarity

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1082: > 1080: // noMatch - label bound outside to jump to if there is no match > 1081: // haystack - the address of the first byte of the haystack > 1082: // hsLen - the sizeof the haystack It would be good to specify whether size (size of needle) and hsLen (size of haystack) are in bytes or elements. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1149: > 1147: > 1148: if (size == (isU ? 2 : 1)) { > 1149: __ vpmovmskb(eq_mask, cmp_0, Assembler::AVX_256bit); vpmovmskb is being done twice if doEarlyBailout is set to 1 (the setting we have currently).
If it helps to simplify, we could assume that doEarlyBailout is always set to 1 and remove this configurability. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1174: > 1172: #define lastMask rTmp > 1173: __ vpmovmskb(lastMask, cmp_k, Assembler::AVX_256bit); > 1174: __ shrq(lastMask); did you mean to shift the lastMask by shiftVal here? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1185: > 1183: if (size > (isU ? 4 : 2)) { > 1184: if (doEarlyBailout) { > 1185: __ testl(eq_mask, eq_mask); The masks are 32 bits, as we are comparing at most 32 bytes (256 bits) at a time. So we could consistently use either andl, testl, shrl or andq, testq, shrq. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593225178 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593225488 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593227487 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593229554 From galder at openjdk.org Wed May 8 04:34:18 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 8 May 2024 04:34:18 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v15] In-Reply-To: References: Message-ID:
> Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures.
>
> The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64:
>
> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
> Benchmark                (size)  Mode  Cnt    Score    Error  Units
> ArrayClone.byteArraycopy      0  avgt   15    3.476 ±  0.018  ns/op
> ArrayClone.byteArraycopy     10  avgt   15    3.740 ±  0.017  ns/op
> ArrayClone.byteArraycopy    100  avgt   15    7.124 ±  0.010  ns/op
> ArrayClone.byteArraycopy   1000  avgt   15   39.301 ±  0.106  ns/op
> ArrayClone.byteClone          0  avgt   15    3.478 ±  0.008  ns/op
> ArrayClone.byteClone         10  avgt   15    3.562 ±  0.007  ns/op
> ArrayClone.byteClone        100  avgt   15    5.888 ±  0.206  ns/op
> ArrayClone.byteClone       1000  avgt   15   25.762 ±  0.203  ns/op
> ArrayClone.intArraycopy       0  avgt   15    3.199 ±  0.016  ns/op
> ArrayClone.intArraycopy      10  avgt   15    4.521 ±  0.008  ns/op
> ArrayClone.intArraycopy     100  avgt   15   17.429 ±  0.039  ns/op
> ArrayClone.intArraycopy    1000  avgt   15  178.432 ±  0.777  ns/op
> ArrayClone.intClone           0  avgt   15    3.406 ±  0.016  ns/op
> ArrayClone.intClone          10  avgt   15    4.272 ±  0.006  ns/op
> ArrayClone.intClone         100  avgt   15   13.110 ±  0.122  ns/op
> ArrayClone.intClone        1000  avgt   15  113.196 ± 13.400  ns/op
>
> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this.
>
> I ran the hotspot compiler tests successfully, limiting them to C1 compilation, on darwin/aarch64, linux/x86_64 and linux/686. E.g.
>
> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
> ...
> TEST                                          TOTAL  PASS  FAIL ERROR
> jtreg:test/hotspot/jtreg:hotspot_compiler      1234  1234     0     0
>
> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts?
>
> Thanks @rwestrel for his help shaping this up :)

Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: Fix assert to only have a single !
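The ArrayClone numbers contrast `clone()` with allocating a new array and copying into it. The actual JMH benchmark bodies are not shown in this thread, so the class and method names below are hypothetical; this is only a sketch of the two operations being compared. At the Java level, `new int[n]` is zero-filled before `System.arraycopy` overwrites it, whereas `clone()` is a single allocate-and-copy operation, which is what lets the proposed intrinsic skip zeroing the new array.

```java
import java.util.Arrays;

// Hypothetical sketch of the two variants behind the "Clone" and "Arraycopy"
// rows; not the real benchmark code.
final class CloneVsArraycopy {
    // One operation: allocate and copy. With the proposed C1 intrinsic the new
    // array need not be zeroed first, since every element is overwritten.
    static int[] viaClone(int[] src) {
        return src.clone();
    }

    // Two steps: allocation (semantically zero-filled), then a separate copy
    // of the same elements.
    static int[] viaArraycopy(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    public static void main(String[] args) {
        int[] src = {1, 2, 3, 4, 5};
        // Both variants produce an equal but distinct array.
        System.out.println(Arrays.equals(viaClone(src), viaArraycopy(src))); // true
        System.out.println(viaClone(src) != src);                            // true
    }
}
```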
------------- Changes: - all: https://git.openjdk.org/jdk/pull/17667/files - new: https://git.openjdk.org/jdk/pull/17667/files/306db745..a35cdd84 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=13-14 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17667.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17667/head:pull/17667 PR: https://git.openjdk.org/jdk/pull/17667 From galder at openjdk.org Wed May 8 04:34:18 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 8 May 2024 04:34:18 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v13] In-Reply-To: References: <9Eoh8hOSSVvAtf9iVQ6hflQyceUtt4dpZdqm61zg5XI=.358a4d79-70d9-4b54-85d5-37c6817f0fae@github.com> <_x-OSownzQQZ8fmlsbvQ42MLf9BGZskECTNncOE0s4E=.8381a076-0cc4-4339-924f-fa22ca780573@github.com> Message-ID: On Tue, 7 May 2024 17:11:37 GMT, Galder Zamarreño wrote: >> An array of interfaces can be exact: >> >> new Interface[20].getClass(); >> >> and it seems like it would be safe to allow this, so I think we only need one assert for `!type->as_instance_klass()->is_interface()` if we don't trust the result of exact_type(). > > @dean-long @rwestrel I've added the assert. The assert doesn't hold, e.g.
=== Output from failing command(s) repeated here === * For target buildtools_create_symbols_javac__the.COMPILE_CREATE_SYMBOLS_batch: # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/home/runner/work/jdk/jdk/src/hotspot/share/c1/c1_GraphBuilder.cpp:2031), pid=75212, tid=75244 # Error: assert(!!type->as_instance_klass()->is_interface()) failed # # JRE version: OpenJDK Runtime Environment (23.0) (fastdebug build 23-internal-galderz-306db7459b1316251e36d0eccc3035d11db44889) # Java VM: OpenJDK 64-Bit Server VM (fastdebug 23-internal-galderz-306db7459b1316251e36d0eccc3035d11db44889, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x777cd0] GraphBuilder::invoke(Bytecodes::Code)+0x1200 Thoughts @rwestrel @dean-long? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1593369537 From galder at openjdk.org Wed May 8 04:34:18 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 8 May 2024 04:34:18 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v13] In-Reply-To: References: <9Eoh8hOSSVvAtf9iVQ6hflQyceUtt4dpZdqm61zg5XI=.358a4d79-70d9-4b54-85d5-37c6817f0fae@github.com> <_x-OSownzQQZ8fmlsbvQ42MLf9BGZskECTNncOE0s4E=.8381a076-0cc4-4339-924f-fa22ca780573@github.com> Message-ID: On Wed, 8 May 2024 04:29:59 GMT, Galder Zamarreño wrote: >> @dean-long @rwestrel I've added the assert. > > The assert doesn't hold, e.g.
> > > === Output from failing command(s) repeated here === > * For target buildtools_create_symbols_javac__the.COMPILE_CREATE_SYMBOLS_batch: > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/home/runner/work/jdk/jdk/src/hotspot/share/c1/c1_GraphBuilder.cpp:2031), pid=75212, tid=75244 > # Error: assert(!!type->as_instance_klass()->is_interface()) failed > # > # JRE version: OpenJDK Runtime Environment (23.0) (fastdebug build 23-internal-galderz-306db7459b1316251e36d0eccc3035d11db44889) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 23-internal-galderz-306db7459b1316251e36d0eccc3035d11db44889, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0x777cd0] GraphBuilder::invoke(Bytecodes::Code)+0x1200 > > > Thoughts @rwestrel @dean-long? Hmmm, the double `!!`... let me fix that and see. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1593369973 From epeter at openjdk.org Wed May 8 04:40:57 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 May 2024 04:40:57 GMT Subject: RFR: 8331764: C2 SuperWord: refactor _align_to_ref/_mem_ref_for_main_loop_alignment In-Reply-To: <0MHAO65YiEDNeD0RXunmaHh2sg14Czg16r19fxPm7Os=.77c88012-9d29-4857-8505-4bd6f8516dc4@github.com> References: <0MHAO65YiEDNeD0RXunmaHh2sg14Czg16r19fxPm7Os=.77c88012-9d29-4857-8505-4bd6f8516dc4@github.com> Message-ID: On Tue, 7 May 2024 18:00:16 GMT, Vladimir Kozlov wrote: >> This PR accomplishes these things: >> - Rename `_align_to_ref` -> `_mem_ref_for_main_loop_alignment`. >> - Move the `mem_ref` finding for alignment out of `SuperWord::find_adjacent_refs`. This is too early, and we don't even know if the relevant `mem_ref` is going to be vectorized. It makes more sense to pick a `mem_ref` directly in `SuperWord::adjust_pre_loop_limit_to_align_main_loop_vectors`, where we already know what packs are going to be vectorized. 
>> - For the alignment width (aw), we can use the `vector_width` of the pack to which the `mem_ref` belongs, rather than the potentially much larger `vector_width_in_bytes`. I track this with `_aw_for_main_loop_alignment` now. >> >> I need this for https://github.com/openjdk/jdk/pull/18822, and decided to split it out into an independent change. > > src/hotspot/share/opto/superword.cpp line 3407: > >> 3405: if (first == nullptr) { continue; } >> 3406: >> 3407: int vw = first->memory_size() * pack->size(); > > I assume `first` is verified already and `first->memory_size()` is reasonable (size of primitive type). Yes, it is. All of this code is run in `SuperWord::output`, and at this point we are committed to vectorization - everything is verified. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19115#discussion_r1593373458 From epeter at openjdk.org Wed May 8 04:46:52 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 May 2024 04:46:52 GMT Subject: RFR: 8331764: C2 SuperWord: refactor _align_to_ref/_mem_ref_for_main_loop_alignment In-Reply-To: References: <0MHAO65YiEDNeD0RXunmaHh2sg14Czg16r19fxPm7Os=.77c88012-9d29-4857-8505-4bd6f8516dc4@github.com> Message-ID: On Wed, 8 May 2024 04:38:06 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/superword.cpp line 3407: >> >>> 3405: if (first == nullptr) { continue; } >>> 3406: >>> 3407: int vw = first->memory_size() * pack->size(); >> >> I assume `first` is verified already and `first->memory_size()` is reasonable (size of primitive type). > > Yes, it is. All of this code is run in `SuperWord::output`, and at this point we are committed to vectorization - everything is verified. That is what I tried to say in the PR description: > It makes more sense to pick a mem_ref directly in SuperWord::adjust_pre_loop_limit_to_align_main_loop_vectors, where we already know what packs are going to be vectorized. 
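To make the alignment-width discussion concrete, here is a hypothetical, simplified model in plain Java (not SuperWord code). A pack's vector width is the element's memory size times the pack size, as in the quoted `int vw = first->memory_size() * pack->size();`, and the pre-loop runs enough scalar iterations that the main loop's first access to the chosen memory reference lands on a multiple of that width. The method names and the flat-address model are assumptions made for illustration only.

```java
// Hypothetical model, not HotSpot code: pack vector width and the number of
// scalar pre-loop iterations needed to align the main loop's first access.
final class PreLoopAlignment {
    // Bytes touched by one vector operation of a pack:
    // element memory size times number of elements in the pack.
    static int packVectorWidth(int elementBytes, int packSize) {
        return elementBytes * packSize;
    }

    // Smallest k >= 0 such that (address + k * elementBytes) is a multiple of aw,
    // or -1 if none exists (address not element-aligned).
    // Assumes aw is a positive multiple of elementBytes.
    static int preLoopIterations(long address, int elementBytes, int aw) {
        for (int k = 0; k < aw / elementBytes; k++) {
            if ((address + (long) k * elementBytes) % aw == 0) {
                return k;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int aw = packVectorWidth(4, 4);                   // 4 ints -> 16 bytes
        System.out.println(aw);                           // 16
        System.out.println(preLoopIterations(8, 4, aw));  // 2
        System.out.println(preLoopIterations(8, 4, 64));  // 14
    }
}
```

In this model, when both widths are powers of two, a smaller alignment width never requires more pre-loop iterations than a larger one; that is one plausible benefit of aligning to the pack's own width rather than to the potentially much larger `vector_width_in_bytes`, though the PR itself does not quantify it.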
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19115#discussion_r1593376793 From epeter at openjdk.org Wed May 8 04:46:53 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 May 2024 04:46:53 GMT Subject: RFR: 8331764: C2 SuperWord: refactor _align_to_ref/_mem_ref_for_main_loop_alignment In-Reply-To: References: <0MHAO65YiEDNeD0RXunmaHh2sg14Czg16r19fxPm7Os=.77c88012-9d29-4857-8505-4bd6f8516dc4@github.com> Message-ID: On Wed, 8 May 2024 04:43:46 GMT, Emanuel Peter wrote: >> Yes, it is. All of this code is run in `SuperWord::output`, and at this point we are committed to vectorization - everything is verified. > > That is what I tried to say in the PR description: >> It makes more sense to pick a mem_ref directly in SuperWord::adjust_pre_loop_limit_to_align_main_loop_vectors, where we already know what packs are going to be vectorized. Yes, `first->memory_size()` knows the size in bytes of the load/store. It is used many places in SuperWord. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19115#discussion_r1593377294 From aboldtch at openjdk.org Wed May 8 05:05:58 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 8 May 2024 05:05:58 GMT Subject: RFR: 8331863: DUIterator_Fast used before it is constructed In-Reply-To: <5FEXRCspKpxNj3FJW1_2fqvdzuC40gTT8-SAG_pEflU=.69f4b39b-b7ed-4e45-87ac-2245ec75c789@github.com> References: <5FEXRCspKpxNj3FJW1_2fqvdzuC40gTT8-SAG_pEflU=.69f4b39b-b7ed-4e45-87ac-2245ec75c789@github.com> Message-ID: On Tue, 7 May 2024 16:13:38 GMT, Axel Boldt-Christmas wrote: > `SimpleDUIterator` constructs two `DUIterator_Fast` but passes a reference to the second when constructing the first. In debug values are read from this not yet constructed object. 
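The ordering hazard described here has a rough Java analogue. The code below is hypothetical and is not HotSpot's `SimpleDUIterator`: a field initializer calls a method that reads a field declared after it. Java zero-fills fields before running initializers, so the read yields a default value rather than the invalid `bool` that UBSan flagged in the C++ case, but the mistake is the same: using a member before its own initialization has run.

```java
// Hypothetical Java analogue (not HotSpot code) of using a member before its
// own initialization has run.
final class InitOrder {
    // Field initializers run in declaration order, so this call happens
    // before 'limit' below has been assigned.
    final int cursor = readLimit();
    int limit = 42; // deliberately non-final: a final constant would be inlined

    private int readLimit() {
        return limit; // still the default value 0 at this point
    }

    public static void main(String[] args) {
        InitOrder o = new InitOrder();
        System.out.println(o.cursor); // 0, not 42
        System.out.println(o.limit);  // 42
    }
}
```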
> > Found when building a debug build with UBSAN > > /src/hotspot/share/opto/node.cpp:124:8: runtime error: load of value 200, which is not a valid value for type 'bool' > #0 0x14619f4e6476 in DUIterator_Common::reset(DUIterator_Common const&) /src/hotspot/share/opto/node.cpp:124 > #1 0x1461a32556a5 in DUIterator_Fast::operator=(DUIterator_Fast const&) /src/hotspot/share/opto/node.hpp:1486 > #2 0x1461a32556a5 in Node::fast_outs(DUIterator_Fast&) const /src/hotspot/share/opto/node.hpp:1491 > #3 0x1461a32556a5 in SimpleDUIterator::SimpleDUIterator(Node*) /src/hotspot/share/opto/node.hpp:1575 > #4 0x1461a32556a5 in G1BarrierSetC2::has_cas_in_use_chain(Node*) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:855 > #5 0x1461a3256cf1 in G1BarrierSetC2::verify_pre_load(Node*, Unique_Node_List&) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:881 > #6 0x1461a325eec3 in G1BarrierSetC2::verify_gc_barriers(Compile*, BarrierSetC2::CompilePhase) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:1019 > #7 0x1461a325eec3 in G1BarrierSetC2::verify_gc_barriers(Compile*, BarrierSetC2::CompilePhase) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:963 > #8 0x1461a23160ed in Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*) /src/hotspot/share/opto/compile.cpp:875 > #9 0x1461a1845fd0 in C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) /src/hotspot/share/opto/c2compiler.cpp:142 > #10 0x1461a235ac39 in CompileBroker::invoke_compiler_on_method(CompileTask*) /src/hotspot/share/compiler/compileBroker.cpp:2305 > #11 0x1461a235ee4e in CompileBroker::compiler_thread_loop() /src/hotspot/share/compiler/compileBroker.cpp:1963 > #12 0x1461a4076f8d in JavaThread::thread_main_inner() /src/hotspot/share/runtime/javaThread.cpp:760 > #13 0x1461a409da23 in JavaThread::run() /src/hotspot/share/runtime/javaThread.cpp:745 > #14 0x1461a7b6d2bc in Thread::call_run() /src/hotspot/share/runtime/thread.cpp:221 > #15 0x1461a62a8105 in thread_native_entry 
/src/hotspot/os/linux/os_linux.cpp:846 > #16 0x1461c29801d9 in start_thread (/lib64/libpthread.so.0+0x81d9) > #17 0x1461c18cae72 in __clone (/lib64/libc.so.6+0x39e72) Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19125#issuecomment-2099745751 From aboldtch at openjdk.org Wed May 8 05:05:58 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 8 May 2024 05:05:58 GMT Subject: Integrated: 8331863: DUIterator_Fast used before it is constructed In-Reply-To: <5FEXRCspKpxNj3FJW1_2fqvdzuC40gTT8-SAG_pEflU=.69f4b39b-b7ed-4e45-87ac-2245ec75c789@github.com> References: <5FEXRCspKpxNj3FJW1_2fqvdzuC40gTT8-SAG_pEflU=.69f4b39b-b7ed-4e45-87ac-2245ec75c789@github.com> Message-ID: On Tue, 7 May 2024 16:13:38 GMT, Axel Boldt-Christmas wrote: > `SimpleDUIterator` constructs two `DUIterator_Fast` but passes a reference to the second when constructing the first. In debug values are read from this not yet constructed object. > > Found when building a debug build with UBSAN > > /src/hotspot/share/opto/node.cpp:124:8: runtime error: load of value 200, which is not a valid value for type 'bool' > #0 0x14619f4e6476 in DUIterator_Common::reset(DUIterator_Common const&) /src/hotspot/share/opto/node.cpp:124 > #1 0x1461a32556a5 in DUIterator_Fast::operator=(DUIterator_Fast const&) /src/hotspot/share/opto/node.hpp:1486 > #2 0x1461a32556a5 in Node::fast_outs(DUIterator_Fast&) const /src/hotspot/share/opto/node.hpp:1491 > #3 0x1461a32556a5 in SimpleDUIterator::SimpleDUIterator(Node*) /src/hotspot/share/opto/node.hpp:1575 > #4 0x1461a32556a5 in G1BarrierSetC2::has_cas_in_use_chain(Node*) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:855 > #5 0x1461a3256cf1 in G1BarrierSetC2::verify_pre_load(Node*, Unique_Node_List&) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:881 > #6 0x1461a325eec3 in G1BarrierSetC2::verify_gc_barriers(Compile*, BarrierSetC2::CompilePhase) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:1019 > #7 
0x1461a325eec3 in G1BarrierSetC2::verify_gc_barriers(Compile*, BarrierSetC2::CompilePhase) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:963 > #8 0x1461a23160ed in Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*) /src/hotspot/share/opto/compile.cpp:875 > #9 0x1461a1845fd0 in C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) /src/hotspot/share/opto/c2compiler.cpp:142 > #10 0x1461a235ac39 in CompileBroker::invoke_compiler_on_method(CompileTask*) /src/hotspot/share/compiler/compileBroker.cpp:2305 > #11 0x1461a235ee4e in CompileBroker::compiler_thread_loop() /src/hotspot/share/compiler/compileBroker.cpp:1963 > #12 0x1461a4076f8d in JavaThread::thread_main_inner() /src/hotspot/share/runtime/javaThread.cpp:760 > #13 0x1461a409da23 in JavaThread::run() /src/hotspot/share/runtime/javaThread.cpp:745 > #14 0x1461a7b6d2bc in Thread::call_run() /src/hotspot/share/runtime/thread.cpp:221 > #15 0x1461a62a8105 in thread_native_entry /src/hotspot/os/linux/os_linux.cpp:846 > #16 0x1461c29801d9 in start_thread (/lib64/libpthread.so.0+0x81d9) > #17 0x1461c18cae72 in __clone (/lib64/libc.so.6+0x39e72) This pull request has now been integrated. 
Changeset: 466a21d8 Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/466a21d8646c05d91f29d607c6347afd34c75629 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod 8331863: DUIterator_Fast used before it is constructed Reviewed-by: kvn, shade ------------- PR: https://git.openjdk.org/jdk/pull/19125 From chagedorn at openjdk.org Wed May 8 07:12:52 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 8 May 2024 07:12:52 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop In-Reply-To: <2lreoMy7UKtgM_m8RCU68rp3FFkoU8zj3ckuTKzXqf0=.dc02a0d4-2671-4c70-a470-a64f28e38f2d@github.com> References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> <2lreoMy7UKtgM_m8RCU68rp3FFkoU8zj3ckuTKzXqf0=.dc02a0d4-2671-4c70-a470-a64f28e38f2d@github.com> Message-ID: On Tue, 7 May 2024 16:47:47 GMT, Emanuel Peter wrote: >> In the test case: >> >> >> long i; >> for (; i > 0; i--) { >> res += 42 / ((int) i); >> >> >> The long counted loop phi has type `[1..100]`. As a consequence, the >> `ConvL2I` also has type `[1..100]`. The `DivI` node that follows can't >> fault: it is not guarded by a zero check and has no control set. >> >> The `ConvL2I` is split through phi and so is the `DiVI` node: >> `PhaseIdealLoop::cannot_split_division()` returns true because the >> value coming from the backedge into the `DivI` (when it is about to be >> split thru phi) is the result of the `ConvL2I` which has type >> `[1..100`] so is not zero as far as the compiler can tell. >> >> On the last iteration of the loop, i is 1. Because the DivI was split >> thru Phi, it computes the value for the following iteration, so for i >> = 0. This causes a crash when the compiled code runs. >> >> The same problem can't happen with an int counted loop because logic >> in `PhaseIdealLoop::split_thru_phi()` prevents a `ConvI2L` from being >> split thru phi. 
I propose to fix this the same way: in the test case, >> it's not true that once the `ConvL2I` is split thru phi it keeps type >> `[1..100]`. The fix is fairly conservative because it's based on the >> existing logic for `ConvI2L`: we would want to avoid splitting a `ConvL2I` >> only at a counted loop, but I suppose the same is true for the `ConvI2L` >> and I thought it would be best to revisit both together. > > test/hotspot/jtreg/compiler/splitif/TestLongCountedLoopConvL2I.java line 31: > >> 29: * -XX:+StressGCM -XX:StressSeed=92643864 TestLongCountedLoopConvL2I >> 30: * @run main/othervm -XX:-BackgroundCompilation -XX:-TieredCompilation -XX:-UseOnStackReplacement >> 31: * -XX:+StressGCM TestLongCountedLoopConvL2I > > Would it make sense to have a run that allows OSR? You should also add `-XX:+UnlockDiagnosticVMOptions` for the stress flag. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19086#discussion_r1593501246 From mli at openjdk.org Wed May 8 08:46:01 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 8 May 2024 08:46:01 GMT Subject: RFR: 8331908: Simplify log code in vectorintrinsics.cpp Message-ID: Hi, Could you help review this simple patch? Currently, the log code in vectorintrinsics.cpp is somewhat redundant and could be simplified. Thanks.
## Test sanity test, jdk/incubator/vector ------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/19135/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19135&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331908 Stats: 497 lines in 1 file changed: 14 ins; 322 del; 161 mod Patch: https://git.openjdk.org/jdk/pull/19135.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19135/head:pull/19135 PR: https://git.openjdk.org/jdk/pull/19135 From galder at openjdk.org Wed May 8 09:23:57 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 8 May 2024 09:23:57 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v13] In-Reply-To: References: <9Eoh8hOSSVvAtf9iVQ6hflQyceUtt4dpZdqm61zg5XI=.358a4d79-70d9-4b54-85d5-37c6817f0fae@github.com> <_x-OSownzQQZ8fmlsbvQ42MLf9BGZskECTNncOE0s4E=.8381a076-0cc4-4339-924f-fa22ca780573@github.com> Message-ID: On Wed, 8 May 2024 04:31:05 GMT, Galder Zamarreño wrote: >> The assert doesn't hold, e.g.
Hmmm, something else is failing now. That's odd, maybe master has updated and is causing this PR to fail now? # Internal Error (/Users/runner/work/jdk/jdk/src/hotspot/share/ci/ciMetadata.hpp:88), pid=79328, tid=27395 # assert(is_instance_klass()) failed: bad cast I will look into it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1593714548 From galder at openjdk.org Wed May 8 09:23:57 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 8 May 2024 09:23:57 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v13] In-Reply-To: References: <9Eoh8hOSSVvAtf9iVQ6hflQyceUtt4dpZdqm61zg5XI=.358a4d79-70d9-4b54-85d5-37c6817f0fae@github.com> <_x-OSownzQQZ8fmlsbvQ42MLf9BGZskECTNncOE0s4E=.8381a076-0cc4-4339-924f-fa22ca780573@github.com> Message-ID: On Wed, 8 May 2024 09:18:59 GMT, Galder Zamarreño wrote: >> Hmmm, the double `!!`... let me fix that and see. > > Hmmm, something else is failing now. That's odd, maybe master has updated and is causing this PR to fail now? > > > # Internal Error (/Users/runner/work/jdk/jdk/src/hotspot/share/ci/ciMetadata.hpp:88), pid=79328, tid=27395 > # assert(is_instance_klass()) failed: bad cast > > > I will look into it. Ah no, that assert comes from the `type->as_instance_klass()` call: ciInstanceKlass* as_instance_klass() { assert(is_instance_klass(), "bad cast"); return (ciInstanceKlass*)this; } @rwestrel @dean-long what shall we do here? Do we remove the assert altogether? Does the code need to change for the assert to pass? Any other ideas?
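Dean Long's earlier remark that an array of interfaces can be exact points at a likely cause: the exact receiver type can be an array class, which is not an instance klass, so calling `as_instance_klass()` on it trips the `is_instance_klass()` check shown above. The same distinction is visible from plain Java reflection. The helper names below are hypothetical, and treating non-array, non-primitive `Class` objects as the counterpart of HotSpot's instance klasses is a simplifying assumption; the second helper mirrors the shape of a guarded check that only asks "is it an interface?" after confirming the kind.

```java
// Plain-Java illustration (not C1 code). Mapping "not an array and not a
// primitive" to HotSpot's instance klasses is a simplifying assumption.
final class InterfaceArrays {
    static boolean isInstanceKlassLike(Class<?> c) {
        return !c.isArray() && !c.isPrimitive();
    }

    // Guarded check: only asks whether the type is an interface after
    // confirming it is instance-klass-like, so arrays pass safely.
    static boolean notAnInterface(Class<?> c) {
        return !isInstanceKlassLike(c) || !c.isInterface();
    }

    public static void main(String[] args) {
        Class<?> c = new Runnable[20].getClass();                // exact type: Runnable[]
        System.out.println(c.isArray());                         // true
        System.out.println(c.getComponentType().isInterface());  // true
        System.out.println(isInstanceKlassLike(c));              // false
        System.out.println(notAnInterface(c));                   // true
    }
}
```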
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1593717682 From dlong at openjdk.org Wed May 8 10:06:58 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 8 May 2024 10:06:58 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v15] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 04:34:18 GMT, Galder Zamarreño wrote:
>> Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures.
>>
>> The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64:
>>
>> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
>> Benchmark                (size)  Mode  Cnt    Score    Error  Units
>> ArrayClone.byteArraycopy      0  avgt   15    3.476 ±  0.018  ns/op
>> ArrayClone.byteArraycopy     10  avgt   15    3.740 ±  0.017  ns/op
>> ArrayClone.byteArraycopy    100  avgt   15    7.124 ±  0.010  ns/op
>> ArrayClone.byteArraycopy   1000  avgt   15   39.301 ±  0.106  ns/op
>> ArrayClone.byteClone          0  avgt   15    3.478 ±  0.008  ns/op
>> ArrayClone.byteClone         10  avgt   15    3.562 ±  0.007  ns/op
>> ArrayClone.byteClone        100  avgt   15    5.888 ±  0.206  ns/op
>> ArrayClone.byteClone       1000  avgt   15   25.762 ±  0.203  ns/op
>> ArrayClone.intArraycopy       0  avgt   15    3.199 ±  0.016  ns/op
>> ArrayClone.intArraycopy      10  avgt   15    4.521 ±  0.008  ns/op
>> ArrayClone.intArraycopy     100  avgt   15   17.429 ±  0.039  ns/op
>> ArrayClone.intArraycopy    1000  avgt   15  178.432 ±  0.777  ns/op
>> ArrayClone.intClone           0  avgt   15    3.406 ±  0.016  ns/op
>> ArrayClone.intClone          10  avgt   15    4.272 ±  0.006  ns/op
>> ArrayClone.intClone         100  avgt   15   13.110 ±  0.122  ns/op
>> ArrayClone.intClone        1000  avgt   15  113.196 ± 13.400  ns/op
>>
>> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this.
>>
>> I ran the hotspot compiler tests successfully, limiting them to C1 compilation, on darwin/aarch64, linux/x86_64 and linux/686. E.g.
>>
>> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
>> ...
>> TEST                                          TOTAL  PASS  FAIL ERROR
>> jtreg:test/hotspot/jtreg:hotspot_compiler      1234  1234     0     0
>>
>> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts?
>>
>>...
>
> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix assert to only have a single !

src/hotspot/share/c1/c1_GraphBuilder.cpp line 2031: > 2029: ciType* type = receiver->exact_type(); > 2030: if (type != nullptr && type->is_loaded()) { > 2031: assert(!type->as_instance_klass()->is_interface(), ""); Suggestion: assert(!type->is_instance_klass() || !type->as_instance_klass()->is_interface(), ""); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1593776586 From stuefe at openjdk.org Wed May 8 10:40:59 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 8 May 2024 10:40:59 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v10] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 18:29:20 GMT, Vladimir Kozlov wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> fix compiler.c2.TestFindNode again > > `-XX:CompileCommand=memstat,compiler.c2.TestFindNode::*,print` - leftover from debugging? Many thanks, @vnkozlov !
------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2100279602 From stuefe at openjdk.org Wed May 8 10:41:00 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 8 May 2024 10:41:00 GMT Subject: Integrated: 8331185: Enable compiler memory limits in debug builds In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 10:04:28 GMT, Thomas Stuefe wrote: > See [1] for previous discussions. > > We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. > > The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. > > Examples: > > This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` > > This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` > > > --- > > The patch: > > 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. > 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. > 3) Adapted and extended tests > > I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. > > > Tested: > > - manually on Mac m1 (debug and release) > - GHAs are running > - but Oracle will do more testing before this goes in > > [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html This pull request has now been integrated. 
Changeset: ad78b7fa Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/ad78b7fa67ba30cab2e8f496e4c765be15deeca6 Stats: 166 lines in 7 files changed: 115 ins; 12 del; 39 mod 8331185: Enable compiler memory limits in debug builds Reviewed-by: asmehra, kvn ------------- PR: https://git.openjdk.org/jdk/pull/18969 From aph at openjdk.org Wed May 8 11:25:03 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 8 May 2024 11:25:03 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8] In-Reply-To: References: Message-ID: <_7zY3gHXEP48uBLgwzxz8wYqv_97zMuIgqcxKBTGDCg=.5e185cd6-22c4-4922-a00c-afeb35799e6b@github.com> On Fri, 26 Apr 2024 12:52:15 GMT, Bhavana Kilambi wrote: >> Floating-point addition is non-associative, that is adding floating-point elements in arbitrary order may get different value. Specially, Vector API does not define the order of reduction intentionally, which allows platforms to generate more efficient codes [1]. So that needs a node to represent non strictly-ordered add-reduction for floating-point type in C2. >> >> To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` have a fixed value. >> >> With this patch, Vector API would always generate non strictly-ordered `AddReductionVF/D' on SVE machines with vector length <= 16B as it is more beneficial to generate non-strictly ordered instructions on such machines compared to strictly ordered ones. >> >> [AArch64] >> On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2]. >> >> This patch adds matching rules for non strictly-ordered `AddReductionVF/D`. >> >> No effects on other platforms. 
>> >> [Performance] >> FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point type. With this patch, it improves ~3x on my SVE machine (128-bit). >> >> ADDLanes >> >> Benchmark Before After Unit >> FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms >> >> >> Final code is as below: >> >> Before: >> ` fadda z17.s, p7/m, z17.s, z16.s >> ` >> After: >> >> faddp v17.4s, v21.4s, v21.4s >> faddp s18, v17.2s >> fadd s18, s18, s19 >> >> >> >> >> [Test] >> Full jtreg passed on AArch64 and x86. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529 >> [2] https://bugs.openjdk.org/browse/JDK-8275275 >> [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316 > > Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge master > - Adjust format for the backend rules changed in previous commit > - Address some more review comments > - Revert to previous indentation > - Add comments, revert to requires_strict_order and other minor changes > - Naming changes: replace strict/non-strict with more technical terms > - Addressed review comments for changes in backend rules and code style > - 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction > > Floating-point addition is non-associative, that is adding > floating-point elements in arbitrary order may get different value. > Specially, Vector API does not define the order of reduction > intentionally, which allows platforms to generate more efficient codes > [1]. So that needs a node to represent non strictly-ordered > add-reduction for floating-point type in C2. 
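The non-associativity argument in 8320725 can be made concrete with a small Java sketch. `sequentialSum` adds elements strictly left to right, the order a plain Java loop requires and that a strictly ordered instruction such as SVE `fadda` preserves; `pairwiseSum` adds neighbouring elements level by level, which is roughly the shape of the `faddp` sequence shown in the PR. The class and method names and the input values are illustrative assumptions, chosen so that rounding makes the two orders disagree.

```java
// Illustrative sketch: the same four floats summed in two different orders.
class FloatReductionOrder {
    // Strictly ordered: (((0 + a[0]) + a[1]) + a[2]) + a[3] ...
    static float sequentialSum(float[] a) {
        float sum = 0.0f;
        for (float x : a) {
            sum += x;
        }
        return sum;
    }

    // Pairwise (tree) order: add neighbours level by level.
    // Assumes a is non-empty and a.length is a power of two.
    static float pairwiseSum(float[] a) {
        float[] buf = a.clone();
        for (int n = buf.length; n > 1; n /= 2) {
            for (int i = 0; i < n / 2; i++) {
                buf[i] = buf[2 * i] + buf[2 * i + 1];
            }
        }
        return buf[0];
    }

    public static void main(String[] args) {
        float[] a = {1.0e8f, 1.0f, -1.0e8f, 1.0f};
        System.out.println(sequentialSum(a)); // 1.0
        System.out.println(pairwiseSum(a));   // 0.0
    }
}
```

With this input, `1.0e8f + 1.0f` rounds back to `1.0e8f` (the spacing between adjacent floats near 1e8 is 8), so the pairwise order loses both small addends while the sequential order keeps the last one. This is why auto-vectorized loops need the strictly ordered form, while the Vector API, which leaves the order unspecified, may use the faster one.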
> > To avoid introducing new nodes, this patch adds a bool field in > `AddReductionVF/D` to distinguish whether they require strict order. It > also removes `UnorderedReductionNode` and adds a virtual function > `bool requires_strict_order()` in `ReductionNode`. Besides > `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` > have a fixed value. > > With this patch, Vector API would always generate non strictly-ordered > `AddReductionVF/D' on SVE machines with vector length <= 16B as it is > more beneficial to generate non-strictly ordered instructions on such > machines compared to strictly ordered ones. > > [AArch64] > On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. > Auto-vectorization has already banned these nodes in JDK-8275275 [2]. > > This patch adds matching rules for non strictly-ordered > `AddReductionVF/D`. > > No effects on other platforms. > > [Performance] > FloatMaxVector.ADDLanes [3] measures the performance of add reduction > for floating-point type. With this patch, it improves ~3x on my SVE > machine (128-bit). > > ADDLanes > Benchmark Before After Unit > FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms > > Final code is as below: > > ``` > Before: > fadda z17.s, p7/m, z17.s, z16.s > > After: > faddp v17.4s, v21.4s,... src/hotspot/cpu/aarch64/aarch64_vector.ad line 140: > 138: // The implementations of Op_AddReductionVD/F in Neon are for the Vector API only. > 139: // They are not suitable for auto-vectorization because the implementations cannot > 140: // guarantee strict ordering. Suggestion: // These implementations of Op_AddReductionVD/F in Neon are for the Vector API only. // They are not suitable for auto-vectorization because the result would not conform to the // JLS, Section Evaluation Order. src/hotspot/cpu/aarch64/aarch64_vector.ad line 2865: > 2863: // Non-strictly ordered floating-point add reduction for vector length of 64-bit. 
As an > 2864: // example, this rule can be reached from the VectorAPI (which allows for non-strictly ordered > 2865: // add reduction). Suggestion: // Non-strictly ordered floating-point add reduction for a 64-bits-long vector. This rule // is intended for the VectorAPI (which allows for non-strictly ordered add reduction). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1593863910 PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1593866102 From aph at openjdk.org Wed May 8 11:25:04 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 8 May 2024 11:25:04 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8] In-Reply-To: <_7zY3gHXEP48uBLgwzxz8wYqv_97zMuIgqcxKBTGDCg=.5e185cd6-22c4-4922-a00c-afeb35799e6b@github.com> References: <_7zY3gHXEP48uBLgwzxz8wYqv_97zMuIgqcxKBTGDCg=.5e185cd6-22c4-4922-a00c-afeb35799e6b@github.com> Message-ID: On Wed, 8 May 2024 11:20:50 GMT, Andrew Haley wrote: >> Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: >> >> - Merge master >> - Adjust format for the backend rules changed in previous commit >> - Address some more review comments >> - Revert to previous indentation >> - Add comments, revert to requires_strict_order and other minor changes >> - Naming changes: replace strict/non-strict with more technical terms >> - Addressed review comments for changes in backend rules and code style >> - 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction >> >> Floating-point addition is non-associative, that is adding >> floating-point elements in arbitrary order may get different value. 
>> Specially, Vector API does not define the order of reduction >> intentionally, which allows platforms to generate more efficient codes >> [1]. So that needs a node to represent non strictly-ordered >> add-reduction for floating-point type in C2. >> >> To avoid introducing new nodes, this patch adds a bool field in >> `AddReductionVF/D` to distinguish whether they require strict order. It >> also removes `UnorderedReductionNode` and adds a virtual function >> `bool requires_strict_order()` in `ReductionNode`. Besides >> `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` >> have a fixed value. >> >> With this patch, Vector API would always generate non strictly-ordered >> `AddReductionVF/D' on SVE machines with vector length <= 16B as it is >> more beneficial to generate non-strictly ordered instructions on such >> machines compared to strictly ordered ones. >> >> [AArch64] >> On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. >> Auto-vectorization has already banned these nodes in JDK-8275275 [2]. >> >> This patch adds matching rules for non strictly-ordered >> `AddReductionVF/D`. >> >> No effects on other platforms. >> >> [Performance] >> FloatMaxVector.ADDLanes [3] measures the performance of add reduction >> for floating-point type. With this patch, it improves ~3x on my SVE >> machine (128-bit). >> >> ADDLanes >> Benchmark Before After Unit >> FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms >> >> Final code is as below: >> >> ``` >> Before:... > > src/hotspot/cpu/aarch64/aarch64_vector.ad line 2865: > >> 2863: // Non-strictly ordered floating-point add reduction for vector length of 64-bit. As an >> 2864: // example, this rule can be reached from the VectorAPI (which allows for non-strictly ordered >> 2865: // add reduction). > > Suggestion: > > // Non-strictly ordered floating-point add reduction for a 64-bits-long vector. This rule > // is intended for the VectorAPI (which allows for non-strictly ordered add reduction). 
Please repeat this change everywhere. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1593867651 From aph at openjdk.org Wed May 8 11:28:00 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 8 May 2024 11:28:00 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8] In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 12:52:15 GMT, Bhavana Kilambi wrote: >> Floating-point addition is non-associative, that is adding floating-point elements in arbitrary order may get different value. Specially, Vector API does not define the order of reduction intentionally, which allows platforms to generate more efficient codes [1]. So that needs a node to represent non strictly-ordered add-reduction for floating-point type in C2. >> >> To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` have a fixed value. >> >> With this patch, Vector API would always generate non strictly-ordered `AddReductionVF/D' on SVE machines with vector length <= 16B as it is more beneficial to generate non-strictly ordered instructions on such machines compared to strictly ordered ones. >> >> [AArch64] >> On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2]. >> >> This patch adds matching rules for non strictly-ordered `AddReductionVF/D`. >> >> No effects on other platforms. >> >> [Performance] >> FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point type. With this patch, it improves ~3x on my SVE machine (128-bit). 
>> >> ADDLanes >> >> Benchmark Before After Unit >> FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms >> >> >> Final code is as below: >> >> Before: >> ` fadda z17.s, p7/m, z17.s, z16.s >> ` >> After: >> >> faddp v17.4s, v21.4s, v21.4s >> faddp s18, v17.2s >> fadd s18, s18, s19 >> >> >> >> >> [Test] >> Full jtreg passed on AArch64 and x86. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529 >> [2] https://bugs.openjdk.org/browse/JDK-8275275 >> [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316 > > Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge master > - Adjust format for the backend rules changed in previous commit > - Address some more review comments > - Revert to previous indentation > - Add comments, revert to requires_strict_order and other minor changes > - Naming changes: replace strict/non-strict with more technical terms > - Addressed review comments for changes in backend rules and code style > - 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction > > Floating-point addition is non-associative, that is adding > floating-point elements in arbitrary order may get different value. > Specially, Vector API does not define the order of reduction > intentionally, which allows platforms to generate more efficient codes > [1]. So that needs a node to represent non strictly-ordered > add-reduction for floating-point type in C2. > > To avoid introducing new nodes, this patch adds a bool field in > `AddReductionVF/D` to distinguish whether they require strict order. 
It > also removes `UnorderedReductionNode` and adds a virtual function > `bool requires_strict_order()` in `ReductionNode`. Besides > `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` > have a fixed value. > > With this patch, Vector API would always generate non strictly-ordered > `AddReductionVF/D' on SVE machines with vector length <= 16B as it is > more beneficial to generate non-strictly ordered instructions on such > machines compared to strictly ordered ones. > > [AArch64] > On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. > Auto-vectorization has already banned these nodes in JDK-8275275 [2]. > > This patch adds matching rules for non strictly-ordered > `AddReductionVF/D`. > > No effects on other platforms. > > [Performance] > FloatMaxVector.ADDLanes [3] measures the performance of add reduction > for floating-point type. With this patch, it improves ~3x on my SVE > machine (128-bit). > > ADDLanes > Benchmark Before After Unit > FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms > > Final code is as below: > > ``` > Before: > fadda z17.s, p7/m, z17.s, z16.s > > After: > faddp v17.4s, v21.4s,... I have no further objections, but please wait for a C2 specialist to review this. ------------- Marked as reviewed by aph (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18034#pullrequestreview-2045384661 From rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 8 May 2024 11:59:52 GMT Subject: RFR: 8330584: IGV: XML does not save all node properties In-Reply-To: References: Message-ID: <_jl-HL9mMXMA4tjKEuA8qdsHOjkng2QQ541aeHjmmT8=.5c7348ae-cdde-4b12-94d1-cf2bb181d862@github.com> On Mon, 6 May 2024 12:06:20 GMT, Tobias Holenstein wrote: > When C2 sends graphs over the network to IGV, each graph is sent separately.
The same applies if C2 saves graphs to XML: each graph is saved with all its nodes as a separate `...` in the XML > > To save space, graphs that are saved from IGV only contain the incremental difference for each graph. This saves a lot of space (~5-10x). The logic happens in Printer.java -> `exportInputGraph(.., difference=true, ...)`. Unfortunately, there is a bug in this logic: the properties of the nodes are not saved correctly. > > [graphs.zip](https://github.com/openjdk/jdk/files/15220940/graphs.zip) contains 4 graphs: > > `graph_c2.xml` (230KB) - an XML file saved from C2 > `graph_igv_bug.xml` (73KB) - `graph_c2.xml` opened in IGV (without this fix) and saved as `graph_igv_bug.xml`. > `graph_igv_fixed.xml` (123KB) - `graph_c2.xml` opened in IGV (with this fix) and saved as `graph_igv_fixed.xml`. > > As you can see, `graph_igv_fixed.xml` is almost twice as large as `graph_igv_bug.xml` because it contains the missing properties. But now the space saving relative to the original `graph_c2.xml` is only ~2x. > Therefore a new format for saving is added: graphs can now be saved and opened from IGV as `.igv`. This uses a compressed (ZIP) format. > > `graph.igv` (10KB) is the same graph as `graph_c2.xml` (230KB), but it uses difference-graph compression and ZIP compression and is 23x smaller in total. > > > > E.g. the root in the last graph of difference_true.xml has far fewer properties than in difference_false.xml. Good catch and nice feature! src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/OutlineTopComponent.java line 578: > 576: SwingUtilities.invokeLater(() -> { > 577: for (Node child : manager.getRootContext().getChildren().getNodes(true)) { > 578: // Nodes a lazily created. By expanding and collapsing they are all initialized Suggestion: // Nodes are lazily created. By expanding and collapsing they are all initialized ------------- Marked as reviewed by rcastanedalo (Reviewer).
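The 8330584 description says the new `.igv` format stores graphs in a compressed (ZIP) form. The sketch below shows, using only `java.util.zip`, how a repetitive XML document can be written to and read back from a single ZIP entry. The entry name, the XML shape, and the helper names are hypothetical, and this is not IGV's actual implementation, whose file layout the thread does not describe.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch: store an XML graph dump inside a one-entry ZIP archive.
class IgvZipSketch {
    static byte[] zipXml(String entryName, String xml) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } // closing the stream writes the ZIP central directory
        return bytes.toByteArray();
    }

    static String unzipFirstEntry(byte[] archive) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            if (zip.getNextEntry() == null) {
                throw new IOException("empty archive");
            }
            // Reads until the end of the current entry.
            return new String(zip.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Builds a repetitive document, similar in spirit to many near-identical <node> elements.
    static String sampleXml(int nodes) {
        StringBuilder sb = new StringBuilder("<graphDocument><graph>");
        for (int i = 0; i < nodes; i++) {
            sb.append("<node id=\"").append(i)
              .append("\"><properties><p name=\"name\">AddI</p></properties></node>");
        }
        return sb.append("</graph></graphDocument>").toString();
    }

    public static void main(String[] args) throws IOException {
        String xml = sampleXml(1000);
        byte[] zipped = zipXml("graph.xml", xml);
        System.out.println(xml.length() + " chars -> " + zipped.length + " bytes");
        System.out.println(unzipFirstEntry(zipped).equals(xml)); // true
    }
}
```

Highly repetitive XML like IGV's node and property lists compresses very well with DEFLATE. That is consistent with the large size reduction reported in the PR, although the exact ratio depends on the real content.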
PR Review: https://git.openjdk.org/jdk/pull/19104#pullrequestreview-2045436896 PR Review Comment: https://git.openjdk.org/jdk/pull/19104#discussion_r1593900816 From tholenstein at openjdk.org Wed May 8 12:10:23 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 8 May 2024 12:10:23 GMT Subject: RFR: 8330584: IGV: XML does not save all node properties [v2] In-Reply-To: References: Message-ID: > When C2 sends graphs over the network to IGV, each graph is sent separately. The same applies if C2 saves graphs to XML: each graph is saved with all its nodes as a separate `...` in the XML > > To save space, graphs that are saved from IGV only contain the incremental difference for each graph. This saves a lot of space (~5-10x). The logic happens in Printer.java -> `exportInputGraph(.., difference=true, ...)`. Unfortunately, there is a bug in this logic: the properties of the nodes are not saved correctly. > > [graphs.zip](https://github.com/openjdk/jdk/files/15220940/graphs.zip) contains 4 graphs: > > `graph_c2.xml` (230KB) - an XML file saved from C2 > `graph_igv_bug.xml` (73KB) - `graph_c2.xml` opened in IGV (without this fix) and saved as `graph_igv_bug.xml`. > `graph_igv_fixed.xml` (123KB) - `graph_c2.xml` opened in IGV (with this fix) and saved as `graph_igv_fixed.xml`. > > As you can see, `graph_igv_fixed.xml` is almost twice as large as `graph_igv_bug.xml` because it contains the missing properties. But now the space saving relative to the original `graph_c2.xml` is only ~2x. > Therefore a new format for saving is added: graphs can now be saved and opened from IGV as `.igv`. This uses a compressed (ZIP) format. > > `graph.igv` (10KB) is the same graph as `graph_c2.xml` (230KB), but it uses difference-graph compression and ZIP compression and is 23x smaller in total. > > > > E.g. the root in the last graph of difference_true.xml has far fewer properties than in difference_false.xml.
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: Update src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/OutlineTopComponent.java Co-authored-by: Roberto Castañeda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19104/files - new: https://git.openjdk.org/jdk/pull/19104/files/eabd53cd..632b4baa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19104&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19104&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19104.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19104/head:pull/19104 PR: https://git.openjdk.org/jdk/pull/19104 From epeter at openjdk.org Wed May 8 12:11:02 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 May 2024 12:11:02 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8] In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 12:52:15 GMT, Bhavana Kilambi wrote: >> Floating-point addition is non-associative, that is adding floating-point elements in arbitrary order may get different value. Specially, Vector API does not define the order of reduction intentionally, which allows platforms to generate more efficient codes [1]. So that needs a node to represent non strictly-ordered add-reduction for floating-point type in C2. >> >> To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` have a fixed value.
>> >> With this patch, Vector API would always generate non strictly-ordered `AddReductionVF/D' on SVE machines with vector length <= 16B as it is more beneficial to generate non-strictly ordered instructions on such machines compared to strictly ordered ones. >> >> [AArch64] >> On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2]. >> >> This patch adds matching rules for non strictly-ordered `AddReductionVF/D`. >> >> No effects on other platforms. >> >> [Performance] >> FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point type. With this patch, it improves ~3x on my SVE machine (128-bit). >> >> ADDLanes >> >> Benchmark Before After Unit >> FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms >> >> >> Final code is as below: >> >> Before: >> ` fadda z17.s, p7/m, z17.s, z16.s >> ` >> After: >> >> faddp v17.4s, v21.4s, v21.4s >> faddp s18, v17.2s >> fadd s18, s18, s19 >> >> >> >> >> [Test] >> Full jtreg passed on AArch64 and x86. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529 >> [2] https://bugs.openjdk.org/browse/JDK-8275275 >> [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316 > > Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: > > - Merge master > - Adjust format for the backend rules changed in previous commit > - Address some more review comments > - Revert to previous indentation > - Add comments, revert to requires_strict_order and other minor changes > - Naming changes: replace strict/non-strict with more technical terms > - Addressed review comments for changes in backend rules and code style > - 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction > > Floating-point addition is non-associative, that is adding > floating-point elements in arbitrary order may get different value. > Specially, Vector API does not define the order of reduction > intentionally, which allows platforms to generate more efficient codes > [1]. So that needs a node to represent non strictly-ordered > add-reduction for floating-point type in C2. > > To avoid introducing new nodes, this patch adds a bool field in > `AddReductionVF/D` to distinguish whether they require strict order. It > also removes `UnorderedReductionNode` and adds a virtual function > `bool requires_strict_order()` in `ReductionNode`. Besides > `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` > have a fixed value. > > With this patch, Vector API would always generate non strictly-ordered > `AddReductionVF/D' on SVE machines with vector length <= 16B as it is > more beneficial to generate non-strictly ordered instructions on such > machines compared to strictly ordered ones. > > [AArch64] > On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. > Auto-vectorization has already banned these nodes in JDK-8275275 [2]. > > This patch adds matching rules for non strictly-ordered > `AddReductionVF/D`. > > No effects on other platforms. > > [Performance] > FloatMaxVector.ADDLanes [3] measures the performance of add reduction > for floating-point type. 
With this patch, it improves ~3x on my SVE > machine (128-bit). > > ADDLanes > Benchmark Before After Unit > FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms > > Final code is as below: > > ``` > Before: > fadda z17.s, p7/m, z17.s, z16.s > > After: > faddp v17.4s, v21.4s,... I'll look at it again, once my concerns are all addressed. @Bhavana-Kilambi feel free to ping me again for that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18034#issuecomment-2100431939 From dfenacci at openjdk.org Wed May 8 13:47:08 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 8 May 2024 13:47:08 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v9] In-Reply-To: References: Message-ID: > # Issue > When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0)` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. > > # Causes > On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. > The same is true for `StoreVector`s. > When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store of the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`). Here we call > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 > > where we do an extra check for types if we deal with vectors but we don't make sure that neither masks nor offsets interfere.
> Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > but we don't make sure that there are no masks or offsets. > A few lines below, we check if there are 2 stores for the same value in a row. We need to check for masks and offsets here too, but in this case we can still fold the stores if the masks and offsets of the vector stores are equivalent. > > # Solution > To avoid folding `Load`- and `StoreVector`s with masks and offsets we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns its own (to avoid ever being the same as a store node). In this way, the checks in `MemNode::can_see_stored_value` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 > > and `StoreNode::Identity` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > will fail if masks or offsets are used. > For 2 stores of the same value we instead check for mask and offset equality. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been add...
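The reason a masked or gathered load must not be folded into a preceding store's value in 8325520 can be seen in a scalar model of the lane semantics. The sketch below is not C2 code. It assumes `long` lanes, models memory as a `long[]`, fills unset lanes with zero as the Vector API's masked `fromArray` does, and uses hypothetical helper names. Only the plain load reads back exactly the vector that a plain store of the same shape wrote, which is the property the `LoadNode::Identity` folding relies on.

```java
import java.util.Arrays;

// Scalar model of vector lane semantics (hypothetical helper names, not C2 code).
class VectorLoadFoldingSketch {
    // Plain vector store: memory[base + i] = v[i] for every lane.
    static void store(long[] memory, int base, long[] v) {
        System.arraycopy(v, 0, memory, base, v.length);
    }

    // A plain vector load of the same shape and address returns exactly the stored
    // vector, so replacing Load(Store(v)) with v is safe.
    static long[] load(long[] memory, int base, int lanes) {
        return Arrays.copyOfRange(memory, base, base + lanes);
    }

    // Masked load: lanes whose mask bit is unset are zero, so the result differs
    // from the stored vector unless the mask is all-true.
    static long[] loadMasked(long[] memory, int base, boolean[] mask) {
        long[] r = new long[mask.length];
        for (int i = 0; i < mask.length; i++) {
            r[i] = mask[i] ? memory[base + i] : 0L;
        }
        return r;
    }

    // Gather load: lane i reads memory[base + offsets[i]], which may permute or
    // duplicate elements instead of reading them contiguously.
    static long[] loadGather(long[] memory, int base, int[] offsets) {
        long[] r = new long[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            r[i] = memory[base + offsets[i]];
        }
        return r;
    }

    public static void main(String[] args) {
        long[] memory = new long[8];
        store(memory, 0, new long[] {1, 2, 3, 4});
        System.out.println(Arrays.toString(load(memory, 0, 4)));        // [1, 2, 3, 4]
        System.out.println(Arrays.toString(loadMasked(memory, 0,
                new boolean[] {true, false, true, false})));             // [1, 0, 3, 0]
        System.out.println(Arrays.toString(loadGather(memory, 0,
                new int[] {3, 2, 1, 0})));                               // [4, 3, 2, 1]
    }
}
```

The same model explains the store-store case. Two masked or scattered stores of the same value are only interchangeable when their masks and offsets are equivalent, which is the equality check the PR adds.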
Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8325520: simplify check for offsets and masks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18347/files - new: https://git.openjdk.org/jdk/pull/18347/files/a2cb6a58..9b742109 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=07-08 Stats: 38 lines in 2 files changed: 6 ins; 21 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/18347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18347/head:pull/18347 PR: https://git.openjdk.org/jdk/pull/18347 From aph at openjdk.org Wed May 8 14:26:04 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 8 May 2024 14:26:04 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v18] In-Reply-To: <-Q8XJ3BT26WE6vPUNR7-_Wi7iw7QKTi9O5HsvdeGh4M=.e35dc82b-326f-4207-a3f3-bacfb20032f4@github.com> References: <-Q8XJ3BT26WE6vPUNR7-_Wi7iw7QKTi9O5HsvdeGh4M=.e35dc82b-326f-4207-a3f3-bacfb20032f4@github.com> Message-ID: <54Lj3Z2JBIzBXLKm579qiAzQQXnNN3BrTPXBNXpCC7A=.2f3b353a-b97c-43f4-af95-de55c72e3fb7@github.com> On Tue, 7 May 2024 16:53:21 GMT, Vladimir Kozlov wrote: > I want to see performance numbers on x64 and aarch64 before starting looking on it. It would be nice to have data for all micros `test/micro/org/openjdk/bench/java/lang/ScopedValues*.java` > > Put results into JBS and post short summary here. > > You can compare by disable/enable new intrinsics. I'm on it. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2100706963 From sgibbons at openjdk.org Wed May 8 14:30:54 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 8 May 2024 14:30:54 GMT Subject: RFR: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 [v3] In-Reply-To: References: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> Message-ID: On Tue, 7 May 2024 21:17:23 GMT, Scott Gibbons wrote: >> Added a strcmp for unsafe_setmemory in process_call_arguments() so the assert would not trigger. >> >> I believe this is the correct fix as I do not think the arguments for setMemory need special handling like arraycopy. >> >> I would like suggestions on how to generate a testcase to catch this type of error in mainline. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Review comments - change copyright, add @requires, change @run Awesome! Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19032#issuecomment-2100720273 From kvn at openjdk.org Wed May 8 14:30:54 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 8 May 2024 14:30:54 GMT Subject: RFR: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 [v3] In-Reply-To: References: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> Message-ID: On Tue, 7 May 2024 21:17:23 GMT, Scott Gibbons wrote: >> Added a strcmp for unsafe_setmemory in process_call_arguments() so the assert would not trigger. >> >> I believe this is the correct fix as I do not think the arguments for setMemory need special handling like arraycopy. >> >> I would like suggestions on how to generate a testcase to catch this type of error in mainline. 
> > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Review comments - change copyright, add @requires, change @run My testing passed. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19032#pullrequestreview-2045845632 From kvn at openjdk.org Wed May 8 14:35:52 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 8 May 2024 14:35:52 GMT Subject: RFR: 8331764: C2 SuperWord: refactor _align_to_ref/_mem_ref_for_main_loop_alignment In-Reply-To: References: Message-ID: On Tue, 7 May 2024 09:26:11 GMT, Emanuel Peter wrote: > This PR accomplishes these things: > - Rename `_align_to_ref` -> `_mem_ref_for_main_loop_alignment`. > - Move the `mem_ref` finding for alignment out of `SuperWord::find_adjacent_refs`. This is too early, and we don't even know if the relevant `mem_ref` is going to be vectorized. It makes more sense to pick a `mem_ref` directly in `SuperWord::adjust_pre_loop_limit_to_align_main_loop_vectors`, where we already know what packs are going to be vectorized. > - For the alignment width (aw), we can use the `vector_width` of the pack to which the `mem_ref` belongs, rather than the potentially much larger `vector_width_in_bytes`. I track this with `_aw_for_main_loop_alignment` now. > > I need this for https://github.com/openjdk/jdk/pull/18822, and decided to split it out into an independent change. Good. ------------- Marked as reviewed by kvn (Reviewer). 
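A minimal sketch of the alignment-width point above, under a simplified model: the memory reference's address at pre-loop iteration `k` is `addr0 + k * stride_bytes`, the stride is positive, and the alignment width `aw` is a power of two. `pre_iters_to_align` is a hypothetical helper, not SuperWord's actual `adjust_pre_loop_limit_to_align_main_loop_vectors` logic.

```cpp
#include <cstdint>

// Hypothetical helper (not HotSpot code): the smallest number of pre-loop
// iterations k such that addr0 + k * stride_bytes is a multiple of aw.
// Residues repeat with a period of at most aw, so checking k in [0, aw) is enough.
// Returns -1 if no such k exists (the stride can never reach alignment).
int pre_iters_to_align(std::uint64_t addr0, std::uint64_t stride_bytes, std::uint64_t aw) {
  for (std::uint64_t k = 0; k < aw; ++k) {
    if ((addr0 + k * stride_bytes) % aw == 0) {
      return static_cast<int>(k);
    }
  }
  return -1;
}
```

For a 4-byte stride starting at address `0x1004`, aligning to 16 bytes takes 3 pre-loop iterations while aligning to 64 bytes takes 15, which illustrates why using the pack's own `vector_width` rather than a larger `vector_width_in_bytes` can shorten the pre-loop.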
PR Review: https://git.openjdk.org/jdk/pull/19115#pullrequestreview-2045861945 From jbhateja at openjdk.org Wed May 8 16:43:04 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 8 May 2024 16:43:04 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v15] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 16:58:09 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > parameter and local renames, update comment src/hotspot/cpu/x86/assembler_x86.cpp line 1971: > 1969: void Assembler::crc32(Register crc, Register v, int8_t sizeInBytes) { > 1970: assert(VM_Version::supports_sse4_2(), ""); > 1971: if (needs_rex2(crc, v)) { This being a map2 instruction should check for needs eevex, rex2 nomenclature looks misleading here. 
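A hypothetical model of the prefix choice discussed in this review. It relies on the PR description (encoding is unchanged when only the lower 16 GPRs are used) and on the assumption that REX2 can only address legacy opcode maps 0 and 1, so promoted map 2/3 instructions such as `crc32` (0F 38 F0/F1) need extended EVEX instead. The enum and function names are illustrative, not the assembler_x86.cpp API.

```cpp
#include <initializer_list>

// Illustrative names only; not the HotSpot assembler API.
enum class GprEncoding { LegacyOrRex, Rex2, ExtendedEvex };

// reg_encodings: operand register numbers (0..31).
// opcode_map: legacy opcode map of the instruction (0..3).
GprEncoding select_gpr_encoding(std::initializer_list<int> reg_encodings, int opcode_map) {
  bool uses_egpr = false;
  for (int enc : reg_encodings) {
    if (enc >= 16) {
      uses_egpr = true;  // r16..r31 are the APX extended GPRs
    }
  }
  if (!uses_egpr) {
    return GprEncoding::LegacyOrRex;  // encoding unchanged for the lower 16 GPRs
  }
  if (opcode_map <= 1) {
    return GprEncoding::Rex2;  // assumption: REX2 covers legacy maps 0 and 1 only
  }
  return GprEncoding::ExtendedEvex;  // promoted map 2/3 instructions, e.g. crc32
}
```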
src/hotspot/cpu/x86/assembler_x86.cpp line 11902: > 11900: vex_x = (src_enc >= 16) && !src_is_gpr; > 11901: attributes->set_is_evex_instruction(); > 11902: evex_prefix(vex_r, vex_b, vex_x, evex_r, evex_b, evex_v, false /*eevex_x*/, nds_enc, pre, opc); Hi @steveatgh , UseAVX is set to level 3 only when target support AVX512F feature, entire encoding support for EVEX encoding is guarded by UseAVX > 2. Legacy map 2 and 3 instruction using EGPR register mandates Extended EVEX encoding and user may explicitly set UseAVX to level 2. What are your thoughts on extending the guarding check with UseAPX ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1593759162 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1593785935 From kvn at openjdk.org Wed May 8 16:48:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 8 May 2024 16:48:02 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v2] In-Reply-To: References: Message-ID: > [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. Non of current OpenJDK platforms use it. > > Tested tier1-3,stress,xcomp. 
Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: clean up comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19126/files - new: https://git.openjdk.org/jdk/pull/19126/files/a9fc1df8..64c9e66b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19126&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19126&range=00-01 Stats: 13 lines in 1 file changed: 0 ins; 6 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/19126.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19126/head:pull/19126 PR: https://git.openjdk.org/jdk/pull/19126 From dlong at openjdk.org Wed May 8 18:57:54 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 8 May 2024 18:57:54 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v2] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 16:48:02 GMT, Vladimir Kozlov wrote: >> [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. Non of current OpenJDK platforms use it. >> >> Tested tier1-3,stress,xcomp. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > clean up comments src/hotspot/cpu/s390/assembler_s390.cpp line 2: > 1: /* > 2: * Copyright (c) 2016, 2021, Oracle and/or its affiliates. All rights reserved. No changes to this file? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19126#discussion_r1594499001 From dlong at openjdk.org Wed May 8 19:14:56 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 8 May 2024 19:14:56 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v2] In-Reply-To: References: Message-ID: <8JPE8NAiPPoaDoHnGHp-tiaaHSa9K7XIXLFkZDXFlEw=.99bbd30e-79e3-4ed7-baf2-4b8460f09415@github.com> On Wed, 8 May 2024 16:48:02 GMT, Vladimir Kozlov wrote: >> [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. Non of current OpenJDK platforms use it. >> >> Tested tier1-3,stress,xcomp. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > clean up comments src/hotspot/share/code/relocInfo.hpp line 133: > 131: // Data: [] an oop stored in 4 bytes of instruction > 132: // [n] n is the index of an oop in the CodeBlob's oop pool > 133: // [Nn] index may be 32 bits if necessary Lines 132 and 133 could be combined into something like: // [[N]n] index of an oop in the CodeBlob's oop pool which seems consistent with other descriptions. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19126#discussion_r1594515706 From dlong at openjdk.org Wed May 8 19:19:52 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 8 May 2024 19:19:52 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v2] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 16:48:02 GMT, Vladimir Kozlov wrote: >> [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. Non of current OpenJDK platforms use it. >> >> Tested tier1-3,stress,xcomp. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > clean up comments Marked as reviewed by dlong (Reviewer). 
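Dean's `[[N]n]` notation describes an index stored in one 16-bit data unit, with an optional second unit when the value does not fit. A minimal sketch of that variable-width idea follows. `pack_index` and `unpack_index` are hypothetical names, and the high-half-first layout is an assumption, not a description of relocInfo.hpp's actual format.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical encoder: one 16-bit unit if the index fits in int16, else two units
// (assumed layout: high half first, then low half).
std::vector<std::uint16_t> pack_index(std::int32_t idx) {
  std::vector<std::uint16_t> out;
  if (idx >= INT16_MIN && idx <= INT16_MAX) {
    out.push_back(static_cast<std::uint16_t>(static_cast<std::int16_t>(idx)));
  } else {
    std::uint32_t u = static_cast<std::uint32_t>(idx);
    out.push_back(static_cast<std::uint16_t>(u >> 16));
    out.push_back(static_cast<std::uint16_t>(u & 0xFFFFu));
  }
  return out;
}

// Hypothetical decoder: the number of units tells which form was used.
std::int32_t unpack_index(const std::vector<std::uint16_t>& data) {
  if (data.size() == 1) {
    return static_cast<std::int16_t>(data[0]);  // sign-extend the short form
  }
  std::uint32_t u = (static_cast<std::uint32_t>(data[0]) << 16) | data[1];
  return static_cast<std::int32_t>(u);
}
```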
------------- PR Review: https://git.openjdk.org/jdk/pull/19126#pullrequestreview-2046464449 From kvn at openjdk.org Wed May 8 19:34:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 8 May 2024 19:34:10 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v3] In-Reply-To: References: Message-ID: > [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. Non of current OpenJDK platforms use it. > > Tested tier1-3,stress,xcomp. Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: address comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19126/files - new: https://git.openjdk.org/jdk/pull/19126/files/64c9e66b..0e3ac42b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19126&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19126&range=01-02 Stats: 3 lines in 2 files changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19126.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19126/head:pull/19126 PR: https://git.openjdk.org/jdk/pull/19126 From kvn at openjdk.org Wed May 8 19:34:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 8 May 2024 19:34:10 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v2] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 16:48:02 GMT, Vladimir Kozlov wrote: >> [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. Non of current OpenJDK platforms use it. >> >> Tested tier1-3,stress,xcomp. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > clean up comments Thank you, @dean-long, for review. I addressed all your comments. 
------------- PR Review: https://git.openjdk.org/jdk/pull/19126#pullrequestreview-2046480668 From kvn at openjdk.org Wed May 8 19:34:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 8 May 2024 19:34:10 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v3] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 18:55:02 GMT, Dean Long wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> address comments > > src/hotspot/cpu/s390/assembler_s390.cpp line 2: > >> 1: /* >> 2: * Copyright (c) 2016, 2024, Oracle and/or its affiliates. All rights reserved. > > No changes to this file? Accidental change. Reverted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19126#discussion_r1594530168 From kvn at openjdk.org Wed May 8 19:34:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 8 May 2024 19:34:10 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v2] In-Reply-To: <8JPE8NAiPPoaDoHnGHp-tiaaHSa9K7XIXLFkZDXFlEw=.99bbd30e-79e3-4ed7-baf2-4b8460f09415@github.com> References: <8JPE8NAiPPoaDoHnGHp-tiaaHSa9K7XIXLFkZDXFlEw=.99bbd30e-79e3-4ed7-baf2-4b8460f09415@github.com> Message-ID: On Wed, 8 May 2024 19:12:16 GMT, Dean Long wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> clean up comments > > src/hotspot/share/code/relocInfo.hpp line 133: > >> 131: // Data: [] an oop stored in 4 bytes of instruction >> 132: // [n] n is the index of an oop in the CodeBlob's oop pool >> 133: // [Nn] index may be 32 bits if necessary > > Lines 132 and 133 could be combined into something like: > > // [[N]n] index of an oop in the CodeBlob's oop pool > > which seems consistent with other descriptions. Okay. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19126#discussion_r1594530833 From kvn at openjdk.org Wed May 8 19:55:54 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 8 May 2024 19:55:54 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v3] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 19:34:10 GMT, Vladimir Kozlov wrote: >> [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. Non of current OpenJDK platforms use it. >> >> Tested tier1-3,stress,xcomp. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > address comments @TheRealMDoerr, @RealFYang, @offamitkumar, @bulasevich I need your help with testing this on your platforms, at least tier1. GHA does some cross compilation but not testing. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19126#issuecomment-2101318735 From bkilambi at openjdk.org Wed May 8 20:26:04 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 8 May 2024 20:26:04 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8] In-Reply-To: References: Message-ID: <8-_t7nWbR9gZ2_QkfFNuf5M0Q4PMkKJKgwS3ZbHcCxI=.32dc4f11-dec5-468d-afc8-3b4dae285dcb@github.com> On Tue, 7 May 2024 13:20:37 GMT, Emanuel Peter wrote: > I just realized that there is no regression test. And I think it would be nice to have one. > > Also, we should add some sort of message to the `dump` if the `ReductionNode` has the `requires_strict_order` on or off. I think that could be done in `dump_spec`. > > You could do it similar to: > > ``` > #ifndef PRODUCT > void VectorMaskCmpNode::dump_spec(outputStream *st) const { > st->print(" %d #", _predicate); _type->dump_on(st); > } > #endif // PRODUCT > ``` > > This would actually allow you to create a IR test! 
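Why a `requires_strict_order` flag matters for floating-point add-reductions: FP addition is not associative, so a vectorized reduction that sums lane-wise and combines the lanes at the end can differ from the sequential left-to-right sum. The illustration below uses two hypothetical lanes, not C2's reduction code, and assumes IEEE 754 double arithmetic with round-to-nearest (no x87 extended precision, no fast-math).

```cpp
#include <cstddef>
#include <vector>

// Strict order: ((a0 + a1) + a2) + ... evaluated left to right.
double sum_strict(const std::vector<double>& a) {
  double s = 0.0;
  for (double x : a) {
    s += x;
  }
  return s;
}

// Reordered: lane k accumulates a[k], a[k + 2], ..., then the two lanes are combined,
// roughly what an unordered 2-lane vector reduction could do.
double sum_two_lanes(const std::vector<double>& a) {
  double lane[2] = {0.0, 0.0};
  for (std::size_t i = 0; i < a.size(); ++i) {
    lane[i % 2] += a[i];
  }
  return lane[0] + lane[1];
}
```

With the input `{1e16, 1.0, -1e16, 1.0}`, the strict sum loses the first `1.0` when it is added to `1e16` (the spacing between doubles there is 2) and ends at `1.0`, while the two-lane version cancels `1e16` against `-1e16` first and ends at `2.0`.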
> > You would check that the AddReductionVNode is annotated correctly. You need some VectorAPI tests, and some SuperWord auto-vectorization tests. > > How does that sound? That would ensure that nobody can easily destroy your RFE, at least not in the IR. Hi @eme64 , thanks for the suggestion. I can add the `dump_spec` as suggested (which would print if the `_requires_strict_order` flag is enabled/disabled) but I am not sure if I fully understand what's expected in the JTREG tests. Should I be verifying the `-XX:+PrintIdeal` output to make sure the correct message is being printed for the `ReductionV*` nodes? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18034#issuecomment-2101362464 From duke at openjdk.org Wed May 8 20:30:00 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Wed, 8 May 2024 20:30:00 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v15] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 10:12:29 GMT, Jatin Bhateja wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> parameter and local renames, update comment > > src/hotspot/cpu/x86/assembler_x86.cpp line 11902: > >> 11900: vex_x = (src_enc >= 16) && !src_is_gpr; >> 11901: attributes->set_is_evex_instruction(); >> 11902: evex_prefix(vex_r, vex_b, vex_x, evex_r, evex_b, evex_v, false /*eevex_x*/, nds_enc, pre, opc); > > Hi @steveatgh , UseAVX is set to level 3 only when target support AVX512F feature, entire encoding support for EVEX encoding is guarded by UseAVX > 2. Legacy map 2 and 3 instruction using EGPR register mandates Extended EVEX encoding and user may explicitly set UseAVX to level 2. > What are your thoughts on extending the guarding check with UseAPX ? Thanks @jatin-bhateja . Do you mean a check such as: `if ((UseAVX > 2 || UseAPX) && !attributes->is_legacy_mode())` ? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1594587834 From duke at openjdk.org Wed May 8 23:40:20 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Wed, 8 May 2024 23:40:20 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v16] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: add ::needs_eevex for use with promoted map2 instructions (e.g. 
crc32) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/2a63a159..52628798 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=14-15 Stats: 8 lines in 2 files changed: 6 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From duke at openjdk.org Wed May 8 23:40:21 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Wed, 8 May 2024 23:40:21 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v15] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 09:52:04 GMT, Jatin Bhateja wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> parameter and local renames, update comment > > src/hotspot/cpu/x86/assembler_x86.cpp line 1971: > >> 1969: void Assembler::crc32(Register crc, Register v, int8_t sizeInBytes) { >> 1970: assert(VM_Version::supports_sse4_2(), ""); >> 1971: if (needs_rex2(crc, v)) { > > This being a map2 instruction should check for needs eevex, rex2 nomenclature looks misleading here. Thanks for the comment. Although crc32 is the only promoted map2 instruction (currently) implemented in the assembler, additional map2 instructions may be added later. I added ::needs_eevex and used as you suggest in the crc32 instr. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1594808629 From cslucas at openjdk.org Wed May 8 23:49:19 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 8 May 2024 23:49:19 GMT Subject: RFR: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers Message-ID: The `assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type` failure was caused by the fact that we didn't have a "zero value" for the type T_METADATA. The RAM patch uses that data when it creates a Phi node merging Klass loads and UseCompressedClassPointers is disabled. Tested with JTREG tier1-4 on Linux x86_64 & ARM64. ------------- Commit messages: - Add null and zero types. Changes: https://git.openjdk.org/jdk/pull/19148/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19148&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330795 Stats: 65 lines in 2 files changed: 65 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19148.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19148/head:pull/19148 PR: https://git.openjdk.org/jdk/pull/19148 From cslucas at openjdk.org Wed May 8 23:50:22 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 8 May 2024 23:50:22 GMT Subject: RFR: JDK-8330565 : C2: Multiple crashes with CTW after JDK-8316991 Message-ID: The `# assert(false) failed: Bad graph detected in build_loop_late` failure was caused because a string concatenation optimization using [this method](https://github.com/openjdk/jdk/blob/819f3d6fc70ff6fe54ac5f9033c17c3dd4326aa5/src/hotspot/share/opto/graphKit.cpp#L4115) adds AddP and LoadN nodes to IR graph as NotNull _and_ because RAM was not "nullyfing" phis merging nullable pointers. I was only able to reproduce this problem using a classfile/jar compiled using an "old" version of JDK.. because newer version use InvokeDynamic to do string concatenation. 
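The nullability rule behind the JDK-8330565 fix can be modeled in a few lines: a Phi's type is the meet of its inputs, so one possibly-null input makes the merged pointer nullable. This is a hypothetical and heavily simplified model (it tracks nullness only), not C2's `Type` lattice.

```cpp
#include <initializer_list>

// Hypothetical: a pointer type reduced to "may it be null?".
enum class Nullness { NotNull, MaybeNull };

// Meet over the Phi's inputs: any possibly-null input makes the Phi nullable.
Nullness phi_nullness(std::initializer_list<Nullness> inputs) {
  for (Nullness n : inputs) {
    if (n == Nullness::MaybeNull) {
      return Nullness::MaybeNull;
    }
  }
  return Nullness::NotNull;
}
```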
Tested with JTREG tier1-4 on Linux x86_64 & ARM64. ------------- Commit messages: - Make phi merging pointer loads nullable & add test. Changes: https://git.openjdk.org/jdk/pull/19147/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19147&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330565 Stats: 83 lines in 2 files changed: 83 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19147.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19147/head:pull/19147 PR: https://git.openjdk.org/jdk/pull/19147 From kvn at openjdk.org Thu May 9 01:06:00 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 9 May 2024 01:06:00 GMT Subject: RFR: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers In-Reply-To: References: Message-ID: On Wed, 8 May 2024 23:44:26 GMT, Cesar Soares Lucas wrote: > The `assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type` failure was caused by the fact that we didn't have a "zero value" for the type T_METADATA. The RAM patch uses that data when it creates a Phi node merging Klass loads and UseCompressedClassPointers is disabled. > > Tested with JTREG tier1-4 on Linux x86_64 & ARM64. New test failed in GHA with 32-bit VM because: Unrecognized VM option 'UseCompressedClassPointers' You can add `-XX:+IgnoreUnrecognizedVMOptions` to run test on all platforms. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19148#issuecomment-2101739736 From kvn at openjdk.org Thu May 9 01:28:51 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 9 May 2024 01:28:51 GMT Subject: RFR: JDK-8330565 : C2: Multiple crashes with CTW after JDK-8316991 In-Reply-To: References: Message-ID: On Wed, 8 May 2024 23:44:23 GMT, Cesar Soares Lucas wrote: > The `# assert(false) failed: Bad graph detected in build_loop_late` failure was caused because a string concatenation optimization using [this method](https://github.com/openjdk/jdk/blob/819f3d6fc70ff6fe54ac5f9033c17c3dd4326aa5/src/hotspot/share/opto/graphKit.cpp#L4115) adds AddP and LoadN nodes to IR graph as NotNull _and_ because RAM was not "nullyfing" phis merging nullable pointers. I was only able to reproduce this problem using a classfile/jar compiled using an "old" version of JDK.. because newer version use InvokeDynamic to do string concatenation. > > Tested with JTREG tier1-4 on Linux x86_64 & ARM64. src/hotspot/share/opto/escape.cpp line 779: > 777: _igvn->set_type(data_phi, new_t); > 778: data_phi->raise_bottom_type(new_t); > 779: } Do you intentionally execute `_igvn->transform(` for `data_phi` before you set inputs and now type? Usually we do transform after we fully construct node. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19147#discussion_r1594859343 From kvn at openjdk.org Thu May 9 01:40:51 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 9 May 2024 01:40:51 GMT Subject: RFR: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers In-Reply-To: References: Message-ID: On Wed, 8 May 2024 23:44:26 GMT, Cesar Soares Lucas wrote: > The `assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type` failure was caused by the fact that we didn't have a "zero value" for the type T_METADATA. 
The RAM patch uses that data when it creates a Phi node merging Klass loads and UseCompressedClassPointers is disabled. > > Tested with JTREG tier1-4 on Linux x86_64 & ARM64. @JohnTortugo, thank you for adding new test. But it would be nice also add additional run with `-XX:+IgnoreUnrecognizedVMOptions -XX:-UseCompressedClassPointers` to failed test `test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java` Also why you require to run test only with compressed oops on?: * @requires vm.debug == true & vm.bits == 64 & vm.compiler2.enabled & vm.opt.final.UseCompressedOops & vm.opt.final.EliminateAllocations ------------- PR Comment: https://git.openjdk.org/jdk/pull/19148#issuecomment-2101773136 From kvn at openjdk.org Thu May 9 01:48:51 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 9 May 2024 01:48:51 GMT Subject: RFR: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers In-Reply-To: References: Message-ID: On Thu, 9 May 2024 01:38:44 GMT, Vladimir Kozlov wrote: > @JohnTortugo, thank you for adding new test. But it would be nice also add additional run with `-XX:+IgnoreUnrecognizedVMOptions -XX:-UseCompressedClassPointers` to failed test `test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java` Actually `-XX:+IgnoreUnrecognizedVMOptions` is not needed because you require `vm.bits == 64` in the test. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19148#issuecomment-2101781070 From cslucas at openjdk.org Thu May 9 03:09:11 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 9 May 2024 03:09:11 GMT Subject: RFR: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers [v2] In-Reply-To: References: Message-ID: > The `assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type` failure was caused by the fact that we didn't have a "zero value" for the type T_METADATA. The RAM patch uses that data when it creates a Phi node merging Klass loads and UseCompressedClassPointers is disabled. > > Tested with JTREG tier1-4 on Linux x86_64 & ARM64. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Require vm.bits == 64 on new test. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19148/files - new: https://git.openjdk.org/jdk/pull/19148/files/ea64c880..91fc61de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19148&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19148&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19148.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19148/head:pull/19148 PR: https://git.openjdk.org/jdk/pull/19148 From cslucas at openjdk.org Thu May 9 03:11:53 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 9 May 2024 03:11:53 GMT Subject: RFR: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers In-Reply-To: References: Message-ID: On Thu, 9 May 2024 01:46:45 GMT, Vladimir Kozlov wrote: > @JohnTortugo, thank you for adding new test. 
But it would be nice also add additional run with -XX:+IgnoreUnrecognizedVMOptions -XX:-UseCompressedClassPointers to failed test test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java Thank you @vnkozlov , I'll work on that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19148#issuecomment-2101853175 From duke at openjdk.org Thu May 9 03:35:58 2024 From: duke at openjdk.org (duke) Date: Thu, 9 May 2024 03:35:58 GMT Subject: Withdrawn: 8325681: C2 inliner rejects to inline a deeper callee because the methoddata of caller is immature. In-Reply-To: References: Message-ID: <2-DiL7OaUdt4ncWDCNxGK2DJerNN4mqmJDPSEEvIFBQ=.0b5704fc-aa96-491c-80b1-734b01b3863a@github.com> On Thu, 22 Feb 2024 05:37:26 GMT, Xin Liu wrote: > This patch uses the methoddata of a method no matter it is mature or not to initialize `ciCallProfile`. Previously, C2 drops premature methoddata and leaves _count field of ciCallProfile -1. This leads C2 refuses to inline the callsite because its frequency is too low(-1 < MinInlineFrequencyRatio). > > In the given example, we observes that baz was not inlined because of 'low call site frequency'. This is wrong because its real frequency is 10% > MinInlineFrequencyRatio. > > > 60 13 b 4 UnderProfiledSubprocedure::foo (9 bytes) > @ 5 UnderProfiledSubprocedure::bar (6 bytes) inline (hot) > @ 1 UnderProfiledSubprocedure::baz (19 bytes) failed to inline: low call site frequency This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/17957 From duke at openjdk.org Thu May 9 03:56:59 2024 From: duke at openjdk.org (duke) Date: Thu, 9 May 2024 03:56:59 GMT Subject: Withdrawn: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 08:10:54 GMT, Quan Anh Mai wrote: > Hi, > > This patch introduces `JitCompiler::isConstantExpression` which can be used to statically determine whether an expression has been constant-folded by the Jit compiler, leading to more constant-folding opportunities. For example, it can be used in `MemorySessionImpl::checkValidStateRaw` to eliminate the lifetime check on global sessions without imposing additional branches on other non-global sessions. This is similar to `__builtin_constant_p` in GCC and clang. > > Please kindly give your opinion as well as your reviews, thanks very much. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/17527 From duke at openjdk.org Thu May 9 04:28:00 2024 From: duke at openjdk.org (duke) Date: Thu, 9 May 2024 04:28:00 GMT Subject: Withdrawn: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 17:24:22 GMT, Emanuel Peter wrote: > This is a refactoring of `SuperWord`. > > **Goals** > > 1. Clean up `SuperWord`: disentangle different components, make them more **modular**. > 2. Make the loop analysis parts a **shared facility**, not just for SuperWord but also the post-loop-vectorizer ([JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994)). > 3. It is also a necessary step on my bigger plans for improvement with the C2 Auto-Vectorizer ([see my blog post](https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html)). > 4. Improve tracing in the auto-vectorization by making it more systematic. 
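For comparison with GCC/Clang's `__builtin_constant_p`, here is a sketch of the pattern the proposal describes: take a folded fast path only when the compiler can prove a value constant, and keep a runtime path that gives the same answer. It requires GCC or Clang. `Session` and `check_valid` are hypothetical and only loosely model the `MemorySessionImpl` example.

```cpp
// Requires GCC or Clang: __builtin_constant_p is a compiler builtin.
// Hypothetical model of a session; not the JDK's MemorySessionImpl.
struct Session {
  bool is_global;
  bool alive;
};

inline bool check_valid(const Session& s) {
  // If the compiler proves s.is_global is a constant true (e.g. after inlining),
  // this branch lets it drop the liveness check entirely.
  if (__builtin_constant_p(s.is_global) && s.is_global) {
    return true;
  }
  // Runtime path: same result, so correctness never depends on the builtin's answer.
  return s.is_global || s.alive;
}
```

The key design point is that both branches agree. The builtin only chooses which code is emitted, just as `isConstantExpression` would only choose whether the lifetime check is compiled in.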
> > **Summary** > > - I wrote a summary of how C2 auto-vectorization with SuperWord works (please read!): > https://github.com/openjdk/jdk/blob/95fd361e60fc66eb91edad321662e508b2d1bdde/src/hotspot/share/opto/superword.hpp#L32-L177 > - I moved many `Superword` components out to `VLoop` and to `VLoopAnalyzer`. The idea is that any vectorizer can use these facilities in the future. They are therefore made more modular, which should hopefully make future changes easier. These components are: > - Checking the pre-conditions for vectorization (e.g. no unwanted ctrl-flow). > - `VLoop::check_preconditions_helper` replaces code from old `SuperWord::transform_loop`. > - Running all submodules of `VLoopAnalyzer`: `VLoopAnalyzer::analyze_helper`. Replaces analysis part of `SuperWord::SLP_extract`. > - Finding and marking reductions -> `VLoopReductions` > - Detecting memory slices -> `VLoopMemorySlices` > - Analyzing the body -> `VLoopBody` (renamed `in_bb` -> `in_body`) > - Determining vector element types, and functions to determine the `vector_width` of a node -> `VLoopTypes` > - Constructing the dependence graph -> `VLoopDependenceGraph`. Replaces old `DepGraph` with all its components. > - New: CompileCommand option `TraceAutovectorization` > - Run with `-XX:CompileCommand=traceAutovectorization,*::*,help` to get a usage description. > - Replaced all printing with flags `TraceSuperWord` (and `Verbose`) and of `VectorizeDebug`. > - The advantage of a CompileCommand is that tracing can be applied selectively for only a limited set of java classes / methods. > - It uses tags, which are more readable than the `VectorizeDebug` bit-flags. These tags can be used for all parts of the vectorizer, but one can also target SuperWord specifically. > - I systematically added tracing at every point where vectorization (partially) fails (use tag `SW_REJECTIONS`). > - `TraceSuperWord` still works, and performs the sa... This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/16620 From mli at openjdk.org Thu May 9 08:46:17 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 9 May 2024 08:46:17 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV Message-ID: Hi, Can you help to review this patch adding CountLeadingZerosV and CountTrailingZerosV instrinsics? Thanks. ------------- Commit messages: - typo - Initial commit Changes: https://git.openjdk.org/jdk/pull/19153/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19153&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331577 Stats: 63 lines in 3 files changed: 62 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19153.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19153/head:pull/19153 PR: https://git.openjdk.org/jdk/pull/19153 From amitkumar at openjdk.org Thu May 9 09:12:54 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 9 May 2024 09:12:54 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v3] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 19:34:10 GMT, Vladimir Kozlov wrote: >> [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. Non of current OpenJDK platforms use it. >> >> Tested tier1-3,stress,xcomp. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > address comments Result Looks good on s390x. I ran `tier1` tests on `fastdebug-vm`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19126#issuecomment-2102269058 From mli at openjdk.org Thu May 9 09:48:09 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 9 May 2024 09:48:09 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v2] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch adding CountLeadingZerosV and CountTrailingZerosV instrinsics? > Thanks. 
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: fix masked issue ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19153/files - new: https://git.openjdk.org/jdk/pull/19153/files/9c38914a..1d5d17fe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19153&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19153&range=00-01 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/19153.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19153/head:pull/19153 PR: https://git.openjdk.org/jdk/pull/19153 From mli at openjdk.org Thu May 9 10:28:54 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 9 May 2024 10:28:54 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v2] In-Reply-To: References: Message-ID: <8PHpGLNxXgcc-oM9IAc9UnnJNCaF34NnEHFP2R2nSvs=.383399b1-e70e-460c-8efe-9d88ab6a34ba@github.com> On Thu, 9 May 2024 09:48:09 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch adding CountLeadingZerosV and CountTrailingZerosV instrinsics? >> Thanks. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > fix masked issue NOTE: the reason why let dst and src share one register (i.e. `(vReg dst_src, vRegMask_V0 v0)`) in masked version is that for inactive elements, we should keep the origin value, neither `mu` or `ma` will do it. BTW, I will also re-visit all existing masked version instructions to make sure it works as expected. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19153#issuecomment-2102392793 From mli at openjdk.org Thu May 9 11:14:14 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 9 May 2024 11:14:14 GMT Subject: RFR: 8331993: Add counting leading/trailing zero tests for Integer Message-ID: <7a7fXkgF6v-sSFHCk-GT0DbHr9t8AO7bGh1X1JaF-gg=.19a655eb-cb68-446c-8207-270a2ee87492@github.com> Hi, Can you help to review the patch adding some test? 
Currently, in hotspot/jtreg/compiler/vectorization/TestNumberOfContinuousZeros.java, there is only tests for Long, not for Integer. Thanks. ------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/19154/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19154&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331993 Stats: 59 lines in 2 files changed: 44 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/19154.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19154/head:pull/19154 PR: https://git.openjdk.org/jdk/pull/19154 From jbhateja at openjdk.org Thu May 9 11:23:56 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 9 May 2024 11:23:56 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v15] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 20:27:29 GMT, Steve Dohrmann wrote: > UseAPX Yes, attaching a test depicting incorrectness with UseAVX=2 for SHLX which is a legacy map 2 instruction promotable to extended EVEX with EGPR operands. [shift_left_APX.txt](https://github.com/openjdk/jdk/files/15261495/shift_left_APX.txt) It will not be appropriate to modify VM_Version::supports_evex for APX feature since its used for constraining dynamic register classes associated with vector operands. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1595320541 From fyang at openjdk.org Thu May 9 11:27:50 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 9 May 2024 11:27:50 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v2] In-Reply-To: <8PHpGLNxXgcc-oM9IAc9UnnJNCaF34NnEHFP2R2nSvs=.383399b1-e70e-460c-8efe-9d88ab6a34ba@github.com> References: <8PHpGLNxXgcc-oM9IAc9UnnJNCaF34NnEHFP2R2nSvs=.383399b1-e70e-460c-8efe-9d88ab6a34ba@github.com> Message-ID: <47txZsG98U3vKdhefoQGDYz5g6IPFFWWzQFI9P6pA0A=.1a396733-46e7-4eb5-9c56-d6293196056f@github.com> On Thu, 9 May 2024 10:26:16 GMT, Hamlin Li wrote: > NOTE: the reason why let dst and src share one register (i.e. `(vReg dst_src, vRegMask_V0 v0)`) in masked version is that for inactive elements, we should keep the origin value, neither `mu` or `ma` will do it. Interesting. Is it specified anywhere? > BTW, I will also re-visit all existing masked version instructions to make sure it works as expected. tracked by https://bugs.openjdk.org/browse/JDK-8331992 I think this issue was considered before when we were adding support for vector api. What about the recently added ones like ReverseBytesV, PopCountVI/L? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19153#issuecomment-2102480131 From sgibbons at openjdk.org Thu May 9 12:01:03 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 9 May 2024 12:01:03 GMT Subject: Integrated: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 In-Reply-To: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> References: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> Message-ID: On Wed, 1 May 2024 14:01:38 GMT, Scott Gibbons wrote: > Added a strcmp for unsafe_setmemory in process_call_arguments() so the assert would not trigger. 
> > I believe this is the correct fix as I do not think the arguments for setMemory need special handling like arraycopy. > > I would like suggestions on how to generate a testcase to catch this type of error in mainline. This pull request has now been integrated. Changeset: 0a4eeeaa Author: Scott Gibbons Committer: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/0a4eeeaa3c63585244be959386dd94882398e87f Stats: 108 lines in 2 files changed: 107 ins; 0 del; 1 mod 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 Co-authored-by: Jatin Bhateja Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/19032 From mli at openjdk.org Thu May 9 12:02:56 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 9 May 2024 12:02:56 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v2] In-Reply-To: <47txZsG98U3vKdhefoQGDYz5g6IPFFWWzQFI9P6pA0A=.1a396733-46e7-4eb5-9c56-d6293196056f@github.com> References: <8PHpGLNxXgcc-oM9IAc9UnnJNCaF34NnEHFP2R2nSvs=.383399b1-e70e-460c-8efe-9d88ab6a34ba@github.com> <47txZsG98U3vKdhefoQGDYz5g6IPFFWWzQFI9P6pA0A=.1a396733-46e7-4eb5-9c56-d6293196056f@github.com> Message-ID: On Thu, 9 May 2024 11:24:47 GMT, Fei Yang wrote: > > NOTE: the reason why let dst and src share one register (i.e. `(vReg dst_src, vRegMask_V0 v0)`) in masked version is that for inactive elements, we should keep the origin value, neither `mu` or `ma` will do it. > > Interesting. Is it specified anywhere? For the Semantics of `mu` or `ma`, it's https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#343-vector-tail-agnostic-and-vector-mask-agnostic-vta-and-vma. Based on this, we can deduce that here is a hidden bug. > > > BTW, I will also re-visit all existing masked version instructions to make sure it works as expected. tracked by https://bugs.openjdk.org/browse/JDK-8331992 > > I think this issue was considered before when we were adding support for vector api. 
What about the recently added ones like ReverseBytesV, PopCountVI/L? Yeh, I'm testing with a fix including those 2 intrinsics. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19153#issuecomment-2102527828 From fyang at openjdk.org Thu May 9 12:19:53 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 9 May 2024 12:19:53 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v2] In-Reply-To: References: <8PHpGLNxXgcc-oM9IAc9UnnJNCaF34NnEHFP2R2nSvs=.383399b1-e70e-460c-8efe-9d88ab6a34ba@github.com> <47txZsG98U3vKdhefoQGDYz5g6IPFFWWzQFI9P6pA0A=.1a396733-46e7-4eb5-9c56-d6293196056f@github.com> Message-ID: On Thu, 9 May 2024 12:00:22 GMT, Hamlin Li wrote: > > > BTW, I will also re-visit all existing masked version instructions to make sure it works as expected. tracked by https://bugs.openjdk.org/browse/JDK-8331992 Sorry for not being accurate. In fact, I mean requirement at the Java level. Why should we keep the origin value of inactive elements from the input vector? I didn't notice such a requirement. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19153#issuecomment-2102549043 From mli at openjdk.org Thu May 9 12:45:52 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 9 May 2024 12:45:52 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v2] In-Reply-To: References: <8PHpGLNxXgcc-oM9IAc9UnnJNCaF34NnEHFP2R2nSvs=.383399b1-e70e-460c-8efe-9d88ab6a34ba@github.com> <47txZsG98U3vKdhefoQGDYz5g6IPFFWWzQFI9P6pA0A=.1a396733-46e7-4eb5-9c56-d6293196056f@github.com> Message-ID: On Thu, 9 May 2024 12:14:28 GMT, Fei Yang wrote: > > > > BTW, I will also re-visit all existing masked version instructions to make sure it works as expected. tracked by https://bugs.openjdk.org/browse/JDK-8331992 > > Sorry for not being accurate. In fact, I mean requirement at the Java level. Why should we keep the origin value of inactive elements from the input vector? I didn't notice such a requirement before. 
I'm not sure about other places, but in vector APi, if you do operations with a mask, then the untouched (inactive in riscv vector insts) elements should be unmodified, i.e. same as original values. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19153#issuecomment-2102590188 From fyang at openjdk.org Thu May 9 13:20:51 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 9 May 2024 13:20:51 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v2] In-Reply-To: References: <8PHpGLNxXgcc-oM9IAc9UnnJNCaF34NnEHFP2R2nSvs=.383399b1-e70e-460c-8efe-9d88ab6a34ba@github.com> <47txZsG98U3vKdhefoQGDYz5g6IPFFWWzQFI9P6pA0A=.1a396733-46e7-4eb5-9c56-d6293196056f@github.com> Message-ID: On Thu, 9 May 2024 12:43:21 GMT, Hamlin Li wrote: > > Sorry for not being accurate. In fact, I mean requirement at the Java level. Why should we keep the origin value of inactive elements from the input vector? I didn't notice such a requirement before. > > I'm not sure about other places, but in vector APi, if you do operations with a mask, then the untouched (inactive in riscv vector insts) elements should be unmodified, i.e. same as original values. It will be helpful if you could point to the specific code or examples. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19153#issuecomment-2102645274 From imyers at openjdk.org Thu May 9 13:21:55 2024 From: imyers at openjdk.org (Ian Myers) Date: Thu, 9 May 2024 13:21:55 GMT Subject: RFR: 8324756: Test vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize is too slow due to dependency verification [v2] In-Reply-To: <-ig7Zj830qvQ91e_kbIRRfOn_8Pm23qxFOxUdGsSSWk=.9a40c696-9c91-4729-916d-61965099e0ae@github.com> References: <-ig7Zj830qvQ91e_kbIRRfOn_8Pm23qxFOxUdGsSSWk=.9a40c696-9c91-4729-916d-61965099e0ae@github.com> Message-ID: On Thu, 2 May 2024 12:57:20 GMT, Aleksey Shipilev wrote: >> Ian Myers has refreshed the contents of this pull request, and previous commits have been removed. 
The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> [8324756] Remove dependency verification from vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java > > I think you want to add the reversal of https://github.com/openjdk/jdk/commit/2564f0f99866c33d14947609c276a421ce8cc0a2 to this PR as well. > > I am not sure we want to run the test with disabled dependency verification, though. It is a compiler test, so we would like to have compiler checking code online as much as possible. Have you explored if this is an issue with Sweeper removal, and if so, if adding GCs help? @shipilev I have experimented with adding a periodic GC (every 5 seconds) in a new thread, and it did not affect the run time of the test. It still timed out at `CONF=linux-x86_64-server-fastdebug make test 1371.53s user 14.98s system 112% cpu 20:31.41 total` with the removal of the `-XX:-VerifyDependencies` flag. I will submit an amended commit with this test removed from the ProblemList.txt. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19040#issuecomment-2102649525 From fyang at openjdk.org Thu May 9 13:24:54 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 9 May 2024 13:24:54 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v3] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 19:34:10 GMT, Vladimir Kozlov wrote: >> [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. Non of current OpenJDK platforms use it. >> >> Tested tier1-3,stress,xcomp. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > address comments Also performed `tier1` tests on linux-riscv64 platform. Result looks good. 
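Hamlin Li's note on PR 19153 (8331577) explains why the masked rule ties destination and source to one register (`dst_src`). Vector API masked lane-wise operations leave unset lanes holding their original values. In RVV, a mask-undisturbed (`mu`) instruction preserves the old *destination* contents. A mask-agnostic (`ma`) instruction lets the hardware either keep those contents or overwrite them with all 1s. "Undisturbed" only means "original input value" when destination and source are the same register. The scalar C++ model below illustrates both outcomes. The names `clz32`, `masked_clz_undisturbed` and `masked_clz_agnostic` are hypothetical. The agnostic variant models only the all-1s outcome, which is one of the behaviours the spec permits.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Leading-zero count of a 32-bit value; 32 for zero, like Integer.numberOfLeadingZeros.
constexpr int clz32(std::uint32_t x) {
  int n = 0;
  for (std::uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1) {
    ++n;
  }
  return n;
}

// Destination starts as a copy of the source (the shared dst_src register case),
// so under mask-undisturbed the inactive lanes keep their original input values.
template <std::size_t N>
std::array<std::int32_t, N> masked_clz_undisturbed(const std::array<std::int32_t, N>& src,
                                                  const std::array<bool, N>& mask) {
  std::array<std::int32_t, N> dst = src;
  for (std::size_t i = 0; i < N; ++i) {
    if (mask[i]) {
      dst[i] = clz32(static_cast<std::uint32_t>(src[i]));
    }
  }
  return dst;
}

// Separate destination under mask-agnostic policy, modelled with inactive lanes
// overwritten by all 1s (-1). The original inputs of those lanes are lost.
template <std::size_t N>
std::array<std::int32_t, N> masked_clz_agnostic(const std::array<std::int32_t, N>& src,
                                               const std::array<bool, N>& mask) {
  std::array<std::int32_t, N> dst;
  dst.fill(-1);
  for (std::size_t i = 0; i < N; ++i) {
    if (mask[i]) {
      dst[i] = clz32(static_cast<std::uint32_t>(src[i]));
    }
  }
  return dst;
}
```

With input lanes `{1, 0, -1, 256}` and mask `{true, false, true, false}`, the undisturbed model yields `{31, 0, 0, 256}`. The agnostic model yields `{31, -1, 0, -1}`, which breaks the Vector API expectation for the two unset lanes.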
------------- PR Comment: https://git.openjdk.org/jdk/pull/19126#issuecomment-2102653877 From mli at openjdk.org Thu May 9 14:09:52 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 9 May 2024 14:09:52 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v2] In-Reply-To: References: <8PHpGLNxXgcc-oM9IAc9UnnJNCaF34NnEHFP2R2nSvs=.383399b1-e70e-460c-8efe-9d88ab6a34ba@github.com> <47txZsG98U3vKdhefoQGDYz5g6IPFFWWzQFI9P6pA0A=.1a396733-46e7-4eb5-9c56-d6293196056f@github.com> Message-ID: <8FNJwg59AJZc59jms3X0vBA2LG4d6oEexzqJUq7cT1A=.4bbd1d52-7cd5-437b-9e25-77e1d0e245c3@github.com> On Thu, 9 May 2024 13:18:13 GMT, Fei Yang wrote: > > > Sorry for not being accurate. In fact, I mean requirement at the Java level. Why should we keep the origin value of inactive elements from the input vector? I didn't notice such a requirement before. > > > > > > I'm not sure about other places, but in vector APi, if you do operations with a mask, then the untouched (inactive in riscv vector insts) elements should be unmodified, i.e. same as original values. > > It will be helpful if you could point to the specific code or examples. For the example usage, please check the test code, e.g. https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Byte64VectorTests.java#L5458 For the courterpart of this intrinsic in arm, please check https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L6391 Hope these information are helpful. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19153#issuecomment-2102732804 From kvn at openjdk.org Thu May 9 15:29:58 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 9 May 2024 15:29:58 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v3] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 19:34:10 GMT, Vladimir Kozlov wrote: >> [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. 
Non of current OpenJDK platforms use it. >> >> Tested tier1-3,stress,xcomp. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > address comments Thank you, Amit and Fei, for testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19126#issuecomment-2102885628 From duke at openjdk.org Thu May 9 16:56:24 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Thu, 9 May 2024 16:56:24 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v17] In-Reply-To: References: Message-ID: <3nP3cGJZXnHXo2XZDKxZGj1aNIsKW8D1lQUl_nNwDuQ=.1404a4f7-6213-4f3b-a975-291202849538@github.com> > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: enable EEVEX encoding of vex map2 instructions when UseAVX=2 if UseAPX=true ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/52628798..d4ecb31c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=15-16 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From duke at openjdk.org Thu May 9 16:56:24 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Thu, 9 May 2024 16:56:24 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v15] In-Reply-To: References: Message-ID: <44Vx5Qdjf_VFHK0outx5pqCShAaEWqxz_vKurFE0WgE=.4b2bd04a-9422-4e9d-9f10-26fb59e8cba0@github.com> On Thu, 9 May 2024 11:21:28 GMT, Jatin Bhateja wrote: >> Thanks @jatin-bhateja . Do you mean a check such as: >> >> `if ((UseAVX > 2 || UseAPX) && !attributes->is_legacy_mode())` ? > >> UseAPX > > Yes, attaching a test depicting incorrectness with UseAVX=2 for SHLX which is a legacy map 2 instruction promotable to extended EVEX with EGPR operands. > [shift_left_APX.txt](https://github.com/openjdk/jdk/files/15261495/shift_left_APX.txt) > > It will not be appropriate to modify VM_Version::supports_evex for APX feature since its used for constraining dynamic register classes associated with vector operands. Ok, thanks. I've added the above change to the conditionals in the vex_prefix and vex_prefix_and_encode functions. 
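The SHLX and PINSRQ reports in this thread follow from how APX picks a prefix once an extended GPR appears. REX2 carries a single map-select bit (M0), so it can encode only legacy map 0 and map 1 (0F) opcodes. Legacy map 2/3 (0F38/0F3A) instructions, and VEX instructions such as SHLX, need the extended EVEX form instead. Below is a simplified C++ model of that decision. The enum and function names are hypothetical rather than HotSpot's, and the model ignores the fact that only a subset of legacy instructions has a promoted EVEX form. The choice depends on the operands and the opcode map, not on UseAVX. That is consistent with the UseAVX=2 SHLX failure reported above.

```cpp
// Simplified model of APX prefix selection; names are illustrative only.
enum class Encoding { Legacy, Vex, Evex };
enum class OpcodeMap { Map0, Map1_0F, Map2_0F38, Map3_0F3A };
enum class Prefix { Unchanged, Rex2, ExtendedEvex };

// uses_egpr: true if any GPR operand is one of r16..r31.
constexpr Prefix apx_prefix(Encoding enc, OpcodeMap map, bool uses_egpr) {
  if (!uses_egpr) {
    return Prefix::Unchanged;  // lower 16 GPRs only: encoding is exactly as before APX
  }
  if (enc == Encoding::Legacy && (map == OpcodeMap::Map0 || map == OpcodeMap::Map1_0F)) {
    return Prefix::Rex2;  // REX2's single map bit selects map 0 or map 1
  }
  // Legacy map 2/3 instructions (e.g. PINSRQ, 0F3A) and VEX instructions
  // (e.g. SHLX, VEX 0F38) cannot use REX2; they need the extended EVEX form.
  return Prefix::ExtendedEvex;
}
```

For example, `apx_prefix(Encoding::Legacy, OpcodeMap::Map3_0F3A, true)` yields `ExtendedEvex`. By contrast, the PINSRQ trace reported later in this thread shows a REX2 byte (`d5`) directly before `0f 3a`.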
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1595713834 From jbhateja at openjdk.org Thu May 9 19:34:57 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 9 May 2024 19:34:57 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v17] In-Reply-To: <3nP3cGJZXnHXo2XZDKxZGj1aNIsKW8D1lQUl_nNwDuQ=.1404a4f7-6213-4f3b-a975-291202849538@github.com> References: <3nP3cGJZXnHXo2XZDKxZGj1aNIsKW8D1lQUl_nNwDuQ=.1404a4f7-6213-4f3b-a975-291202849538@github.com> Message-ID: <9Q2ix7wJTkRyivF1JND_cjcoI6vn1O2przOrcnpJBXM=.8fca1722-ec94-4066-ab8d-8f5f5673e430@github.com> On Thu, 9 May 2024 16:56:24 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
> > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > enable EEVEX encoding of vex map2 instructions when UseAVX=2 if UseAPX=true src/hotspot/cpu/x86/assembler_x86.cpp line 4914: > 4912: assert(VM_Version::supports_sse4_1(), ""); > 4913: InstructionAttr attributes(AVX_128bit, /* rex_w */ true, /* legacy_mode */ _legacy_mode_dq, /* no_mask_reg */ true, /* uses_vl */ false); > 4914: int encode = simd_prefix_and_encode(dst, dst, as_XMMRegister(src->encoding()), VEX_SIMD_66, VEX_OPCODE_0F_3A, &attributes, true); _legacy_mode_dq and _legacy_mode_bw will be true for non AVX512DQ/BW targets, this will cause incorrectness since our scheme has been to treat those as non-legacy instructions upfront and only perform legacy demotions in leaf level routines if non of the register operand is an EGPR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1595886429 From duke at openjdk.org Thu May 9 21:47:35 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Thu, 9 May 2024 21:47:35 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v17] In-Reply-To: <9Q2ix7wJTkRyivF1JND_cjcoI6vn1O2przOrcnpJBXM=.8fca1722-ec94-4066-ab8d-8f5f5673e430@github.com> References: <3nP3cGJZXnHXo2XZDKxZGj1aNIsKW8D1lQUl_nNwDuQ=.1404a4f7-6213-4f3b-a975-291202849538@github.com> <9Q2ix7wJTkRyivF1JND_cjcoI6vn1O2przOrcnpJBXM=.8fca1722-ec94-4066-ab8d-8f5f5673e430@github.com> Message-ID: On Thu, 9 May 2024 19:32:24 GMT, Jatin Bhateja wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> enable EEVEX encoding of vex map2 instructions when UseAVX=2 if UseAPX=true > > src/hotspot/cpu/x86/assembler_x86.cpp line 4914: > >> 4912: assert(VM_Version::supports_sse4_1(), ""); >> 4913: InstructionAttr attributes(AVX_128bit, /* rex_w */ true, /* legacy_mode */ _legacy_mode_dq, /* no_mask_reg */ true, /* uses_vl */ 
false); >> 4914: int encode = simd_prefix_and_encode(dst, dst, as_XMMRegister(src->encoding()), VEX_SIMD_66, VEX_OPCODE_0F_3A, &attributes, true); > > _legacy_mode_dq and _legacy_mode_bw will be true for non AVX512DQ/BW targets, this will cause incorrectness since our scheme has been to treat those as non-legacy instructions upfront and only perform legacy demotions in leaf level routines if non of the register operand is an EGPR. In general, the legacy mode will be set to true whenever UseAVX < 3, due to logic in the InstructionAttr ctor. `_legacy_mode(legacy_mode || UseAVX < 3` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1596017447 From mdoerr at openjdk.org Thu May 9 21:59:33 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 9 May 2024 21:59:33 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v3] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 19:34:10 GMT, Vladimir Kozlov wrote: >> [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. Non of current OpenJDK platforms use it. >> >> Tested tier1-3,stress,xcomp. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > address comments tier1 and many more tests have also passed on PPC64. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19126#issuecomment-2103466537 From kvn at openjdk.org Thu May 9 23:04:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 9 May 2024 23:04:05 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v3] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 19:34:10 GMT, Vladimir Kozlov wrote: >> [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. Non of current OpenJDK platforms use it. >> >> Tested tier1-3,stress,xcomp. 
> > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > address comments Thank you, Martin ------------- PR Comment: https://git.openjdk.org/jdk/pull/19126#issuecomment-2103575606 From kvn at openjdk.org Thu May 9 23:46:08 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 9 May 2024 23:46:08 GMT Subject: Integrated: 8331862: Remove split relocation info implementation In-Reply-To: References: Message-ID: On Tue, 7 May 2024 16:16:33 GMT, Vladimir Kozlov wrote: > [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. Non of current OpenJDK platforms use it. > > Tested tier1-3,stress,xcomp. This pull request has now been integrated. Changeset: a643d6c7 Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/a643d6c7ac8a7bc0d3a288c1ef3f07876cf70590 Stats: 127 lines in 10 files changed: 2 ins; 65 del; 60 mod 8331862: Remove split relocation info implementation Reviewed-by: dlong ------------- PR: https://git.openjdk.org/jdk/pull/19126 From jwaters at openjdk.org Fri May 10 01:00:04 2024 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 10 May 2024 01:00:04 GMT Subject: RFR: 8331908: Simplify log code in vectorintrinsics.cpp In-Reply-To: References: Message-ID: <7baXLapkFPMESg7GfO26_rP-ADGub_eN6TfTOx6Th2c=.0f17c78e-7b01-423b-bf41-47b47ebc2b7c@github.com> On Wed, 8 May 2024 08:41:31 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Curretly, log code in vectorintrinsics.cpp is a bit redundant, could be simplified a bit. > Thanks. > > ## Test > sanity test, jdk/incubator/vector Marked as reviewed by jwaters (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/19135#pullrequestreview-2049066571 From kvn at openjdk.org Fri May 10 01:32:17 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 10 May 2024 01:32:17 GMT Subject: RFR: 8331908: Simplify log code in vectorintrinsics.cpp In-Reply-To: References: Message-ID: On Wed, 8 May 2024 08:41:31 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Curretly, log code in vectorintrinsics.cpp is a bit redundant, could be simplified a bit. > Thanks. > > ## Test > sanity test, jdk/incubator/vector Good. I will run our testing before approval. ------------- PR Review: https://git.openjdk.org/jdk/pull/19135#pullrequestreview-2049097766 From kvn at openjdk.org Fri May 10 02:34:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 10 May 2024 02:34:02 GMT Subject: RFR: 8331908: Simplify log code in vectorintrinsics.cpp In-Reply-To: References: Message-ID: On Wed, 8 May 2024 08:41:31 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Curretly, log code in vectorintrinsics.cpp is a bit redundant, could be simplified a bit. > Thanks. > > ## Test > sanity test, jdk/incubator/vector My testing passed. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19135#pullrequestreview-2049144736 From jbhateja at openjdk.org Fri May 10 05:10:07 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 10 May 2024 05:10:07 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v17] In-Reply-To: References: <3nP3cGJZXnHXo2XZDKxZGj1aNIsKW8D1lQUl_nNwDuQ=.1404a4f7-6213-4f3b-a975-291202849538@github.com> <9Q2ix7wJTkRyivF1JND_cjcoI6vn1O2przOrcnpJBXM=.8fca1722-ec94-4066-ab8d-8f5f5673e430@github.com> Message-ID: On Thu, 9 May 2024 21:42:34 GMT, Steve Dohrmann wrote: >> src/hotspot/cpu/x86/assembler_x86.cpp line 4914: >> >>> 4912: assert(VM_Version::supports_sse4_1(), ""); >>> 4913: InstructionAttr attributes(AVX_128bit, /* rex_w */ true, /* legacy_mode */ _legacy_mode_dq, /* no_mask_reg */ true, /* uses_vl */ false); >>> 4914: int encode = simd_prefix_and_encode(dst, dst, as_XMMRegister(src->encoding()), VEX_SIMD_66, VEX_OPCODE_0F_3A, &attributes, true); >> >> _legacy_mode_dq and _legacy_mode_bw will be true for non AVX512DQ/BW targets, this will cause incorrectness since our scheme has been to treat those as non-legacy instructions upfront and only perform legacy demotions in leaf level routines if non of the register operand is an EGPR. > > In general, the legacy mode will be set to true whenever UseAVX < 3, due to logic in the InstructionAttr ctor. > > `_legacy_mode(legacy_mode || UseAVX < 3` PFA a test point depicting the problem. 
[insertQ_map3_eevex.txt](https://github.com/openjdk/jdk/files/15270533/insertQ_map3_eevex.txt) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1596259907 From jbhateja at openjdk.org Fri May 10 05:26:07 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 10 May 2024 05:26:07 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v17] In-Reply-To: References: <3nP3cGJZXnHXo2XZDKxZGj1aNIsKW8D1lQUl_nNwDuQ=.1404a4f7-6213-4f3b-a975-291202849538@github.com> <9Q2ix7wJTkRyivF1JND_cjcoI6vn1O2przOrcnpJBXM=.8fca1722-ec94-4066-ab8d-8f5f5673e430@github.com> Message-ID: On Fri, 10 May 2024 05:07:21 GMT, Jatin Bhateja wrote: >> In general, the legacy mode will be set to true whenever UseAVX < 3, due to logic in the InstructionAttr ctor. >> >> `_legacy_mode(legacy_mode || UseAVX < 3` > > PFA a test point depicting the problem. > [insertQ_map3_eevex.txt](https://github.com/openjdk/jdk/files/15270533/insertQ_map3_eevex.txt) For previously attached test point, we see illegal instruction encoding with UseAVX=0 Illegal instruction at address = 7f147a64af08: 66 d5 18 0f 3a 22 c0 01 f3 0f 7f 46 10 d5 10 Image name: not from an image If you believe your application should attempt to execute this illegal instruction (and others that may be present), Then use this knob: -emit-illegal-insts 0 and this error message will be avoided. SDE ERROR: Illegal instruction at address = 7f147a64af08: 66 d5 18 0f 3a 22 c0 01 f3 0f 7f 46 10 d5 10 PINSRQ being a legacy MAP3 instruction should be promoted to Extended EVEX encoding, in this case an incorrect REX2 prefix is being emitted. `Command line: sde -dmr -- java -XX:-TieredCompilation -Xbatch --add-modules=jdk.incubator.vector -XX:UseAVX=0 -XX:CompileCommand=Print,insertQ::micro -cp . 
insertQ` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1596272481 From duke at openjdk.org Fri May 10 06:04:10 2024 From: duke at openjdk.org (duke) Date: Fri, 10 May 2024 06:04:10 GMT Subject: Withdrawn: 8315066: Add unsigned bounds and known bits to TypeInt/Long In-Reply-To: References: Message-ID: <50TYSexOVLaUyHAI7tCmZP7RtfCJ4xKi2i-joOCUI8M=.c701de97-4813-4f55-8f64-6811db0694a7@github.com> On Sat, 20 Jan 2024 19:23:23 GMT, Quan Anh Mai wrote: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/17508 From mli at openjdk.org Fri May 10 06:28:07 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 10 May 2024 06:28:07 GMT Subject: RFR: 8331908: Simplify log code in vectorintrinsics.cpp In-Reply-To: References: Message-ID: On Fri, 10 May 2024 02:31:13 GMT, Vladimir Kozlov wrote: >> Hi, >> Can you help to review this simple patch? >> Curretly, log code in vectorintrinsics.cpp is a bit redundant, could be simplified a bit. 
>> Thanks. >> >> ## Test >> sanity test, jdk/incubator/vector > > My testing passed. Thanks @vnkozlov @TheShermanTanker for your reviewing and testing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19135#issuecomment-2103949861 From mli at openjdk.org Fri May 10 06:28:08 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 10 May 2024 06:28:08 GMT Subject: Integrated: 8331908: Simplify log code in vectorintrinsics.cpp In-Reply-To: References: Message-ID: On Wed, 8 May 2024 08:41:31 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Currently, log code in vectorintrinsics.cpp is a bit redundant, could be simplified a bit. > Thanks. > > ## Test > sanity test, jdk/incubator/vector This pull request has now been integrated. Changeset: f47fc867 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/f47fc867b3518cb285d39f7b157bf7fde87b2083 Stats: 497 lines in 1 file changed: 14 ins; 322 del; 161 mod 8331908: Simplify log code in vectorintrinsics.cpp Reviewed-by: jwaters, kvn ------------- PR: https://git.openjdk.org/jdk/pull/19135 From fyang at openjdk.org Fri May 10 06:33:03 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 10 May 2024 06:33:03 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v2] In-Reply-To: <8FNJwg59AJZc59jms3X0vBA2LG4d6oEexzqJUq7cT1A=.4bbd1d52-7cd5-437b-9e25-77e1d0e245c3@github.com> References: <8PHpGLNxXgcc-oM9IAc9UnnJNCaF34NnEHFP2R2nSvs=.383399b1-e70e-460c-8efe-9d88ab6a34ba@github.com> <47txZsG98U3vKdhefoQGDYz5g6IPFFWWzQFI9P6pA0A=.1a396733-46e7-4eb5-9c56-d6293196056f@github.com> <8FNJwg59AJZc59jms3X0vBA2LG4d6oEexzqJUq7cT1A=.4bbd1d52-7cd5-437b-9e25-77e1d0e245c3@github.com> Message-ID: On Thu, 9 May 2024 14:07:00 GMT, Hamlin Li wrote: > > > > Sorry for not being accurate. In fact, I mean requirement at the Java level. Why should we keep the origin value of inactive elements from the input vector? I didn't notice such a requirement before.
> > > > > > > > > I'm not sure about other places, but in vector APi, if you do operations with a mask, then the untouched (inactive in riscv vector insts) elements should be unmodified, i.e. same as original values. > > > > > > It will be helpful if you could point to the specific code or examples. > > For the example usage, please check the test code, e.g. https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Byte64VectorTests.java#L5458 For the counterpart of this intrinsic in arm, please check https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L6391 Hope these information are helpful. Yeah, I think you are right. This is also reflected in the vector api source code like [1] [2]. [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java#L184 [2] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java#L231 ------------- PR Comment: https://git.openjdk.org/jdk/pull/19153#issuecomment-2103955774 From mli at openjdk.org Fri May 10 07:10:15 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 10 May 2024 07:10:15 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v3] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch adding CountLeadingZerosV and CountTrailingZerosV intrinsics? > Thanks.
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: fix masked ReverseBytesV & PopCountV by sharing dst&src regs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19153/files - new: https://git.openjdk.org/jdk/pull/19153/files/1d5d17fe..0aaa0834 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19153&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19153&range=01-02 Stats: 9 lines in 1 file changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/19153.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19153/head:pull/19153 PR: https://git.openjdk.org/jdk/pull/19153 From fyang at openjdk.org Fri May 10 07:21:03 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 10 May 2024 07:21:03 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v3] In-Reply-To: References: Message-ID: On Fri, 10 May 2024 07:10:15 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch adding CountLeadingZerosV and CountTrailingZerosV intrinsics? >> Thanks. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > fix masked ReverseBytesV & PopCountV by sharing dst&src regs src/hotspot/cpu/riscv/riscv_v.ad line 3766: > 3764: instruct vreverse_bytes_masked(vReg dst_src, vRegMask_V0 v0) %{ > 3765: match(Set dst_src (ReverseBytesV dst_src v0)); > 3766: format %{ "vreverse_bytes_masked $dst_src, $dst_src, v0" %} Nit: I think we can use something more accurate like `vrev8.v` as the opcode name in format. That will be consistent with the RVV spec. Also I suggest `v0.t` instead of `v0` or `$v0` as the mask for predicated instructs (Might deserve a separate PR for cleaning up other existing predicated instructs). Similar for other newly added instructs.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19153#discussion_r1596355741 From mli at openjdk.org Fri May 10 07:39:05 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 10 May 2024 07:39:05 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v3] In-Reply-To: References: Message-ID: On Fri, 10 May 2024 07:10:36 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> fix masked ReverseBytesV & PopCountV by sharing dst&src regs > > src/hotspot/cpu/riscv/riscv_v.ad line 3766: > >> 3764: instruct vreverse_bytes_masked(vReg dst_src, vRegMask_V0 v0) %{ >> 3765: match(Set dst_src (ReverseBytesV dst_src v0)); >> 3766: format %{ "vreverse_bytes_masked $dst_src, $dst_src, v0" %} > > Nit: I think we can use something more accurate like `vrev8.v` as the opcode name in format. That will be consistent with the RVV spec. Also I suggest `v0.t` instead of `v0` or `$v0` as the mask for predicated instructs (Might deserve a separate PR for cleaning up other existing predicated instructs). Similar for other newly added instructs. Sure, let me do it later, tracked by https://bugs.openjdk.org/browse/JDK-8332030. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19153#discussion_r1596384248 From fyang at openjdk.org Fri May 10 08:22:02 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 10 May 2024 08:22:02 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v3] In-Reply-To: References: Message-ID: On Fri, 10 May 2024 07:10:15 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch adding CountLeadingZerosV and CountTrailingZerosV instrinsics? >> Thanks. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > fix masked ReverseBytesV & PopCountV by sharing dst&src regs Marked as reviewed by fyang (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/19153#pullrequestreview-2049557417 From chagedorn at openjdk.org Fri May 10 09:20:20 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 10 May 2024 09:20:20 GMT Subject: RFR: 8331764: C2 SuperWord: refactor _align_to_ref/_mem_ref_for_main_loop_alignment In-Reply-To: References: Message-ID: <8GxZQOQcBkihzzemSKUg3umrWvN3-qH16jxlSoKWoe8=.d537bffd-7cf4-4a20-887f-316e066387ca@github.com> On Tue, 7 May 2024 09:26:11 GMT, Emanuel Peter wrote: > This PR accomplishes these things: > - Rename `_align_to_ref` -> `_mem_ref_for_main_loop_alignment`. > - Move the `mem_ref` finding for alignment out of `SuperWord::find_adjacent_refs`. This is too early, and we don't even know if the relevant `mem_ref` is going to be vectorized. It makes more sense to pick a `mem_ref` directly in `SuperWord::adjust_pre_loop_limit_to_align_main_loop_vectors`, where we already know what packs are going to be vectorized. > - For the alignment width (aw), we can use the `vector_width` of the pack to which the `mem_ref` belongs, rather than the potentially much larger `vector_width_in_bytes`. I track this with `_aw_for_main_loop_alignment` now. > > I need this for https://github.com/openjdk/jdk/pull/18822, and decided to split it out into an independent change. Looks good to me, too! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19115#pullrequestreview-2049677165 From chagedorn at openjdk.org Fri May 10 09:27:14 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 10 May 2024 09:27:14 GMT Subject: RFR: 8330584: IGV: XML does not save all node properties [v2] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 12:10:23 GMT, Tobias Holenstein wrote: >> When C2 sends graphs over the network to IGV, each graph is sent separately. 
The same applies if C2 saves graphs to XML: each graph is saved with all its nodes as a separate `...` in the XML >> >> To save space, graphs that are saved from IGV only contain the incremental difference for each graph. This saves a lot of space (~5-10x). The logic happens in Printer.java -> `exportInputGraph(.., difference=true, ...)` Unfortunately, there is a bug in this logic: the properties of the nodes are not saved correctly. >> >> [graphs.zip](https://github.com/openjdk/jdk/files/15220940/graphs.zip) contains 4 graphs: >> >> `graph_c2.xml` (230KB) - an XML saved from C2 >> `graph_igv_bug.xml` (73KB) - opened `graph_c2.xml` in IGV (without this fix) and saved as `graph_igv_bug.xml`. >> `graph_igv_fixed.xml` (123KB) - opened `graph_c2.xml` in IGV (with this fix) and saved as `graph_igv_fixed.xml`. >> >> As you can see `graph_igv_fixed.xml` is almost twice as large as `graph_igv_bug.xml` because it contains the missing properties. But now the memory saving from the original `graph_c2.xml` is only ~2x. >> Therefore a new format for saving is added: graphs can now be saved and opened from IGV as `.igv`. This uses a compressed (ZIP) format. >> >> `graph.igv` (10KB) is the same graph as `graph_c2.xml` (230KB). But it uses difference graph compression and ZIP compression and is in total 23x smaller in memory footprint. >> >> >> >> E.g., the root in the last graph of difference_true.xml has far fewer properties than in difference_false.xml. > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Update src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/OutlineTopComponent.java > > Co-authored-by: Roberto Castañeda Lozano Looks good to me, too! ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19104#pullrequestreview-2049689287 From chagedorn at openjdk.org Fri May 10 09:40:31 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 10 May 2024 09:40:31 GMT Subject: RFR: 8330584: IGV: XML does not save all node properties [v2] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 12:10:23 GMT, Tobias Holenstein wrote: >> When C2 sends graphs over the network to IGV, each graph is sent separately. The same applies if C2 saves graphs to XML: each graph is saved with all its nodes as a separate `...` in the XML >> >> To save space, graphs that are saved from IGV only contain the incremental difference for each graph. This saves a lot of space (~5-10x). The logic happens in Printer.java -> `exportInputGraph(.., difference=true, ...)` Unfortunately, there is a bug in this logic: the properties of the nodes are not saved correctly. >> >> [graphs.zip](https://github.com/openjdk/jdk/files/15220940/graphs.zip) contains 4 graphs: >> >> `graph_c2.xml` (230KB) - an XML saved from C2 >> `graph_igv_bug.xml` (73KB) - opened `graph_c2.xml` in IGV (without this fix) and saved as `graph_igv_bug.xml`. >> `graph_igv_fixed.xml` (123KB) - opened `graph_c2.xml` in IGV (with this fix) and saved as `graph_igv_fixed.xml`. >> >> As you can see `graph_igv_fixed.xml` is almost twice as large as `graph_igv_bug.xml` because it contains the missing properties. But now the memory saving from the original `graph_c2.xml` is only ~2x. >> Therefore a new format for saving is added: graphs can now be saved and opened from IGV as `.igv`. This uses a compressed (ZIP) format. >> >> `graph.igv` (10KB) is the same graph as `graph_c2.xml` (230KB). But it uses difference graph compression and ZIP compression and is in total 23x smaller in memory footprint. >> >> >> >> E.g., the root in the last graph of difference_true.xml has far fewer properties than in difference_false.xml.
> > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Update src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/OutlineTopComponent.java > > Co-authored-by: Roberto Castañeda Lozano Just a general thought: Should we generally only save in `.igv` format and drop saving in XML format or is there any benefit to be able to store in both formats? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19104#issuecomment-2104287392 From chagedorn at openjdk.org Fri May 10 09:49:28 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 10 May 2024 09:49:28 GMT Subject: RFR: 8331993: Add counting leading/trailing zero tests for Integer In-Reply-To: <7a7fXkgF6v-sSFHCk-GT0DbHr9t8AO7bGh1X1JaF-gg=.19a655eb-cb68-446c-8207-270a2ee87492@github.com> References: <7a7fXkgF6v-sSFHCk-GT0DbHr9t8AO7bGh1X1JaF-gg=.19a655eb-cb68-446c-8207-270a2ee87492@github.com> Message-ID: On Thu, 9 May 2024 11:09:39 GMT, Hamlin Li wrote: > Hi, > Can you help to review the patch adding some test? > Currently, in hotspot/jtreg/compiler/vectorization/TestNumberOfContinuousZeros.java, there is only tests for Long, not for Integer. > Thanks. Otherwise, looks good! test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 1177: > 1175: } > 1176: > 1177: public static final String COUNTLEADINGZEROS_VI = VECTOR_PREFIX + "COUNTLEADINGZEROS_VI" + POSTFIX; Would have been better to add `_` like that: `COUNT_LEADING_ZEROS_VI` But the existing `IRNode` strings for the long versions already miss that. If you want to also fix this here, feel free to do so. ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19154#pullrequestreview-2049723864 PR Review Comment: https://git.openjdk.org/jdk/pull/19154#discussion_r1596535655 From rcastanedalo at openjdk.org Fri May 10 09:51:07 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 10 May 2024 09:51:07 GMT Subject: RFR: 8330584: IGV: XML does not save all node properties [v2] In-Reply-To: References: Message-ID: <6Jqe1Ue2PTI-xcu4MyTlQLs6S5T_tK8dJC8RjY3aXBs=.d6adb162-e5b6-46a3-b6ce-65d2a9b3a3db@github.com> On Fri, 10 May 2024 09:37:44 GMT, Christian Hagedorn wrote: > Just a general thought: Should we generally only save in .igv format and drop (explicit) saving in XML format or is there any benefit to be able to store in both formats? I find the explicit XML format convenient sometimes for debugging something or doing a quick plain-text search. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19104#issuecomment-2104303354 From duke at openjdk.org Fri May 10 11:15:18 2024 From: duke at openjdk.org (Yuri Gaevsky) Date: Fri, 10 May 2024 11:15:18 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v2] In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 14:47:47 GMT, Yuri Gaevsky wrote: >> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. >> >> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. > > Yuri Gaevsky has updated the pull request incrementally with two additional commits since the last revision: > > - num_8b_elems_in_vec --> nof_vec_elems > - Removed checks for (MaxVectorSize >= 16) per @RealFYang suggestion. .
------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-2104423343 From yzheng at openjdk.org Fri May 10 13:13:26 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Fri, 10 May 2024 13:13:26 GMT Subject: RFR: 8331429: [JVMCI] Cleanup JVMCIRuntime allocation routines Message-ID: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> This PR removes allocation routines that may throw exception from JVMCIRuntime. It also exports various symbols related to the hashed secondary supers table. ------------- Commit messages: - [JVMCI] Cleanup JVMCIRuntime allocation routines Changes: https://git.openjdk.org/jdk/pull/19176/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19176&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331429 Stats: 99 lines in 3 files changed: 3 ins; 41 del; 55 mod Patch: https://git.openjdk.org/jdk/pull/19176.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19176/head:pull/19176 PR: https://git.openjdk.org/jdk/pull/19176 From mli at openjdk.org Fri May 10 14:04:01 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 10 May 2024 14:04:01 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v2] In-Reply-To: References: <8PHpGLNxXgcc-oM9IAc9UnnJNCaF34NnEHFP2R2nSvs=.383399b1-e70e-460c-8efe-9d88ab6a34ba@github.com> <47txZsG98U3vKdhefoQGDYz5g6IPFFWWzQFI9P6pA0A=.1a396733-46e7-4eb5-9c56-d6293196056f@github.com> <8FNJwg59AJZc59jms3X0vBA2LG4d6oEexzqJUq7cT1A=.4bbd1d52-7cd5-437b-9e25-77e1d0e245c3@github.com> Message-ID: On Fri, 10 May 2024 06:30:54 GMT, Fei Yang wrote: >>> > > Sorry for not being accurate. In fact, I mean requirement at the Java level. Why should we keep the origin value of inactive elements from the input vector? I didn't notice such a requirement before. >>> > >>> > >>> > I'm not sure about other places, but in vector APi, if you do operations with a mask, then the untouched (inactive in riscv vector insts) elements should be unmodified, i.e. 
same as original values. >>> >>> It will be helpful if you could point to the specific code or examples. >> >> For the example usage, please check the test code, e.g. https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Byte64VectorTests.java#L5458 >> For the counterpart of this intrinsic in arm, please check https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L6391 >> Hope these information are helpful. > >> > > > Sorry for not being accurate. In fact, I mean requirement at the Java level. Why should we keep the origin value of inactive elements from the input vector? I didn't notice such a requirement before. >> > > >> > > >> > > I'm not sure about other places, but in vector APi, if you do operations with a mask, then the untouched (inactive in riscv vector insts) elements should be unmodified, i.e. same as original values. >> > >> > >> > It will be helpful if you could point to the specific code or examples. >> >> For the example usage, please check the test code, e.g. https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Byte64VectorTests.java#L5458 For the counterpart of this intrinsic in arm, please check https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L6391 Hope these information are helpful. > > Yeah, I think you are right. This is also reflected in the vector api source code like Unary & Binary operator [1] [2]. > > [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java#L184 > [2] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java#L231 Thanks @RealFYang for your reviewing.
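The masked semantics agreed on above can be sketched in plain C++17 under stated assumptions: `std::array` lanes stand in for vector registers, and `clz32` and `masked_clz` are hypothetical helpers, not HotSpot or Vector API code. Active lanes receive the leading-zero count, while inactive lanes keep their input value, which appears to be the motivation for the [v3] change that shares the dst and src registers in the masked instructs.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar count-leading-zeros for one 32-bit lane; returns 32 for an all-zero lane.
constexpr int clz32(std::uint32_t x) {
  int n = 0;
  for (std::uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1) {
    n++;
  }
  return n;
}

// Hypothetical model of a masked CountLeadingZerosV on int lanes:
// active lanes get clz(src[i]); inactive lanes keep src[i] unchanged,
// the same effect as writing the result into the source register with the
// inactive elements left undisturbed.
template <std::size_t N>
constexpr std::array<std::int32_t, N> masked_clz(const std::array<std::int32_t, N>& src,
                                                 const std::array<bool, N>& mask) {
  std::array<std::int32_t, N> dst = src;  // start from the input so inactive lanes are untouched
  for (std::size_t i = 0; i < N; i++) {
    if (mask[i]) {
      dst[i] = clz32(static_cast<std::uint32_t>(src[i]));
    }
  }
  return dst;
}

// Lane 1 is inactive, so it keeps its input value 0 instead of becoming clz(0) == 32.
constexpr std::array<std::int32_t, 4> kSrc{1, 0, -1, 256};
constexpr std::array<bool, 4> kMask{true, false, true, true};
constexpr std::array<std::int32_t, 4> kOut = masked_clz(kSrc, kMask);
static_assert(kOut[0] == 31 && kOut[1] == 0 && kOut[2] == 0 && kOut[3] == 23,
              "masked clz leaves inactive lanes unchanged");
```

Lane 1 shows the point of the discussion: an unmasked count would produce 32 there, but because the lane is inactive it must still hold its original value.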
------------- PR Comment: https://git.openjdk.org/jdk/pull/19153#issuecomment-2104660027 From mli at openjdk.org Fri May 10 14:04:02 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 10 May 2024 14:04:02 GMT Subject: Integrated: 8331577: RISC-V: C2 CountLeadingZerosV In-Reply-To: References: Message-ID: On Thu, 9 May 2024 08:41:05 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch adding CountLeadingZerosV and CountTrailingZerosV intrinsics? > Thanks. This pull request has now been integrated. Changeset: f95c9374 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/f95c93740538e5e508407ec6750ed9f126fdc3c3 Stats: 72 lines in 3 files changed: 62 ins; 0 del; 10 mod 8331577: RISC-V: C2 CountLeadingZerosV 8331578: RISC-V: C2 CountTrailingZerosV Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/19153 From mli at openjdk.org Fri May 10 14:04:37 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 10 May 2024 14:04:37 GMT Subject: RFR: 8331993: Add counting leading/trailing zero tests for Integer [v2] In-Reply-To: <7a7fXkgF6v-sSFHCk-GT0DbHr9t8AO7bGh1X1JaF-gg=.19a655eb-cb68-446c-8207-270a2ee87492@github.com> References: <7a7fXkgF6v-sSFHCk-GT0DbHr9t8AO7bGh1X1JaF-gg=.19a655eb-cb68-446c-8207-270a2ee87492@github.com> Message-ID: > Hi, > Can you help to review the patch adding some test? > Currently, in hotspot/jtreg/compiler/vectorization/TestNumberOfContinuousZeros.java, there is only tests for Long, not for Integer. > Thanks.
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: rename ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19154/files - new: https://git.openjdk.org/jdk/pull/19154/files/c0aaa35d..c8a543d8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19154&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19154&range=00-01 Stats: 14 lines in 3 files changed: 0 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/19154.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19154/head:pull/19154 PR: https://git.openjdk.org/jdk/pull/19154 From mli at openjdk.org Fri May 10 14:04:37 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 10 May 2024 14:04:37 GMT Subject: RFR: 8331993: Add counting leading/trailing zero tests for Integer [v2] In-Reply-To: References: <7a7fXkgF6v-sSFHCk-GT0DbHr9t8AO7bGh1X1JaF-gg=.19a655eb-cb68-446c-8207-270a2ee87492@github.com> Message-ID: On Fri, 10 May 2024 09:45:18 GMT, Christian Hagedorn wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> rename > > test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 1177: > >> 1175: } >> 1176: >> 1177: public static final String COUNTLEADINGZEROS_VI = VECTOR_PREFIX + "COUNTLEADINGZEROS_VI" + POSTFIX; > > Would have been better to add `_` like that: `COUNT_LEADING_ZEROS_VI` > > But the existing `IRNode` strings for the long versions already miss that. If you want to also fix this here, feel free to do so. Yes, it's more readable. Fixed. Thanks for your reviewing. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19154#discussion_r1596788370 From mli at openjdk.org Fri May 10 14:04:37 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 10 May 2024 14:04:37 GMT Subject: Integrated: 8331993: Add counting leading/trailing zero tests for Integer In-Reply-To: <7a7fXkgF6v-sSFHCk-GT0DbHr9t8AO7bGh1X1JaF-gg=.19a655eb-cb68-446c-8207-270a2ee87492@github.com> References: <7a7fXkgF6v-sSFHCk-GT0DbHr9t8AO7bGh1X1JaF-gg=.19a655eb-cb68-446c-8207-270a2ee87492@github.com> Message-ID: On Thu, 9 May 2024 11:09:39 GMT, Hamlin Li wrote: > Hi, > Can you help to review the patch adding some test? > Currently, in hotspot/jtreg/compiler/vectorization/TestNumberOfContinuousZeros.java, there is only tests for Long, not for Integer. > Thanks. This pull request has now been integrated. Changeset: 675fbe69 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/675fbe699ed1aad37f34429cbe1f4f3e029be03f Stats: 67 lines in 3 files changed: 44 ins; 0 del; 23 mod 8331993: Add counting leading/trailing zero tests for Integer Reviewed-by: chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/19154 From aph at openjdk.org Fri May 10 14:28:59 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 May 2024 14:28:59 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v7] In-Reply-To: References: Message-ID: > At the present time, `assert_different_registers()` uses an O(N**2) algorithm in assert_different_registers(). We can utilize RegSet to do it in O(N) time. This would be a useful optimization for all builds with assertions enabled. > > In addition, it would be useful to be able to static_assert different registers. > > Also, I've taken the opportunity to expand the maximum size of a RegSet to 64 on 64-bit platforms. > > I also fixed a bug: sometimes `noreg` is passed to `assert_different_registers()`, but it may only be passed once or a spurious assertion is triggered. 
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/asm/register.hpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16617/files - new: https://git.openjdk.org/jdk/pull/16617/files/a945d094..36f48ad0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16617&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16617&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16617.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16617/head:pull/16617 PR: https://git.openjdk.org/jdk/pull/16617 From aph at openjdk.org Fri May 10 14:29:01 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 May 2024 14:29:01 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v6] In-Reply-To: References: Message-ID: <3xKYuTm22oA-SeoXK20LuPypVkTVuQNM7C9kY_tKlgs=.04a0c1cf-691d-428c-9c12-78bc02cab6d0@github.com> On Wed, 17 Jan 2024 07:32:44 GMT, Kim Barrett wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Review feedback. > > src/hotspot/share/asm/register.hpp line 257: > >> 255: >> 256: template >> 257: inline constexpr bool different_registers(AbstractRegSet allocated_regs, R first_register) { > > different_registers is only used by debug-only code in assert_different_registers. Shouldn't all the overloads > for different_registers be within an `#ifdef ASSERT` block? I could do so, but that would lose the ability to do `static_assert(different_registers(...`. I don't think that `static_assert` depends on `ASSERT`. I'm happy to make this patch debug-only, though, if you prefer. 
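A minimal sketch of a linear-time check that stays usable in `static_assert`, assuming registers can be modeled as integer encodings below 64 with -1 standing in for `noreg`; `Reg`, `different_registers_impl`, and the single 64-bit mask are illustrative stand-ins rather than HotSpot's `Register` or `AbstractRegSet`. Each register is tested against the set of encodings seen so far, so the cost is O(N), and since nothing here depends on the `ASSERT` macro the same function can feed a `static_assert` in any build.

```cpp
#include <cstdint>

// Hypothetical stand-in for a register: only its encoding; -1 plays the role of noreg.
struct Reg {
  int encoding;
  constexpr bool is_valid() const { return encoding >= 0; }
};
constexpr Reg noreg{-1};

// Base case: no registers left, so no duplicate was found.
constexpr bool different_registers_impl(std::uint64_t /*seen*/) { return true; }

// Checks `first` against the set of encodings seen so far, then recurses on the rest.
// Assumes valid encodings are below 64 so that they fit in one 64-bit mask.
template <typename... Rest>
constexpr bool different_registers_impl(std::uint64_t seen, Reg first, Rest... rest) {
  if (!first.is_valid()) {
    // noreg may appear any number of times without causing a spurious failure.
    return different_registers_impl(seen, rest...);
  }
  const std::uint64_t bit = std::uint64_t(1) << first.encoding;
  if ((seen & bit) != 0) {
    return false;  // the same register appears twice
  }
  return different_registers_impl(seen | bit, rest...);
}

template <typename... Regs>
constexpr bool different_registers(Regs... regs) {
  return different_registers_impl(0, regs...);
}

// Usable at compile time:
constexpr Reg r0{0}, r1{1}, r2{2};
static_assert(different_registers(r0, r1, r2), "distinct registers");
static_assert(!different_registers(r0, r1, r0), "duplicate is detected");
static_assert(different_registers(r0, noreg, r1, noreg), "noreg is ignored");
```

Skipping invalid registers instead of adding them to the set is what lets `noreg` appear more than once without tripping the check.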
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1596823414 From dnsimon at openjdk.org Fri May 10 14:30:31 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 10 May 2024 14:30:31 GMT Subject: RFR: 8331429: [JVMCI] Cleanup JVMCIRuntime allocation routines In-Reply-To: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> References: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> Message-ID: <4CEZIVjNESdAI-WNOY0akY7kd8DXgTQRy2fm1NHO-G8=.187d70c5-41a3-4af0-b6fb-06731da7403f@github.com> On Fri, 10 May 2024 13:06:21 GMT, Yudi Zheng wrote: > This PR removes allocation routines that may throw exception from JVMCIRuntime. It also exports various symbols related to the hashed secondary supers table. Please also update `InternalOOMEMark` to remove support for `thread` being `nullptr`. src/hotspot/share/jvmci/jvmciRuntime.hpp line 509: > 507: // The following routines are called from compiled JVMCI code > 508: > 509: // When allocation fails, these stubs return null and have no pending exception. Compiled code "and have no pending OutOfMemoryError exception" It's still possible for an async exception to be pending. For Graal, that's ok as it unconditionally clears any pending exception when calling these stubs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19176#issuecomment-2104705327 PR Review Comment: https://git.openjdk.org/jdk/pull/19176#discussion_r1596830028 From aph at openjdk.org Fri May 10 14:58:54 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 May 2024 14:58:54 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v8] In-Reply-To: References: Message-ID: > At the present time, `assert_different_registers()` uses an O(N**2) algorithm in assert_different_registers(). We can utilize RegSet to do it in O(N) time.
This would be a useful optimization for all builds with assertions enabled. > > In addition, it would be useful to be able to static_assert different registers. > > Also, I've taken the opportunity to expand the maximum size of a RegSet to 64 on 64-bit platforms. > > I also fixed a bug: sometimes `noreg` is passed to `assert_different_registers()`, but it may only be passed once or a spurious assertion is triggered. Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: - Review feedback - Merge branch 'different-regs' of https://github.com/theRealAph/jdk into different-regs - Update src/hotspot/share/asm/register.hpp Co-authored-by: Emanuel Peter - Merge branch 'clean' into different-regs - Review feedback. - 8319822: Use a linear-time algorithm for assert_different_registers() - 8319822: Use a linear-time algorithm for assert_different_registers() - Cleanup, fix warning on Windows. - Fix x86 - Bleurgh - ... 
and 3 more: https://git.openjdk.org/jdk/compare/211fe58c...0037dd29 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16617/files - new: https://git.openjdk.org/jdk/pull/16617/files/36f48ad0..0037dd29 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16617&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16617&range=06-07 Stats: 1504428 lines in 12564 files changed: 341226 ins; 719204 del; 443998 mod Patch: https://git.openjdk.org/jdk/pull/16617.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16617/head:pull/16617 PR: https://git.openjdk.org/jdk/pull/16617 From aph at openjdk.org Fri May 10 14:58:54 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 May 2024 14:58:54 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v7] In-Reply-To: References: Message-ID: On Fri, 10 May 2024 14:28:59 GMT, Andrew Haley wrote: >> At the present time, `assert_different_registers()` uses an O(N**2) algorithm in assert_different_registers(). We can utilize RegSet to do it in O(N) time. This would be a useful optimization for all builds with assertions enabled. >> >> In addition, it would be useful to be able to static_assert different registers. >> >> Also, I've taken the opportunity to expand the maximum size of a RegSet to 64 on 64-bit platforms. >> >> I also fixed a bug: sometimes `noreg` is passed to `assert_different_registers()`, but it may only be passed once or a spurious assertion is triggered. 
> > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/asm/register.hpp > > Co-authored-by: Emanuel Peter > I started to review the patch and was wondering if this could be simplify to something like this?: [stefank at f38c791](https://github.com/stefank/jdk/commit/f38c791793440b899ce6c4c9723470a5d4b18050) > > I tested this with this small section of temporary static_asserts: [stefank at 30da4d6](https://github.com/stefank/jdk/commit/30da4d6abeee14e4e4f44034295f1bb0ad2e3016) > > Unfortunately, that didn't compile and I had make this change to get it to work: [stefank at d6bda1a](https://github.com/stefank/jdk/commit/d6bda1a25e297865fd6b5da21184273d8825b922) OK, so I'm not going to do that, then. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16617#issuecomment-2104755838 From aph at openjdk.org Fri May 10 15:05:23 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 May 2024 15:05:23 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v9] In-Reply-To: References: Message-ID: > At the present time, `assert_different_registers()` uses an O(N**2) algorithm in assert_different_registers(). We can utilize RegSet to do it in O(N) time. This would be a useful optimization for all builds with assertions enabled. > > In addition, it would be useful to be able to static_assert different registers. > > Also, I've taken the opportunity to expand the maximum size of a RegSet to 64 on 64-bit platforms. > > I also fixed a bug: sometimes `noreg` is passed to `assert_different_registers()`, but it may only be passed once or a spurious assertion is triggered. 
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Review feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16617/files - new: https://git.openjdk.org/jdk/pull/16617/files/0037dd29..857152f6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16617&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16617&range=07-08 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16617.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16617/head:pull/16617 PR: https://git.openjdk.org/jdk/pull/16617 From aph at openjdk.org Fri May 10 15:05:23 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 May 2024 15:05:23 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v6] In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 07:10:13 GMT, Kim Barrett wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Review feedback. > > src/hotspot/cpu/aarch64/register_aarch64.hpp line 73: > >> 71: >> 72: constexpr bool operator==(const Register r) const { return _encoding == r._encoding; } >> 73: constexpr bool operator!=(const Register r) const { return _encoding != r._encoding; } > > This seems unrelated to the rest of this change. It also seems like something that should be done for all > of the register_ variants. It was related to another reviewer's comments, but we don't need it ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1596871037 From aph at openjdk.org Fri May 10 15:26:29 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 May 2024 15:26:29 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v10] In-Reply-To: References: Message-ID: > At the present time, `assert_different_registers()` uses an O(N**2) algorithm in assert_different_registers(). 
We can utilize RegSet to do it in O(N) time. This would be a useful optimization for all builds with assertions enabled. > > In addition, it would be useful to be able to static_assert different registers. > > Also, I've taken the opportunity to expand the maximum size of a RegSet to 64 on 64-bit platforms. > > I also fixed a bug: sometimes `noreg` is passed to `assert_different_registers()`, but it may only be passed once or a spurious assertion is triggered. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Review feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16617/files - new: https://git.openjdk.org/jdk/pull/16617/files/857152f6..693df766 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16617&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16617&range=08-09 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16617.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16617/head:pull/16617 PR: https://git.openjdk.org/jdk/pull/16617 From duke at openjdk.org Fri May 10 16:09:01 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Fri, 10 May 2024 16:09:01 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v18] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. 
These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: fix entry condition for EEVEX encoding when UseAVX=2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/d4ecb31c..aee89e7c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=16-17 Stats: 7 lines in 2 files changed: 5 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From aph at openjdk.org Fri May 10 16:16:07 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 May 2024 16:16:07 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v10] In-Reply-To: References: Message-ID: <8WxseMzSimvdzZMUP4VI_l6uFKcy49mMpRrLe-zgI74=.861ed0f1-a4bb-4476-9ce9-1fe7f3b2cc6c@github.com> On Fri, 10 May 2024 15:26:29 GMT, Andrew Haley wrote: >> At the present time, `assert_different_registers()` uses an O(N**2) algorithm in assert_different_registers(). We can utilize RegSet to do it in O(N) time. This would be a useful optimization for all builds with assertions enabled. >> >> In addition, it would be useful to be able to static_assert different registers. >> >> Also, I've taken the opportunity to expand the maximum size of a RegSet to 64 on 64-bit platforms. >> >> I also fixed a bug: sometimes `noreg` is passed to `assert_different_registers()`, but it may only be passed once or a spurious assertion is triggered. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Review feedback > From the summary: > > > In addition, it would be useful to be able to static_assert different registers. 
> > As mentioned in [#16617 (comment)](https://github.com/openjdk/jdk/pull/16617#issuecomment-1807933886) this doesn't work unless we make the proposed small tweak. Do you want to make it in this PR, or should I propose that in a separate PR? Let's do it separately. I would, but GCC has a very relaxed attitude to `static_assert` which means I can't test anything here. Everything to do with `static_assert` just seems to work. Exhuming this one after a long time. Please review, thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16617#issuecomment-2104872804 PR Comment: https://git.openjdk.org/jdk/pull/16617#issuecomment-2104873418 From jbhateja at openjdk.org Fri May 10 18:14:07 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 10 May 2024 18:14:07 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v17] In-Reply-To: References: <3nP3cGJZXnHXo2XZDKxZGj1aNIsKW8D1lQUl_nNwDuQ=.1404a4f7-6213-4f3b-a975-291202849538@github.com> <9Q2ix7wJTkRyivF1JND_cjcoI6vn1O2przOrcnpJBXM=.8fca1722-ec94-4066-ab8d-8f5f5673e430@github.com> Message-ID: On Thu, 9 May 2024 21:42:34 GMT, Steve Dohrmann wrote: >> src/hotspot/cpu/x86/assembler_x86.cpp line 4914: >> >>> 4912: assert(VM_Version::supports_sse4_1(), ""); >>> 4913: InstructionAttr attributes(AVX_128bit, /* rex_w */ true, /* legacy_mode */ _legacy_mode_dq, /* no_mask_reg */ true, /* uses_vl */ false); >>> 4914: int encode = simd_prefix_and_encode(dst, dst, as_XMMRegister(src->encoding()), VEX_SIMD_66, VEX_OPCODE_0F_3A, &attributes, true); >> >> _legacy_mode_dq and _legacy_mode_bw will be true for non-AVX512DQ/BW targets; this will cause incorrect encoding, since our scheme has been to treat those as non-legacy instructions upfront and only perform legacy demotions in leaf-level routines if none of the register operands is an EGPR. > > In general, the legacy mode will be set to true whenever UseAVX < 3, due to logic in the InstructionAttr ctor.
> > `_legacy_mode(legacy_mode || UseAVX < 3` Hi @steveatgh, I'm still getting incorrect encoding for PINSRQ at UseAVX=0 with the latest patch. This is a legacy map 3 instruction that should be promoted to extended EVEX encoding, but there is no path in `Assembler::simd_prefix_and_encode` that can lead to EVEX encoding at UseAVX=0. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1597065361 From jbhateja at openjdk.org Fri May 10 18:14:08 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 10 May 2024 18:14:08 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v17] In-Reply-To: References: <3nP3cGJZXnHXo2XZDKxZGj1aNIsKW8D1lQUl_nNwDuQ=.1404a4f7-6213-4f3b-a975-291202849538@github.com> <9Q2ix7wJTkRyivF1JND_cjcoI6vn1O2przOrcnpJBXM=.8fca1722-ec94-4066-ab8d-8f5f5673e430@github.com> Message-ID: On Fri, 10 May 2024 18:08:58 GMT, Jatin Bhateja wrote: >> In general, the legacy mode will be set to true whenever UseAVX < 3, due to logic in the InstructionAttr ctor. >> >> `_legacy_mode(legacy_mode || UseAVX < 3` > > Hi @steveatgh, > > I'm still getting incorrect encoding for PINSRQ at UseAVX=0 with the latest patch. > > This is a legacy map 3 instruction that should be promoted to extended EVEX encoding, but there is no path in `Assembler::simd_prefix_and_encode` that can lead to EVEX encoding at UseAVX=0.
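A simplified model of the prefix choice being discussed, based on our reading of the APX specification: instructions that use only the lower 16 GPRs keep their encoding; REX2 (the byte 0xD5 followed by one payload byte) only covers legacy maps 0 and 1, so a legacy map 3 instruction such as PINSRQ has to go to extended EVEX once an EGPR is involved. `choose_prefix` and `rex2_payload` are hypothetical helpers, not HotSpot functions, and the model ignores the per-opcode exceptions a real encoder has to handle.

```cpp
#include <cassert>
#include <cstdint>

// Encodings 0-15: legacy GPRs; 16-31: APX extended GPRs (EGPRs).
enum class Prefix { Legacy, Rex2, ExtendedEvex };

// opcode_map: 0 = one-byte opcodes, 1 = 0F, 2 = 0F 38, 3 = 0F 3A.
// uses_vex_or_evex: the instruction is already VEX/EVEX encoded.
// Pass 0 for an absent index or base register.
Prefix choose_prefix(int opcode_map, bool uses_vex_or_evex, int reg, int index, int base) {
  const bool uses_egpr = reg >= 16 || index >= 16 || base >= 16;
  if (!uses_egpr) {
    return Prefix::Legacy;        // encoding stays as it was before APX
  }
  if (!uses_vex_or_evex && opcode_map <= 1) {
    return Prefix::Rex2;          // REX2 covers legacy maps 0 and 1 only
  }
  return Prefix::ExtendedEvex;    // e.g. legacy map 3 (PINSRQ) or VEX/EVEX instructions
}

// Payload byte that follows 0xD5 in a REX2 prefix, bit 7 down to bit 0:
// M0 R4 X4 B4 W R3 X3 B3. `base` is the ModRM.rm register for reg-reg forms.
std::uint8_t rex2_payload(int opcode_map, bool w, int reg, int index, int base) {
  auto bit = [](int enc, int n) { return (enc >> n) & 1; };
  return static_cast<std::uint8_t>(
      ((opcode_map == 1 ? 1 : 0) << 7) |
      (bit(reg, 4) << 6) | (bit(index, 4) << 5) | (bit(base, 4) << 4) |
      ((w ? 1 : 0) << 3) |
      (bit(reg, 3) << 2) | (bit(index, 3) << 1) | bit(base, 3));
}
```

For example, a map 1 instruction with W=1, r16 in ModRM.reg and r17 in ModRM.rm would get the payload 0xD8 in this model.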
Similar problems with PINSRB/D/W and PEXTRB/W/D/Q ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1597067481 From dlong at openjdk.org Fri May 10 21:40:17 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 10 May 2024 21:40:17 GMT Subject: RFR: 8331429: [JVMCI] Cleanup JVMCIRuntime allocation routines In-Reply-To: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> References: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> Message-ID: <9wsX9310p38cnuPHGU4xKirWfyfYR6cICO6iPhnDk5Y=.55d9503f-2cc8-4c26-b24f-2ced7f8f72f5@github.com> On Fri, 10 May 2024 13:06:21 GMT, Yudi Zheng wrote: > This PR removes allocation routines that may throw exception from JVMCIRuntime. It also exports various symbols related to the hashed secondary supers table. src/hotspot/share/jvmci/jvmciRuntime.cpp line 131: > 129: // Cannot re-execute class initialization without side effects > 130: // so return without attempting the initialization > 131: return; Do we need to call `current->set_vm_result(nullptr)` on these bailout paths? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19176#discussion_r1597249765 From duke at openjdk.org Fri May 10 21:48:35 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Fri, 10 May 2024 21:48:35 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v19] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. 
For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: conditionally allow EEVEX encoding when UseAVX=0 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/aee89e7c..826fa2bb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=17-18 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From duke at openjdk.org Fri May 10 21:48:35 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Fri, 10 May 2024 21:48:35 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v17] In-Reply-To: References: <3nP3cGJZXnHXo2XZDKxZGj1aNIsKW8D1lQUl_nNwDuQ=.1404a4f7-6213-4f3b-a975-291202849538@github.com> <9Q2ix7wJTkRyivF1JND_cjcoI6vn1O2przOrcnpJBXM=.8fca1722-ec94-4066-ab8d-8f5f5673e430@github.com> Message-ID: On Fri, 10 May 2024 18:11:11 GMT, Jatin Bhateja wrote: >> Hi @steveatgh, >> >> I'm still getting incorrect encoding for PINSRQ at UseAVX=0 with the latest patch. >> >> This is a legacy map 3 instruction that should be promoted to extended EVEX encoding, but there is no path in `Assembler::simd_prefix_and_encode` that can lead to EVEX encoding at UseAVX=0. > > Similar problems with PINSRB/D/W and PEXTRB/W/D/Q Thanks @jatin-bhateja. I added logic to ::simd_prefix_and_encode and ::simd_prefix to conditionally allow EEVEX encoding even when UseAVX=0.
Tested with PINSR* and PEXTR* ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1597253753 From duke at openjdk.org Sat May 11 01:59:29 2024 From: duke at openjdk.org (xiaotaonan) Date: Sat, 11 May 2024 01:59:29 GMT Subject: RFR: 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag Message-ID: C2: Remove ExpandSubTypeCheckAtParseTime flag ------------- Commit messages: - C2: Remove ExpandSubTypeCheckAtParseTime flag Changes: https://git.openjdk.org/jdk/pull/19187/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19187&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332032 Stats: 8 lines in 4 files changed: 0 ins; 4 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19187.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19187/head:pull/19187 PR: https://git.openjdk.org/jdk/pull/19187 From ddong at openjdk.org Sat May 11 06:24:09 2024 From: ddong at openjdk.org (Denghui Dong) Date: Sat, 11 May 2024 06:24:09 GMT Subject: RFR: 8327661: C1: Make RBP allocatable on x64 when PreserveFramePointer is disabled [v3] In-Reply-To: References: Message-ID: On Wed, 13 Mar 2024 06:49:30 GMT, Denghui Dong wrote: >> Hi, >> >> Could I have a review of this change that makes RBP allocatable in C1 register allocation when PreserveFramePointer is not enabled? >> >> There seems to be no reason why RBP cannot be used. Although the performance of C1 JIT code is not very critical, in my opinion this change does not add compilation overhead, so it may be acceptable. >> >> I am not entirely sure that I have changed all the places that need to be changed. >> >> Testing: fastdebug tier1-4 on Linux x64 > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > delete jmh Gentle ping. Since the benefits are not obvious, I'll close this PR if there are no reviews for one more week.
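A hypothetical sketch of the condition this PR is about: RSP is never allocatable, and RBP is handed to the register allocator only when frame pointers do not have to be preserved. The enum uses the hardware numbering of the x86-64 GPRs; `allocatable_mask` is not a C1 function, and any other registers the real x64 port reserves are deliberately not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Hardware numbering of the sixteen x86-64 GPRs.
enum Gpr : int {
  RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
  R8, R9, R10, R11, R12, R13, R14, R15
};

// Hypothetical allocatable-register mask: RSP is never allocatable, and RBP
// is only allocatable when frame pointers need not be preserved.
// Other registers the real port reserves are not modeled.
std::uint32_t allocatable_mask(bool preserve_frame_pointer) {
  std::uint32_t mask = 0xFFFFu;      // all 16 GPRs
  mask &= ~(1u << RSP);              // stack pointer
  if (preserve_frame_pointer) {
    mask &= ~(1u << RBP);            // RBP stays the frame pointer
  }
  return mask;
}
```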
------------- PR Comment: https://git.openjdk.org/jdk/pull/18167#issuecomment-2105590018 From dnsimon at openjdk.org Sat May 11 07:44:09 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Sat, 11 May 2024 07:44:09 GMT Subject: RFR: 8331429: [JVMCI] Cleanup JVMCIRuntime allocation routines In-Reply-To: <9wsX9310p38cnuPHGU4xKirWfyfYR6cICO6iPhnDk5Y=.55d9503f-2cc8-4c26-b24f-2ced7f8f72f5@github.com> References: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> <9wsX9310p38cnuPHGU4xKirWfyfYR6cICO6iPhnDk5Y=.55d9503f-2cc8-4c26-b24f-2ced7f8f72f5@github.com> Message-ID: On Fri, 10 May 2024 21:37:39 GMT, Dean Long wrote: >> This PR removes allocation routines that may throw exceptions from JVMCIRuntime. It also exports various symbols related to the hashed secondary supers table. > > src/hotspot/share/jvmci/jvmciRuntime.cpp line 131: > >> 129: // Cannot re-execute class initialization without side effects >> 130: // so return without attempting the initialization >> 131: return; > > Do we need to call `current->set_vm_result(nullptr)` on these bailout paths? That's done in `~RetryableAllocationMark`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19176#discussion_r1597386283 From fjiang at openjdk.org Sat May 11 07:51:03 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Sat, 11 May 2024 07:51:03 GMT Subject: RFR: 8331281: RISC-V: C2: Support vector-scalar and vector-immediate bitwise logic instructions In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 12:17:58 GMT, Gui Cao wrote: > Hi, we want to support vector-scalar and vector-immediate bitwise logic instructions. The implementation follows RVV v1.0 [1]. Please take a look and review. Thanks a lot. > We can use Int256VectorTests.java [2] to print the compilation log and verify that the expected nodes are generated.
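A small model of what the new vector-scalar and vector-immediate rules compute, assuming the RVV 1.0 rule that `vand.vi` sign-extends a 5-bit immediate, so only constants in [-16, 15] can be folded into the instruction while any other scalar goes through a register with `vand.vx`. The masked helper follows the Java Vector API semantics for masked lanewise operations, where unset lanes keep the first operand's value. `choose_and_form` and `and_scalar_masked` are hypothetical illustrations, not the matcher's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// RVV vand.vi sign-extends a 5-bit immediate, so only [-16, 15] can be encoded.
constexpr bool fits_simm5(std::int64_t v) { return v >= -16 && v <= 15; }

enum class AndForm { VI, VX };

// Hypothetical selection for "vector AND broadcast scalar".
AndForm choose_and_form(bool scalar_is_constant, std::int64_t value) {
  if (scalar_is_constant && fits_simm5(value)) {
    return AndForm::VI;  // vand.vi vd, vs2, imm
  }
  return AndForm::VX;    // vand.vx vd, vs2, rs1
}

// Reference semantics of the masked form with dst == src: active lanes get
// src[i] & x, inactive lanes keep their old value.
void and_scalar_masked(std::vector<std::int32_t>& dst_src,
                       const std::vector<bool>& mask, std::int32_t x) {
  for (std::size_t i = 0; i < dst_src.size(); ++i) {
    if (mask[i]) {
      dst_src[i] &= x;
    }
  }
}
```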
> > For example, we can use the following command to print the compilation log of a jtreg test case: > > > /home/zifeihan/jdk-tools/jtreg/bin/jtreg \ > -v:default \ > -concurrency:16 -timeout:50 \ > -javaoption:-XX:+UnlockExperimentalVMOptions \ > -javaoption:-XX:+UseRVV \ > -javaoption:-XX:+PrintOptoAssembly \ > -javaoption:-XX:LogFile=/home/zifeihan/jdk/Int256VectorTests_PrintOptoAssembly.log \ > -jdk:/home/zifeihan/jdk/build/linux-riscv64-server-fastdebug/jdk \ > /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/Int256VectorTests.java > > > > In the specified compilation log `Int256VectorTests_PrintOptoAssembly.log`, we can observe the vector-scalar and vector-immediate bitwise logic nodes added by this PR. > > vand_immI Node > > > 0b4 vloadcon V3 # generate iota indices > 0bc vmla V2, V2, V3, V1 > 0c4 vand_immI V2, V2, #7 > 0cc addi R7, R30, #16 # ptr, #@addP_reg_imm > 0d0 storeV [R7], V2 # vector (rvv) > > > vor_regI Node > > > 180 vor_regI V1, V1, R30 > 188 add R31, R14, R31 # ptr, #@addP_reg_reg > 18a addi R31, R31, #16 # ptr, #@addP_reg_imm > 18c storeV [R31], V1 # vector (rvv) > 194 addiw R11, R11, #8 #@addI_reg_imm > 196 blt R11, R13, B17 #@cmpI_loop P=0.500000 C=30564.000000 > > > vxor_regI Node > > 198 vxor_regI V1, V1, R30 > 1a0 add R14, R16, R14 # ptr, #@addP_reg_reg > 1a2 addi R14, R14, #16 # ptr, #@addP_reg_imm > 1a4 storeV [R14], V1 # vector (rvv) > 1ac addiw R11, R11, #8 #@addI_reg_imm > 1ae blt R11, R13, B21 #@cmpI_loop P=0.500000 C=30564.000000 > > > vand_regI_masked Node > > 234 B31: # out( B40 B32 ) <- in( B30 ) Freq: 78.5481 > 234 loadV V2, [R15] # vector (rvv) > 23c vand_regI_masked V2, V2, R11 > 244 storeV [R9], V2 # vector (rvv) > 24c mv R10, #8 # int, #@loadConI > 24e ble R7, R10, B40 #@cmpI_branch P=0.000001 C=-1.000000 > > > vor_regI_masked Node > > 1ee B32: # out( B38 B33 ) <- in( B31 ) Freq: 75.8475 > 1ee loadV V1, [R11] # vector (rvv) > 1f6 vor_regI_masked V1, V1, R31 > 1fe addi R11, R13, #32 # ptr, #@addP_reg_imm
> 202 bgeu R29, R10, B38 #@cmpU_branch P=0.000001 C=-1.000000 > > vxor_regI_masked Node > > 1ee B32: # out( B38 B33 ) <- in( B31 ) Freq: 75.8475 > 1ee loadV V1, [R11]... Overall looks good, with one minor comment. src/hotspot/cpu/riscv/riscv_v.ad line 513: > 511: // vector-scalar and (unpredicated) > 512: > 513: instruct vand_regI(vReg dst_src, iRegI src) %{ Do we need `iRegIorL2I` for `RegI` related instructions? ------------- PR Review: https://git.openjdk.org/jdk/pull/18999#pullrequestreview-2051120671 PR Review Comment: https://git.openjdk.org/jdk/pull/18999#discussion_r1597383543 From dlong at openjdk.org Sat May 11 08:07:02 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 11 May 2024 08:07:02 GMT Subject: RFR: 8331429: [JVMCI] Cleanup JVMCIRuntime allocation routines In-Reply-To: References: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> <9wsX9310p38cnuPHGU4xKirWfyfYR6cICO6iPhnDk5Y=.55d9503f-2cc8-4c26-b24f-2ced7f8f72f5@github.com> Message-ID: On Sat, 11 May 2024 07:41:17 GMT, Doug Simon wrote: >> src/hotspot/share/jvmci/jvmciRuntime.cpp line 131: >> >>> 129: // Cannot re-execute class initialization without side effects >>> 130: // so return without attempting the initialization >>> 131: return; >> >> Do we need to call `current->set_vm_result(nullptr)` on these bailout paths? > > That's done in `~RetryableAllocationMark`. Only for the HAS_PENDING_EXCEPTION case. What about the !h->is_initialized() case? 
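One way to make every bailout path leave a cleared result, rather than only the pending-exception path, is a guard whose destructor clears the result unless the allocation explicitly commits. This is a hypothetical sketch: `FakeThread`, `ResultGuard` and `allocate` only model the idea and are not the `RetryableAllocationMark` implementation or the fix chosen in the PR.

```cpp
#include <cassert>

// Hypothetical stand-in for the thread's vm_result slot.
struct FakeThread {
  void* vm_result = nullptr;
};

// Clears vm_result on every exit path unless commit() was called.
class ResultGuard {
 public:
  explicit ResultGuard(FakeThread* thread) : _thread(thread) {}
  ~ResultGuard() {
    if (!_committed) {
      _thread->vm_result = nullptr;
    }
  }
  void commit() { _committed = true; }

 private:
  FakeThread* _thread;
  bool _committed = false;
};

// Models an allocation routine with an early bailout (e.g. the class is not
// initialized): the bailout cannot leave a stale result behind.
void allocate(FakeThread* thread, bool class_initialized, void* obj) {
  ResultGuard guard(thread);
  if (!class_initialized) {
    return;                    // bailout: the guard clears vm_result
  }
  thread->vm_result = obj;
  guard.commit();              // success: keep the freshly set result
}
```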
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19176#discussion_r1597394595 From dnsimon at openjdk.org Sat May 11 09:09:11 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Sat, 11 May 2024 09:09:11 GMT Subject: RFR: 8331429: [JVMCI] Cleanup JVMCIRuntime allocation routines In-Reply-To: References: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> <9wsX9310p38cnuPHGU4xKirWfyfYR6cICO6iPhnDk5Y=.55d9503f-2cc8-4c26-b24f-2ced7f8f72f5@github.com> Message-ID: On Sat, 11 May 2024 08:04:01 GMT, Dean Long wrote: >> That's done in `~RetryableAllocationMark`. > > Only for the HAS_PENDING_EXCEPTION case. What about the !h->is_initialized() case? Good observation; this seems to be an outstanding bug. Can you please address that, Yudi? In practice, I wonder how much this matters, as Graal always [clears the object result](https://github.com/oracle/graal/blob/0b61d20b08b1af76bd35cfb673c7be8d33855f51/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/hotspot/stubs/ForeignCallSnippets.java#L127) after reading it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19176#discussion_r1597405871 From jbhateja at openjdk.org Sat May 11 21:25:06 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 11 May 2024 21:25:06 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers In-Reply-To: <2ix8fZdbyXTav2FBERlzl7U6JkI3i9hPFGSNKbrDlpo=.a219b3de-7035-44d0-9bdc-3ea599800eb3@github.com> References: <2ix8fZdbyXTav2FBERlzl7U6JkI3i9hPFGSNKbrDlpo=.a219b3de-7035-44d0-9bdc-3ea599800eb3@github.com> Message-ID: On Fri, 3 May 2024 19:38:08 GMT, Steve Dohrmann wrote: >> How can we be confident that the encoding is correct? Would it be possible to write tests for this? Maybe one that disassembles it and compares the result to a 3rd party disassembler offline or in-process hsdis?
> > In response to @dean-long, @theRealAph wrote: >> When we wrote the AArch64 port, there was no available hardware to test it on. So, we wrote a simulator to test it. However, we ran the risk that if our understanding of instruction encoding was wrong, our assembler and our simulator might appear to work correctly when used together, but the result would not run on real AArch64 hardware once it arrived. So, as well as a simulator for the architecture, we verified the internal HotSpot assembler by checking its encoding against GNU `as`. See /test/hotspot/gtest/aarch64, where a Python program generates source for both the HotSpot internal assembler and GNU `as`. I strongly suggest you do something similar. (As a matter of historical record, this did work. The test found several encoding bugs. Once we got the first real AArch64 hardware, the port worked almost immediately.) > > Thanks for the description. It would be great to create a similar tool for x86. I tested the encoding manually using the SDE as the authoritative source. It is tedious and very time-consuming, though. > > A subsequent PR in [JDK-8329030](https://bugs.openjdk.org/browse/JDK-8329030), perhaps the one that adds encoding support for New Data Destination variants, should include such a tool. Hi @steveatgh, I have a few more comments. A) With the recent change, the register-only flavors of cvtsi2ss / cvtsi2sd / cvttsd2si / cvttss2si, which are all legacy map 1 instructions encoded using REX prefixes at UseAVX=0, will now be promoted to EEVEX, which is a fixed 4-byte prefix; we should use REX2 instead. [cvtsi2ss_MAP1_with_EEVEX.txt](https://github.com/openjdk/jdk/files/15284294/cvtsi2ss_MAP1_with_EEVEX.txt) B) Memory operand flavor of paddd: the EVEX tuple type is missing for memory operand instructions, which prevents applying the EVEX compressed displacement (disp8*N) encoding optimization.
FTR: these are legacy map 1 instructions that could be encoded using a SIMD + REX prefix, i.e. a prefix of up to two bytes. Currently we promote them to VEX encoding in order to zero the upper 128 bits, which already costs an extra prefix byte because it uses the three-byte VEX prefix (C4). Now we will encode them using EEVEX if an address operand (BASE/INDEX) is an EGPR, which adds yet another prefix byte since EVEX is a fixed 4-byte prefix. As mentioned above, at UseAVX=0 we should encode them using REX2. [paddd_MAP1_VEX_now_EEVEX.txt](https://github.com/openjdk/jdk/files/15284296/paddd_MAP1_VEX_now_EEVEX.txt) C) Memory operand flavors of pcmpestri, ptest and vptest: the address tuple is missing, and legacy mode is true but should be false. Kindly incorporate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2106034198 From duke at openjdk.org Sun May 12 02:01:26 2024 From: duke at openjdk.org (xiaotaonan) Date: Sun, 12 May 2024 02:01:26 GMT Subject: RFR: 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag [v2] In-Reply-To: References: Message-ID: <2O249IeKBpJ_BTBw_bIf5gkIn_eDjUFBXl_Q1GjQcmY=.b8c003d3-452e-4fe5-ae4c-53e0d57c4dea@github.com> > C2: Remove ExpandSubTypeCheckAtParseTime flag xiaotaonan has updated the pull request incrementally with one additional commit since the last revision: Add API to access ZipEntry.extraAttributes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19187/files - new: https://git.openjdk.org/jdk/pull/19187/files/681db95d..150ce858 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19187&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19187&range=00-01 Stats: 17 lines in 1 file changed: 17 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19187.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19187/head:pull/19187 PR: https://git.openjdk.org/jdk/pull/19187 From duke at openjdk.org Sun May 12 02:57:08 2024 From: duke at openjdk.org (xiaotaonan) Date: Sun, 12 May 2024 02:57:08
GMT Subject: Withdrawn: 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag In-Reply-To: References: Message-ID: <9q-cX7RJKLeJyXdE9v_BqerfpZOTY5yX6wTwsyVg0eE=.5a8faea6-416d-40ea-be3c-602d9841fb96@github.com> On Sat, 11 May 2024 01:55:25 GMT, xiaotaonan wrote: > C2: Remove ExpandSubTypeCheckAtParseTime flag This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/19187 From duke at openjdk.org Sun May 12 03:07:19 2024 From: duke at openjdk.org (xiaotaonan) Date: Sun, 12 May 2024 03:07:19 GMT Subject: RFR: 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag Message-ID: C2: Remove ExpandSubTypeCheckAtParseTime flag ------------- Commit messages: - C2: Remove ExpandSubTypeCheckAtParseTime flag Changes: https://git.openjdk.org/jdk/pull/19205/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19205&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332032 Stats: 10 lines in 4 files changed: 0 ins; 4 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19205.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19205/head:pull/19205 PR: https://git.openjdk.org/jdk/pull/19205 From liach at openjdk.org Sun May 12 15:14:04 2024 From: liach at openjdk.org (Chen Liang) Date: Sun, 12 May 2024 15:14:04 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v8] In-Reply-To: References: Message-ID: <3eERzqYdCd4f9qn4KpzBA9ealaUTzC67wIhzB18ETTE=.f9d17a6f-1ca5-477f-8344-40c20abe7d7e@github.com> On Mon, 6 May 2024 18:24:25 GMT, Adam Sotona wrote: >> Hi, >> During performance optimization work on the Class-File API as the JDK lambda generator, we found some static initialization killers. >> One of them is `java.lang.classfile.Attributes`, with tens of static fields initialized with individual attribute mappers, a common set of all mappers, and a static map from attribute names to the mappers.
>> >> I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. >> >> Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. >> >> Thank you, >> Adam > > Adam Sotona has updated the pull request incrementally with one additional commit since the last revision: > > fixed tests src/java.base/share/classes/java/lang/classfile/Attributes.java line 153: > 151: > 152: /** > 153: * {@return Attribute mapper for the {@code AnnotationDefault} attribute} Just wondering, can we change `{@code AnnotationDefault}` to `{@value #NAME_ANNOTATION_DEFAULT}`, etc? This way, the names are still rendered as code in Javadoc HTML, but they are generated with links to the constants, and programmers will see these constants and prefer them over hardcoded values. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19006#discussion_r1597655934 From duke at openjdk.org Mon May 13 01:02:09 2024 From: duke at openjdk.org (xiaotaonan) Date: Mon, 13 May 2024 01:02:09 GMT Subject: RFR: 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag In-Reply-To: References: Message-ID: On Sun, 12 May 2024 03:02:39 GMT, xiaotaonan wrote: > C2: Remove ExpandSubTypeCheckAtParseTime flag @lgxbslgx ------------- PR Comment: https://git.openjdk.org/jdk/pull/19205#issuecomment-2106449919 From galder at openjdk.org Mon May 13 05:04:38 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Mon, 13 May 2024 05:04:38 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v16] In-Reply-To: References: Message-ID: > Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures.
> > The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64: > > > $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 3.476 ± 0.018 ns/op > ArrayClone.byteArraycopy 10 avgt 15 3.740 ± 0.017 ns/op > ArrayClone.byteArraycopy 100 avgt 15 7.124 ± 0.010 ns/op > ArrayClone.byteArraycopy 1000 avgt 15 39.301 ± 0.106 ns/op > ArrayClone.byteClone 0 avgt 15 3.478 ± 0.008 ns/op > ArrayClone.byteClone 10 avgt 15 3.562 ± 0.007 ns/op > ArrayClone.byteClone 100 avgt 15 5.888 ± 0.206 ns/op > ArrayClone.byteClone 1000 avgt 15 25.762 ± 0.203 ns/op > ArrayClone.intArraycopy 0 avgt 15 3.199 ± 0.016 ns/op > ArrayClone.intArraycopy 10 avgt 15 4.521 ± 0.008 ns/op > ArrayClone.intArraycopy 100 avgt 15 17.429 ± 0.039 ns/op > ArrayClone.intArraycopy 1000 avgt 15 178.432 ± 0.777 ns/op > ArrayClone.intClone 0 avgt 15 3.406 ± 0.016 ns/op > ArrayClone.intClone 10 avgt 15 4.272 ± 0.006 ns/op > ArrayClone.intClone 100 avgt 15 13.110 ± 0.122 ns/op > ArrayClone.intClone 1000 avgt 15 113.196 ± 13.400 ns/op > > > It also includes an optimization to avoid instantiating the array copy stub in scenarios like this. > > I ran the hotspot compiler tests successfully, limiting them to C1 compilation, on darwin/aarch64, linux/x86_64 and linux/686. E.g. > > > $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" > ...
> TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:hotspot_compiler 1234 1234 0 0 > > > One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts? > > Thanks @rwestrel for his help shaping this up :) Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/c1/c1_GraphBuilder.cpp Co-authored-by: Dean Long <17332032+dean-long at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17667/files - new: https://git.openjdk.org/jdk/pull/17667/files/a35cdd84..c3b7fa47 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=14-15 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17667.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17667/head:pull/17667 PR: https://git.openjdk.org/jdk/pull/17667 From chagedorn at openjdk.org Mon May 13 05:42:02 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 May 2024 05:42:02 GMT Subject: RFR: 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag In-Reply-To: References: Message-ID: <_VqdctFR4arGmdWQk9opoNMe6h1Rwa0gKDWHEcHyO9Y=.2ea3c35b-660d-4431-bc47-e0a874c386ce@github.com> On Mon, 13 May 2024 00:59:18 GMT, xiaotaonan wrote: >> C2: Remove ExpandSubTypeCheckAtParseTime flag > > @lgxbslgx Hi @xiaotaonan, please first ask in JBS if you can take over RFEs/bugs that are already assigned like this one, especially if it has just been filed. This PR lacks the context of why this flag should be removed and what the pros/cons and trade-offs are.
I planned to do some more offline discussions first before proposing the actual PR to remove this flag since it is now related to an otherwise hard-to-fix bug in Valhalla. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19205#issuecomment-2106694300 From epeter at openjdk.org Mon May 13 05:48:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 May 2024 05:48:18 GMT Subject: RFR: 8331764: C2 SuperWord: refactor _align_to_ref/_mem_ref_for_main_loop_alignment In-Reply-To: References: Message-ID: On Wed, 8 May 2024 14:33:22 GMT, Vladimir Kozlov wrote: >> This PR accomplishes these things: >> - Rename `_align_to_ref` -> `_mem_ref_for_main_loop_alignment`. >> - Move the `mem_ref` finding for alignment out of `SuperWord::find_adjacent_refs`. This is too early, and we don't even know if the relevant `mem_ref` is going to be vectorized. It makes more sense to pick a `mem_ref` directly in `SuperWord::adjust_pre_loop_limit_to_align_main_loop_vectors`, where we already know what packs are going to be vectorized. >> - For the alignment width (aw), we can use the `vector_width` of the pack to which the `mem_ref` belongs, rather than the potentially much larger `vector_width_in_bytes`. I track this with `_aw_for_main_loop_alignment` now. >> >> I need this for https://github.com/openjdk/jdk/pull/18822, and decided to split it out into an independent change. > > Good. Thanks @vnkozlov @chhagedorn for the reviews! 
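To make the alignment-width point concrete, here is a minimal standalone model (not HotSpot code) of how many pre-loop iterations are needed before a forward-striding memory access reaches an alignment boundary. It assumes a stride of +1 element, a start address that is a multiple of the element size, and a power-of-two alignment width; the class and method names are hypothetical. In this simplified model, aligning to a large `vector_width_in_bytes` can cost more pre-loop iterations than aligning to the smaller vector width of the pack the `mem_ref` actually belongs to.

```java
// Hypothetical model, not HotSpot code: pre-loop iterations needed before a
// forward-striding access (stride +1 element) reaches an aw-byte boundary.
public class PreLoopAlignmentSketch {

    // Assumes startAddress is a multiple of elemSize and aw is a power of two
    // that is itself a multiple of elemSize.
    static int preLoopIterations(long startAddress, int elemSize, int aw) {
        long misalignment = Math.floorMod(startAddress, (long) aw); // bytes past the previous boundary
        long bytesToBoundary = (aw - misalignment) % aw;            // 0 if already aligned
        return (int) (bytesToBoundary / elemSize);
    }

    public static void main(String[] args) {
        long start = 20;   // e.g. first int element at byte offset 20
        int elemSize = 4;  // int elements
        System.out.println(preLoopIterations(start, elemSize, 64)); // 11
        System.out.println(preLoopIterations(start, elemSize, 16)); // 3
    }
}
```

With a 64-byte alignment width the model needs 11 pre-loop iterations, while a 16-byte width needs only 3.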
------------- PR Comment: https://git.openjdk.org/jdk/pull/19115#issuecomment-2106700060 From epeter at openjdk.org Mon May 13 05:48:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 May 2024 05:48:18 GMT Subject: Integrated: 8331764: C2 SuperWord: refactor _align_to_ref/_mem_ref_for_main_loop_alignment In-Reply-To: References: Message-ID: <0gCo8BOJuWlOFZndYqNlwDzkqjSpsjNvN4wHpFFpzUU=.40169e88-d51e-444a-bcab-a52877acb526@github.com> On Tue, 7 May 2024 09:26:11 GMT, Emanuel Peter wrote: > This PR accomplishes these things: > - Rename `_align_to_ref` -> `_mem_ref_for_main_loop_alignment`. > - Move the `mem_ref` finding for alignment out of `SuperWord::find_adjacent_refs`. This is too early, and we don't even know if the relevant `mem_ref` is going to be vectorized. It makes more sense to pick a `mem_ref` directly in `SuperWord::adjust_pre_loop_limit_to_align_main_loop_vectors`, where we already know what packs are going to be vectorized. > - For the alignment width (aw), we can use the `vector_width` of the pack to which the `mem_ref` belongs, rather than the potentially much larger `vector_width_in_bytes`. I track this with `_aw_for_main_loop_alignment` now. > > I need this for https://github.com/openjdk/jdk/pull/18822, and decided to split it out into an independent change. This pull request has now been integrated. 
Changeset: d517d2df Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/d517d2df451e135583083ed3684d7d3241b36f76 Stats: 67 lines in 2 files changed: 41 ins; 20 del; 6 mod 8331764: C2 SuperWord: refactor _align_to_ref/_mem_ref_for_main_loop_alignment Reviewed-by: kvn, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/19115 From epeter at openjdk.org Mon May 13 06:01:33 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 May 2024 06:01:33 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8] In-Reply-To: <8-_t7nWbR9gZ2_QkfFNuf5M0Q4PMkKJKgwS3ZbHcCxI=.32dc4f11-dec5-468d-afc8-3b4dae285dcb@github.com> References: <8-_t7nWbR9gZ2_QkfFNuf5M0Q4PMkKJKgwS3ZbHcCxI=.32dc4f11-dec5-468d-afc8-3b4dae285dcb@github.com> Message-ID: On Wed, 8 May 2024 20:22:51 GMT, Bhavana Kilambi wrote: > I am not sure if I fully understand what's expected in the JTREG tests. Should I be verifying the -XX:+PrintIdeal output to make sure the correct message is being printed for the ReductionV* nodes? Yes, the IR framework basically does regex matching against the PrintIdeal graph. For example: `counts = {IRNode.STORE_VECTOR, ">0"}` in the `@IR` rule executes the regex for the store vector, and checks if we find more than zero occurrences. Maybe you can just use a regex string directly for your special IR rule. Alternatively, you could have them in the `IRNode` class, but I'm not sure that's worth it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18034#issuecomment-2106712934 From epeter at openjdk.org Mon May 13 06:03:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 May 2024 06:03:36 GMT Subject: RFR: 8329273: C2 SuperWord: Some basic MemorySegment IR tests [v2] In-Reply-To: References: Message-ID: > I could not find any IR vectorization tests for `MemorySegment` yet.
> > I make sure to exercise different backing types: > - arrays > - buffers > - native memory > > I filed a follow-up RFE to eventually make all cases where I have "FAILS" vectorize: > > [JDK-8331659](https://bugs.openjdk.org/browse/JDK-8331659): C2 SuperWord: investigate failed vectorization in compiler/loopopts/superword/TestMemorySegment.java Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 25 additional commits since the last revision: - Merge branch 'master' into JDK-8329273-memory-segment-ir-tests - fix tabs - speed up test - small cosmetic fix - make things static - long loop tests - handle AlignVector - int cases - int-index case - disable mixed tests - ... and 15 more: https://git.openjdk.org/jdk/compare/43da3db1...6f760dfd ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18535/files - new: https://git.openjdk.org/jdk/pull/18535/files/b6f16a58..6f760dfd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18535&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18535&range=00-01 Stats: 43101 lines in 1987 files changed: 18450 ins; 16140 del; 8511 mod Patch: https://git.openjdk.org/jdk/pull/18535.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18535/head:pull/18535 PR: https://git.openjdk.org/jdk/pull/18535 From chagedorn at openjdk.org Mon May 13 06:48:17 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 May 2024 06:48:17 GMT Subject: RFR: 8329273: C2 SuperWord: Some basic MemorySegment IR tests [v2] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 06:03:36 GMT, Emanuel Peter wrote: >> I could not find any IR vectorization tests for `MemorySegment` yet.
>> >> I make sure to exercise different backing types: >> - arrays >> - buffers >> - native memory >> >> I filed a follow-up RFE to eventually make all cases where I have "FAILS" vectorize: >> >> [JDK-8331659](https://bugs.openjdk.org/browse/JDK-8331659): C2 SuperWord: investigate failed vectorization in compiler/loopopts/superword/TestMemorySegment.java > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 25 additional commits since the last revision: > > - Merge branch 'master' into JDK-8329273-memory-segment-ir-tests > - fix tabs > - speed up test > - small cosmetic fix > - make things static > - long loop tests > - handle AlignVector > - int cases > - int-index case > - disable mixed tests > - ... and 15 more: https://git.openjdk.org/jdk/compare/2faa8c83...6f760dfd Good basic tests! I have a few minor comments, but otherwise it looks good. test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 36: > 34: /* > 35: * @test id=byte-array > 36: * @bug 8310190 Should be updated to 8329273. Same for the other runs. test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 166: > 164: String providerName = System.getProperty("memorySegmentProviderNameForTestVM"); > 165: provider = switch (providerName) { > 166: case "ByteArray" -> ( () -> { return newMemorySegmentOfByteArray(); } ); You can directly use an expression lambda without `return`: case "ByteArray" -> (() -> newMemorySegmentOfByteArray()); But I think you can go even further and directly use a method reference: Suggestion: case "ByteArray" -> (TestMemorySegmentImpl::newMemorySegmentOfByteArray); Same for others.
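A small self-contained sketch of the three equivalent ways to write the provider cases discussed in this review: block lambda, expression lambda, and method reference. It uses plain arrays and `ByteBuffer` instead of `MemorySegment` so that it runs without the FFM API; `newByteArray`, `newHeapBuffer` and `providerFor` are hypothetical stand-ins for the test's `newMemorySegmentOf...` factories. Because the `switch` expression appears in a return context typed as `Supplier<Object>`, each case arm is target-typed, which is what allows a bare method reference there.

```java
import java.nio.ByteBuffer;
import java.util.function.Supplier;

// Hypothetical stand-ins for the test's newMemorySegmentOf... factories; plain
// arrays and buffers keep the sketch independent of the FFM API.
public class ProviderSketch {
    static Object newByteArray()  { return new byte[1024]; }
    static Object newHeapBuffer() { return ByteBuffer.allocate(1024); }

    static Supplier<Object> providerFor(String name) {
        return switch (name) {
            // Block lambda with an explicit return (the original style).
            case "ByteArrayBlock" -> (() -> { return newByteArray(); });
            // Expression lambda: same behavior, no braces or return.
            case "ByteArrayExpr"  -> () -> newByteArray();
            // Method reference: same behavior, shortest form.
            case "ByteArray"      -> ProviderSketch::newByteArray;
            case "HeapBuffer"     -> ProviderSketch::newHeapBuffer;
            default -> throw new RuntimeException("Test argument not recognized: " + name);
        };
    }

    public static void main(String[] args) {
        Supplier<Object> provider = providerFor("ByteArray");
        System.out.println(provider.get() instanceof byte[]); // true
    }
}
```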
test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 181: > 179: default -> throw new RuntimeException("Test argument not recognized: " + providerName); > 180: }; > 181: } As discussed offline, this is an interesting workaround. Maybe the IR framework could be extended at some point to simplify this. test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 187: > 185: > 186: // List of gold, the results from the first run before compilation > 187: Map golds = new HashMap(); You can replace these with `<>`: Suggestion: // List of tests Map tests = new HashMap<>(); // List of gold, the results from the first run before compilation Map golds = new HashMap<>(); test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 199: > 197: tests.put("testMemorySegmentBadExitCheck", () -> { > 198: return testMemorySegmentBadExitCheck(copy(a)); > 199: }); Same as above, you can replace this with an expression lambda: Suggestion: tests.put("testIntLoop_longIndex_intInvar_sameAdr_byte", () -> testIntLoop_longIndex_intInvar_sameAdr_byte(copy(a), 0)); Same for others. test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 347: > 345: > 346: static MemorySegment newMemorySegmentOfMixedBuffer() { > 347: switch(RANDOM.nextInt(2)) { Suggestion: switch (RANDOM.nextInt(2)) { test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 354: > 352: > 353: static MemorySegment newMemorySegmentOfMixed() { > 354: switch(RANDOM.nextInt(3)) { Suggestion: switch (RANDOM.nextInt(3)) { test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 447: > 445: @IR(counts = {IRNode.LOAD_VECTOR_B, "= 0", > 446: IRNode.ADD_VB, "= 0", > 447: IRNode.STORE_VECTOR, "= 0"}, You should use `failOn` instead of `= 0`. Same for other tests. 
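The tests/golds pattern and the two review suggestions for it (the diamond operator and expression lambdas) can be sketched in a self-contained form. This is a simplified model under stated assumptions: `TestFunction`, `addOne`, `runAll` and the single registered test are hypothetical. The real test registers many tests, records the golds on a first run before compilation, and presumably compares them against later, compiled runs.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "tests + golds" pattern: TestFunction, addOne and
// the single registered test are illustrative assumptions.
public class GoldCheckSketch {
    interface TestFunction { Object[] run(); }

    static byte[] copy(byte[] a) { return Arrays.copyOf(a, a.length); }

    static Object[] addOne(byte[] a) {
        for (int i = 0; i < a.length; i++) { a[i]++; }
        return new Object[] { a };
    }

    static void runAll() {
        byte[] a = new byte[100];
        // Diamond operator: type arguments are inferred from the declaration.
        Map<String, TestFunction> tests = new HashMap<>();
        Map<String, Object[]> golds = new HashMap<>();

        // Expression lambda; copy(a) gives every run a fresh input array.
        tests.put("addOne", () -> addOne(copy(a)));

        // Record a gold result per test (in the real test: before compilation).
        tests.forEach((name, test) -> golds.put(name, test.run()));

        // Later runs must reproduce the gold; deepEquals compares the byte[] contents.
        for (Map.Entry<String, TestFunction> e : tests.entrySet()) {
            Object[] result = e.getValue().run();
            if (!Arrays.deepEquals(golds.get(e.getKey()), result)) {
                throw new RuntimeException("Mismatch in " + e.getKey());
            }
        }
    }

    public static void main(String[] args) {
        runAll();
        System.out.println("all results match their golds");
    }
}
```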
------------- PR Review: https://git.openjdk.org/jdk/pull/18535#pullrequestreview-2051802215 PR Review Comment: https://git.openjdk.org/jdk/pull/18535#discussion_r1597940804 PR Review Comment: https://git.openjdk.org/jdk/pull/18535#discussion_r1597946319 PR Review Comment: https://git.openjdk.org/jdk/pull/18535#discussion_r1597942075 PR Review Comment: https://git.openjdk.org/jdk/pull/18535#discussion_r1597947716 PR Review Comment: https://git.openjdk.org/jdk/pull/18535#discussion_r1597949915 PR Review Comment: https://git.openjdk.org/jdk/pull/18535#discussion_r1597950088 PR Review Comment: https://git.openjdk.org/jdk/pull/18535#discussion_r1597950155 PR Review Comment: https://git.openjdk.org/jdk/pull/18535#discussion_r1597942531 From duke at openjdk.org Mon May 13 07:08:15 2024 From: duke at openjdk.org (xiaotaonan) Date: Mon, 13 May 2024 07:08:15 GMT Subject: RFR: 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag In-Reply-To: <_VqdctFR4arGmdWQk9opoNMe6h1Rwa0gKDWHEcHyO9Y=.2ea3c35b-660d-4431-bc47-e0a874c386ce@github.com> References: <_VqdctFR4arGmdWQk9opoNMe6h1Rwa0gKDWHEcHyO9Y=.2ea3c35b-660d-4431-bc47-e0a874c386ce@github.com> Message-ID: On Mon, 13 May 2024 05:39:11 GMT, Christian Hagedorn wrote: > please first ask in JBS if you can take over RFEs/bugs that are already assigned like this one, especially if it has just been filed. This PR misses the entire context why this flag should be removed and what the pros/cons and trade-offs are. I planned to do some more offline discussions first before proposing the actual PR to remove this flag since it is now related to an otherwise hard-to-fix bug in Valhalla. OK. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19205#issuecomment-2106802957 From duke at openjdk.org Mon May 13 07:08:15 2024 From: duke at openjdk.org (xiaotaonan) Date: Mon, 13 May 2024 07:08:15 GMT Subject: Withdrawn: 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag In-Reply-To: References: Message-ID: On Sun, 12 May 2024 03:02:39 GMT, xiaotaonan wrote: > C2: Remove ExpandSubTypeCheckAtParseTime flag This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/19205 From epeter at openjdk.org Mon May 13 07:18:34 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 May 2024 07:18:34 GMT Subject: RFR: 8329273: C2 SuperWord: Some basic MemorySegment IR tests [v3] In-Reply-To: References: Message-ID: > I could not find any IR vectorization tests for `MemorySegment` yet. > > I make sure to exercise different backing types: > - arrays > - buffers > - native memory > > I filed a follow-up RFE to eventually make all cases where I have "FAILS" vectorize: > > [JDK-8331659](https://bugs.openjdk.org/browse/JDK-8331659): C2 SuperWord: investigate failed vectorization in compiler/loopopts/superword/TestMemorySegment.java Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18535/files - new: https://git.openjdk.org/jdk/pull/18535/files/6f760dfd..3cbb4664 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18535&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18535&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18535.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18535/head:pull/18535 PR: https://git.openjdk.org/jdk/pull/18535 From epeter at openjdk.org Mon May 13 07:18:37 2024 From: epeter at openjdk.org (Emanuel Peter) Date:
Mon, 13 May 2024 07:18:37 GMT Subject: RFR: 8329273: C2 SuperWord: Some basic MemorySegment IR tests [v2] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 06:34:45 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 25 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8329273-memory-segment-ir-tests >> - fix tabs >> - speed up test >> - small cosmetic fix >> - make things static >> - long loop tests >> - handle AlignVector >> - int cases >> - int-index case >> - disable mixed tests >> - ... and 15 more: https://git.openjdk.org/jdk/compare/aa5b224f...6f760dfd > > test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 447: > >> 445: @IR(counts = {IRNode.LOAD_VECTOR_B, "= 0", >> 446: IRNode.ADD_VB, "= 0", >> 447: IRNode.STORE_VECTOR, "= 0"}, > > You should use `failOn` instead of `= 0`. Same for other tests. I honestly prefer "= 0", because it is easier to flip to "> 0", and keeps the same style that way. But I guess that is really a matter of taste. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18535#discussion_r1597982303 From epeter at openjdk.org Mon May 13 07:26:09 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 May 2024 07:26:09 GMT Subject: RFR: 8329273: C2 SuperWord: Some basic MemorySegment IR tests [v2] In-Reply-To: References: Message-ID: <49UAPFqTeTFEbRuJMW_pYQ8RJAKYj3DFYVIi8WHeMgI=.f7a067ef-878a-4875-9846-cb163403ba96@github.com> On Mon, 13 May 2024 06:32:45 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 25 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8329273-memory-segment-ir-tests >> - fix tabs >> - speed up test >> - small cosmetic fix >> - make things static >> - long loop tests >> - handle AlignVector >> - int cases >> - int-index case >> - disable mixed tests >> - ... and 15 more: https://git.openjdk.org/jdk/compare/7e77b898...6f760dfd > > test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 36: > >> 34: /* >> 35: * @test id=byte-array >> 36: * @bug 8310190 > > Should be updated to 8329273. Same for other runs Nice catch! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18535#discussion_r1597992380 From epeter at openjdk.org Mon May 13 07:30:10 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 May 2024 07:30:10 GMT Subject: RFR: 8329273: C2 SuperWord: Some basic MemorySegment IR tests [v2] In-Reply-To: References: Message-ID: <4WYvsoVX9v8WsQS8-74kMas53r2Bo-TVu2_TkmGWwTA=.a64336fb-238a-4f1c-98bc-83a8079ad5ea@github.com> On Mon, 13 May 2024 06:39:05 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 25 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8329273-memory-segment-ir-tests >> - fix tabs >> - speed up test >> - small cosmetic fix >> - make things static >> - long loop tests >> - handle AlignVector >> - int cases >> - int-index case >> - disable mixed tests >> - ... 
and 15 more: https://git.openjdk.org/jdk/compare/06854a6b...6f760dfd > > test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 166: > >> 164: String providerName = System.getProperty("memorySegmentProviderNameForTestVM"); >> 165: provider = switch (providerName) { >> 166: case "ByteArray" -> ( () -> { return newMemorySegmentOfByteArray(); } ); > > You can directly use an expression lambda without return: > > case "ByteArray" -> (() -> newMemorySegmentOfByteArray()); > > But I think you can go even further and directly use a method reference: > Suggestion: > > case "ByteArray" -> (TestMemorySegmentImpl::newMemorySegmentOfByteArray); > > Same for others. Oh, great idea! > test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 181: > >> 179: default -> throw new RuntimeException("Test argument not recognized: " + providerName); >> 180: }; >> 181: } > > As discussed offline, this is an interesting workaround. Maybe the IR framework could be extended at some point to simplify this. That would be nice! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18535#discussion_r1597996513 PR Review Comment: https://git.openjdk.org/jdk/pull/18535#discussion_r1597996770 From epeter at openjdk.org Mon May 13 07:38:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 May 2024 07:38:11 GMT Subject: RFR: 8329273: C2 SuperWord: Some basic MemorySegment IR tests [v2] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 06:42:42 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 25 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8329273-memory-segment-ir-tests >> - fix tabs >> - speed up test >> - small cosmetic fix >> - make things static >> - long loop tests >> - handle AlignVector >> - int cases >> - int-index case >> - disable mixed tests >> - ... and 15 more: https://git.openjdk.org/jdk/compare/7eaa6f7c...6f760dfd > > test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 199: > >> 197: tests.put("testMemorySegmentBadExitCheck", () -> { >> 198: return testMemorySegmentBadExitCheck(copy(a)); >> 199: }); > > Same as above, you can replace this with an expression lambda: > Suggestion: > > tests.put("testIntLoop_longIndex_intInvar_sameAdr_byte", > () -> testIntLoop_longIndex_intInvar_sameAdr_byte(copy(a), 0)); > > Same for others. Nice idea! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18535#discussion_r1598006814 From epeter at openjdk.org Mon May 13 07:47:35 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 May 2024 07:47:35 GMT Subject: RFR: 8329273: C2 SuperWord: Some basic MemorySegment IR tests [v4] In-Reply-To: References: Message-ID: > I could not find any IR vectorization tests for `MemorySegment` yet. 
> > I make sure to exercise different backing types: > - arrays > - buffers > - native memory > > I filed a follow-up RFE to eventually make all cases where I have "FAILS" vectorize: > > [JDK-8331659](https://bugs.openjdk.org/browse/JDK-8331659): C2 SuperWord: investigate failed vectorization in compiler/loopopts/superword/TestMemorySegment.java Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8329273-memory-segment-ir-tests' of https://github.com/eme64/jdk into JDK-8329273-memory-segment-ir-tests - review suggestions by Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18535/files - new: https://git.openjdk.org/jdk/pull/18535/files/3cbb4664..b6ddb4b7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18535&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18535&range=02-03 Stats: 101 lines in 1 file changed: 0 ins; 50 del; 51 mod Patch: https://git.openjdk.org/jdk/pull/18535.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18535/head:pull/18535 PR: https://git.openjdk.org/jdk/pull/18535 From asotona at openjdk.org Mon May 13 07:54:09 2024 From: asotona at openjdk.org (Adam Sotona) Date: Mon, 13 May 2024 07:54:09 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v8] In-Reply-To: <3eERzqYdCd4f9qn4KpzBA9ealaUTzC67wIhzB18ETTE=.f9d17a6f-1ca5-477f-8344-40c20abe7d7e@github.com> References: <3eERzqYdCd4f9qn4KpzBA9ealaUTzC67wIhzB18ETTE=.f9d17a6f-1ca5-477f-8344-40c20abe7d7e@github.com> Message-ID: <8bkIrXCl7OsuLoMQi43faVELq0d1R-P60pSCGkxpwpU=.fe207403-8288-4f2d-ab7d-96fec5ba212e@github.com> On Sun, 12 May 2024 15:11:17 GMT, Chen Liang wrote: >> Adam Sotona has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed tests > > src/java.base/share/classes/java/lang/classfile/Attributes.java line 153: > >> 151: >> 152: /** >> 153: * {@return
Attribute mapper for the {@code AnnotationDefault} attribute} > > Just wondering, can we change `{@code AnnotationDefault}` to `{@value #NAME_ANNOTATION_DEFAULT}`, etc.? This way, the names are still rendered as code in Javadoc HTML, but they are generated with links to the constants, and programmers will see these constants and prefer them over hardcoded values. On the other hand, it is questionable whether the attribute names should be exposed in the API at all. We provide corresponding mappers and attribute models. I don't see a case where a user would need to use the attribute names directly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19006#discussion_r1598026518 From chagedorn at openjdk.org Mon May 13 07:58:04 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 May 2024 07:58:04 GMT Subject: RFR: 8329273: C2 SuperWord: Some basic MemorySegment IR tests [v4] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 07:47:35 GMT, Emanuel Peter wrote: >> I could not find any IR vectorization tests for `MemorySegment` yet. >> >> I make sure to exercise different backing types: >> - arrays >> - buffers >> - native memory >> >> I filed a follow-up RFE to eventually make all cases where I have "FAILS" vectorize: >> >> [JDK-8331659](https://bugs.openjdk.org/browse/JDK-8331659): C2 SuperWord: investigate failed vectorization in compiler/loopopts/superword/TestMemorySegment.java > > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'JDK-8329273-memory-segment-ir-tests' of https://github.com/eme64/jdk into JDK-8329273-memory-segment-ir-tests > - review suggestions by Christian Updates look good, thanks! ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18535#pullrequestreview-2051958085 From epeter at openjdk.org Mon May 13 08:03:03 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 May 2024 08:03:03 GMT Subject: RFR: 8325155: C2 SuperWord: remove alignment boundaries Message-ID: I have tried for a very long time to get rid of all the `alignment(n)` code that is all over the SuperWord code. With lots of previous work, I am now finally ready to remove it. I was able to remove lots of VM code, about 300 lines. And the removed code is, I think, much more complicated than the new code. This is what I did in this PR: - Removal of `_node_info`: it used to have many fields, which I refactored out to the `VLoopAnalyzer` modules. `alignment` is the last component, which I now remove. - Changed the implementation of `SuperWord::find_adjacent_refs`, now `SuperWord::find_adjacent_memop_pairs`, completely: - It used to be an algorithm that would scan over all `memops` repeatedly, try to find some `mem_ref` and see which other memops were comparable, and then pack pairs for all of those, by comparing all-vs-all memops. This algorithm is at least quadratic, if not much worse. - I now add all `memops` into a single array, sort them by groups (those that are comparable with each other and could be packed into vectors), and inside the groups by ascending offset. This allows me to split off the groups much more efficiently, and the sorting by offset also allows me to find adjacent pairs much more efficiently. In most cases this reduces the cost to `O(n log n)` for the sort, plus a linear scan for finding adjacent memops. - I removed the "alignment boundaries" created in `SuperWord::memory_alignment` by `int off_rem = offset % vw;`. - This used to have the effect that all offsets were computed modulo the vector width. Hence, pairs could not be packed across this boundary (e.g.
we have nodes with offsets `31, 32`, which are adjacent in theory, but if we have a `vw = 32`, then the modulo-offsets are `31, 0`, and they are not detected as adjacent). - These "alignment boundaries" used to be required for correctness about a year ago, before I fixed and relaxed much of the alignment code. - The `alignment` used to have another important task: ensuring compatibility of the input-size of a use node with the output-size of the def-node. - This was done by giving all nodes an `alignment`, even the non-memop nodes. This `alignment` was then scaled up and down at type casts (e.g. int `0, 4, 8, 12` -> long `0, 8, 16, 24`). If the output-size of the def-node did not match the input-size of the use-node, then the `alignment` would not match up, and we would not pack. - This is why we used to have checks like `alignment(s1) + data_size(s1) == alignment(s2)` and `s2_align == align + data_size(s1)`, and why we did `set_alignment(s2, align + data_size(s1));` inside `SuperWord::set_alignment(Node* s1, Node* s2, int align)`. - I decided NOT to check if use/def type sizes match during packing, but only much later in `SuperWord::profitable` (a bad name: it has always been more about checking consistency than profitability, but I will rename that in a future RFE). The relevant code is in `SuperWord::is_velt_basic_type_compatible_use_def`. ------------- Commit messages: - rm TODO - manual merge - revert a line, need to fix it different - improve comments - fix alignment - fix reductions - MaxI reduction over chars - Merge branch 'master' into JDK-8325155-rm-alignment-boundaries - Merge branch 'master' into JDK-8325155-rm-alignment-boundaries - Merge branch 'master' into JDK-8325155-rm-alignment-boundaries - ...
and 14 more: https://git.openjdk.org/jdk/compare/d517d2df...69396ac8 Changes: https://git.openjdk.org/jdk/pull/18822/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18822&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8325155 Stats: 1064 lines in 7 files changed: 597 ins; 369 del; 98 mod Patch: https://git.openjdk.org/jdk/pull/18822.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18822/head:pull/18822 PR: https://git.openjdk.org/jdk/pull/18822 From epeter at openjdk.org Mon May 13 08:03:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 May 2024 08:03:15 GMT Subject: RFR: 8325155: C2 SuperWord: remove alignment boundaries In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 17:58:53 GMT, Emanuel Peter wrote: > I have tried for a very long time to get rid of all the `alignment(n)` code that is all over the SuperWord code. With lots of previous work, I am now finally ready to remove it. > > I was able to remove lots of VM code, about 300 lines. And the removed code is, I think, much more complicated than the new code. > > This is what I did in this PR: > - Removal of `_node_info`: it used to have many fields, which I refactored out to the `VLoopAnalyzer` modules. `alignment` is the last component, which I now remove. > - Changed the implementation of `SuperWord::find_adjacent_refs`, now `SuperWord::find_adjacent_memop_pairs`, completely: > - It used to be an algorithm that would scan over all `memops` repeatedly, try to find some `mem_ref` and see which other memops were comparable, and then pack pairs for all of those, by comparing all-vs-all memops. This algorithm is at least quadratic, if not much worse. > - I now add all `memops` into a single array, sort them by groups (those that are comparable with each other and could be packed into vectors), and inside the groups by ascending offset.
This allows me to split off the groups much more efficiently, and the sorting by offset also allows me to find adjacent pairs much more efficiently. In most cases this reduces the cost to `O(n log n)` for the sort, plus a linear scan for finding adjacent memops. > - I removed the "alignment boundaries" created in `SuperWord::memory_alignment` by `int off_rem = offset % vw;`. > - This used to have the effect that all offsets were computed modulo the vector width. Hence, pairs could not be packed across this boundary (e.g. we have nodes with offsets `31, 32`, which are adjacent in theory, but if we have a `vw = 32`, then the modulo-offsets are `31, 0`, and they are not detected as adjacent). > - These "alignment boundaries" used to be required for correctness about a year ago, before I fixed and relaxed much of the alignment code. > - The `alignment` used to have another important task: ensuring compatibility of the input-size of a use node with the output-size of the def-node. > - This was done by giving all nodes an `alignment`, even the non-memop nodes. This `alignment` was then scaled up and down at type casts (e.g. int `0, 4, 8, 12` -> long `0, 8, 16, 24`). If the output-size of the def-node did not match the input-size of the use-node, then the `alignment` would not match up, and we would not pack. > - This is why we used to have checks like `alignment(s1) + data_size(s1) == alignment(s2)` ... src/hotspot/share/opto/superword.cpp line 46: > 44: _vloop(vloop_analyzer.vloop()), > 45: _arena(mtCompiler), > 46: _node_info(arena(), _vloop.estimated_body_length(), 0, SWNodeInfo::initial), // info needed per node Note: `_node_info` held the "alignment" info; all other fields had already been removed in previous refactorings.
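The pairing strategy described in the PR text above (sort memops by group, then by offset, then scan linearly) can be modelled in a few lines. This is a hypothetical simplification, not the HotSpot implementation: a "group" is reduced to a string key standing in for everything that makes two memops comparable, offsets are in bytes, and offsets within a group are assumed to be unique. Offsets are compared directly rather than modulo a vector width, so a pair at offsets 31 and 32 is found.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of "sort by group, then by offset, then scan linearly".
// 'group' stands in for everything that makes two memops comparable (same
// kind, base, invariant, element type); it is not a HotSpot name.
public class AdjacentPairsSketch {
    record MemOp(String name, String group, int offset, int size) {}
    record Pair(MemOp left, MemOp right) {}

    static List<Pair> findAdjacentPairs(List<MemOp> memops) {
        List<MemOp> sorted = new ArrayList<>(memops);
        // O(n log n): group first, then ascending byte offset inside a group.
        sorted.sort(Comparator.comparing(MemOp::group).thenComparingInt(MemOp::offset));
        List<Pair> pairs = new ArrayList<>();
        // O(n): neighbours in the same group whose offsets differ by exactly one
        // element are adjacent. Raw offsets are compared, so no modulo-vw
        // boundary can split a pair. Assumes unique offsets within a group.
        for (int i = 0; i + 1 < sorted.size(); i++) {
            MemOp a = sorted.get(i);
            MemOp b = sorted.get(i + 1);
            if (a.group().equals(b.group()) && b.offset() - a.offset() == a.size()) {
                pairs.add(new Pair(a, b));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<MemOp> ops = List.of(
            new MemOp("s3", "storeI a[]", 12, 4),
            new MemOp("l0", "loadI b[]", 0, 4),
            new MemOp("s1", "storeI a[]", 4, 4),
            new MemOp("s0", "storeI a[]", 0, 4),
            new MemOp("l1", "loadI b[]", 4, 4));
        for (Pair p : findAdjacentPairs(ops)) {
            System.out.println(p.left().name() + " -> " + p.right().name());
        }
        // Prints "l0 -> l1" and "s0 -> s1"; s1 and s3 are 8 bytes apart.
    }
}
```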
src/hotspot/share/opto/superword.cpp line 48: > 46: _clone_map(phase()->C->clone_map()), // map of nodes created in cloning > 47: _pairset(&_arena, _vloop_analyzer), > 48: _packset(&_arena, _vloop_analyzer Note: renamed it to `_mem_ref_for_main_loop_alignment` src/hotspot/share/opto/superword.cpp line 596: > 594: } > 595: } > 596: } Note: this used to count how many "comparable" VPointers we have for each memop. Goal: find the memop with the most "comparable" VPointers, in the hope that it yields the longest vector. src/hotspot/share/opto/superword.cpp line 675: > 673: > 674: //---------------------------get_vw_bytes_special------------------------ > 675: int SuperWord::get_vw_bytes_special(MemNode* s) { Note: this computed the "expected" vector width for the memop `s`. It was based on `vector_width_in_bytes`, but applied some special logic for `MulAddS2I`, and it also checked `max_vector_size_in_def_use_chain`. This made sure that the vector width was not too large, i.e. that there would be no mismatch between this vector and, for example, inputs that would require a smaller or larger vector width. All of this now seems obsolete since I introduced the `split_packs_at_use_def_boundaries` pass. Now we can simply create the largest vector width that is OK for this memop, and if its uses or defs later require a smaller vector width, we simply split this vector/pack. src/hotspot/share/opto/superword.cpp line 694: > 692: if (!_pairset.is_left(s1) && !_pairset.is_right(s2)) { > 693: if (!s1->is_Mem() || are_adjacent_refs(s1, s2)) { > 694: return true; Note: we still check `are_adjacent_refs`, and non-memops don't need any alignment. src/hotspot/share/opto/superword.cpp line 705: > 703: //---------------------------get_iv_adjustment--------------------------- > 704: // Calculate loop's iv adjustment for this memory ops. > 705: int SuperWord::get_iv_adjustment(MemNode* mem_ref) { Note: this was another helper method for `SuperWord::find_adjacent_refs`.
It was used as the input to `SuperWord::memory_alignment`. The value essentially measured how many "elements" this `mem_ref` is away from the "alignment boundary" `offset % vw`. src/hotspot/share/opto/superword.cpp line 718: > 716: // several iterations are needed to align memory operations in main-loop even > 717: // if offset is 0. > 718: int iv_adjustment_in_bytes = (stride_sign * vw - (offset % vw)); Note: the `offset % vw` creates the "alignment boundaries", across which we could not pack any memops. src/hotspot/share/opto/superword.cpp line 921: > 919: continue; > 920: } > 921: if (can_pack_into_pair(t1, t2)) { Note: we no longer check here whether use/def types are compatible; that is now done in `is_velt_basic_type_compatible_use_def`. src/hotspot/share/opto/superword.cpp line 957: > 955: if (t2->Opcode() == Op_AddI && t2 == cl()->incr()) continue; // don't mess with the iv > 956: if (order_inputs_of_uses_to_match_def_pair(s1, s2, t1, t2) != PairOrderStatus::Ordered) { continue; } > 957: if (can_pack_into_pair(t1, t2)) { Note: we no longer check here whether use/def types are compatible; that is now done in `is_velt_basic_type_compatible_use_def`. src/hotspot/share/opto/superword.cpp line 1072: > 1070: if (longer_type_for_conversion(s) != T_ILLEGAL || > 1071: longer_type_for_conversion(t) != T_ILLEGAL) { > 1072: align = align / data_size(s) * data_size(t); Note: this check was there to ensure that the type sizes of use/def nodes match. This is now done by `is_velt_basic_type_compatible_use_def`. src/hotspot/share/opto/superword.cpp line 1611: > 1609: // the implementation in backend, superword splits the vector implementation > 1610: // for Java API into an execution node with long type plus another node > 1611: // converting long to int. Note: I copied this comment from the use-site. It is important, and I need it inside `is_velt_basic_type_compatible_use_def`.
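The effect of the old `offset % vw` "alignment boundaries" can be shown with a small C++ sketch. It makes three simplifying assumptions. It ignores the iv adjustment, so boundaries fall at multiples of `vw`. It assumes non-negative byte offsets. The helper names `adjacent_modulo` and `adjacent_raw` are hypothetical, not HotSpot functions.

```cpp
#include <cassert>

// Old scheme (simplified, ignoring the iv adjustment): offsets were reduced
// modulo the vector width vw before testing adjacency, so a pair that
// straddles a multiple of vw is not recognized as adjacent.
// Assumes non-negative offsets (C++ % truncates toward zero).
bool adjacent_modulo(int off_a, int off_b, int size, int vw) {
  return (off_a % vw) + size == (off_b % vw);
}

// New scheme (simplified): compare the raw offsets directly.
bool adjacent_raw(int off_a, int off_b, int size) {
  return off_a + size == off_b;
}
```

Take `vw = 32` and 1-byte accesses. `adjacent_modulo(31, 32, 1, 32)` is false because the reduced offsets are `31` and `0`. `adjacent_raw(31, 32, 1)` is true.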
src/hotspot/share/opto/superword.cpp line 2755: > 2753: #endif > 2754: return true; > 2755: } Note: compatibility with `def` used to be checked via alignment, but now we need to check it via `is_velt_basic_type_compatible_use_def`. For reductions, we only check the "second" input. src/hotspot/share/opto/superword.cpp line 2785: > 2783: if (!is_velt_basic_type_compatible_use_def(use, u_idx)) { > 2784: return false; > 2785: } Note: this check takes over all the use/def checks that I deleted below. src/hotspot/share/opto/superword.cpp line 2988: > 2986: Node* di = d_pk->at(i); > 2987: if (alignment(ui) != alignment(di) * 2) { > 2988: return false; Note: this special case was required for `MulAddS2I`. src/hotspot/share/opto/superword.cpp line 3007: > 3005: } > 3006: if (alignment(ui) / type2aelembytes(velt_basic_type(ui)) != > 3007: alignment(di) / type2aelembytes(velt_basic_type(di))) { Note: we scaled the alignment by the element size. This allowed transitions at type conversions, e.g. from 4-byte to 8-byte elements. src/hotspot/share/opto/superword.cpp line 3180: > 3178: } > 3179: > 3180: int SuperWord::max_vector_size_in_def_use_chain(Node* n) { Note: this was used by `get_vw_bytes_special`. It looked at the inputs and outputs of the node `n`, and searched for the largest basic type via `longer_type_for_conversion`. It then returned the max vector size (i.e. number of elements) for that basic type. We can fit fewer large elements in a vector. If we have small elements, we would like to have many elements in a vector. But we must make sure that use and def vectors can have at least as many elements. Since I recently introduced `split_packs_at_use_def_boundaries`, this special logic is no longer necessary.
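The old scaled-alignment comparison (`alignment(ui) / type2aelembytes(...) != alignment(di) / type2aelembytes(...)`) can be sketched in C++ as below. This is a simplified, hypothetical model. `lanes_match` takes plain per-lane byte alignments and element sizes instead of HotSpot nodes. It covers only the general case, not the `MulAddS2I` special case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the removed check: each lane of a pack carried a
// byte "alignment". Dividing by the element size yields a lane index, and
// use and def packs were compatible only if those indices matched lane by lane.
bool lanes_match(const std::vector<int>& use_align, int use_elem_bytes,
                 const std::vector<int>& def_align, int def_elem_bytes) {
  if (use_align.size() != def_align.size()) {
    return false;  // packs of different length cannot line up
  }
  for (std::size_t i = 0; i < use_align.size(); i++) {
    if (use_align[i] / use_elem_bytes != def_align[i] / def_elem_bytes) {
      return false;
    }
  }
  return true;
}
```

Int lanes `0, 4, 8, 12` and long lanes `0, 8, 16, 24` both normalize to lane indices `0, 1, 2, 3`, so an int-to-long conversion passes this check.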
src/hotspot/share/opto/superword.cpp line 3313: > 3311: //------------------------------memory_alignment--------------------------- > 3312: // Alignment within a vector memory reference > 3313: int SuperWord::memory_alignment(MemNode* s, int iv_adjust) { Note: this used to "normalize" the offsets, such that they fit inside a vector. Example: offsets `1000, 1004, 1008, 1012` would be "adjusted" by `1000`, so that they become `0, 4, 8, 12` and fit in a `16`-byte vector. If we had `16`-byte vectors and 8 such offsets, `1000, 1004, 1008, 1012, 1016, 1020, 1024, 1028`, they would be split by the modulo `offset % vw` into two sets of `0, 4, 8, 12`; hence both packs of 4 memops would have these "normalized" offsets. My new approach avoids the "normalized" offsets altogether and simply works from the "raw" offsets that the VPointer gives us. This is sufficient to determine adjacency. src/hotspot/share/opto/superword.cpp line 3326: > 3324: // We chose an aw that is the maximal possible vector width for the type of > 3325: // align_to_ref. > 3326: const int aw = MAX2(ObjectAlignmentInBytes, vector_width_in_bytes(align_to_ref)); Note: TODO: see if we can file a separate bug. src/hotspot/share/opto/superword.cpp line 3331: > 3329: int offset = p.offset_in_bytes(); > 3330: offset += iv_adjust*p.memory_size(); > 3331: int off_rem = offset % vw; Note: this created the "alignment boundaries" by not letting any memops be packed across the vw boundary. src/hotspot/share/opto/superword.hpp line 393: > 391: class SWNodeInfo { > 392: public: > 393: int _alignment; // memory alignment for a node Note: `_alignment` is the last remaining component of `SWNodeInfo`; we had already refactored away all other components and moved most of them to the `VLoopAnalyzer` submodules. src/hotspot/share/opto/superword.hpp line 404: > 402: > 403: // Memory reference for which we align the main-loop, by adjusting the pre-loop limit.
> 404: MemNode const* _mem_ref_for_main_loop_alignment; Note: replacement for `_align_to_ref` src/hotspot/share/opto/superword.hpp line 512: > 510: // Too verbose for TraceSuperWord > 511: return _vloop.vtrace().is_trace(TraceAutoVectorizationTag::SW_ALIGNMENT); > 512: } Note: All the old verbose tracing is now removed. I now only use `is_trace_superword_adjacent_memops`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1590893258 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1590893657 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1597920375 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1597927247 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1590903980 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1597934968 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1597935680 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1590904863 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1590905149 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1590902729 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1590905995 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1584747015 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1590906802 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1597938750 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1597938336 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1597946835 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1597952454 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1590908823 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1590907960 PR Review Comment: 
https://git.openjdk.org/jdk/pull/18822#discussion_r1597953541 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1590909210 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1590909954 From galder at openjdk.org Mon May 13 08:15:14 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Mon, 13 May 2024 08:15:14 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v15] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 10:04:10 GMT, Dean Long wrote: >> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix assert to only have a single ! > > src/hotspot/share/c1/c1_GraphBuilder.cpp line 2031: > >> 2029: ciType* type = receiver->exact_type(); >> 2030: if (type != nullptr && type->is_loaded()) { >> 2031: assert(!type->as_instance_klass()->is_interface(), ""); > > Suggestion: > > assert(!type->is_instance_klass() || !type->as_instance_klass()->is_interface(), ""); Thanks @dean-long for the suggested fix. CI looks good now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1598054058 From mli at openjdk.org Mon May 13 08:19:33 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 13 May 2024 08:19:33 GMT Subject: RFR: 8332130: RISC-V: remove wrong instructions of Vector Crypto Extension Message-ID: Hi, Can you help review this simple patch to remove some wrong instructions on RISC-V?
Thanks ------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/19211/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19211&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332130 Stats: 5 lines in 1 file changed: 0 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19211.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19211/head:pull/19211 PR: https://git.openjdk.org/jdk/pull/19211 From luhenry at openjdk.org Mon May 13 08:43:06 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 13 May 2024 08:43:06 GMT Subject: RFR: 8332130: RISC-V: remove wrong instructions of Vector Crypto Extension In-Reply-To: References: Message-ID: On Mon, 13 May 2024 08:14:43 GMT, Hamlin Li wrote: > Hi, > Can you help review this simple patch to remove some wrong instructions on RISC-V? > Thanks What do you mean by wrong? Happy to remove them but we should give some more context. ------------- Marked as reviewed by luhenry (Committer). PR Review: https://git.openjdk.org/jdk/pull/19211#pullrequestreview-2052055354 From tholenstein at openjdk.org Mon May 13 08:46:04 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 13 May 2024 08:46:04 GMT Subject: RFR: 8330584: IGV: XML does not save all node properties [v2] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 12:10:23 GMT, Tobias Holenstein wrote: >> When C2 sends graphs over the network to IGV, each graph is sent separately. The same applies if C2 saves graphs to XML: each graph is saved with all its nodes as a separate `...` in the XML >> >> To save space, graphs that are saved from IGV only contain the incremental difference for each graph. This saves a lot of space (~5-10x). The logic happens in Printer.java -> `exportInputGraph(.., difference=true, ...)` Unfortunately, there is a bug in this logic: the properties of the nodes are not saved correctly.
>> >> [graphs.zip](https://github.com/openjdk/jdk/files/15220940/graphs.zip) contains 4 graphs: >> >> `graph_c2.xml` (230KB) - an XML saved from C2 >> `graph_igv_bug.xml` (73KB) - opened `graph_c2.xml` in IGV (without this fix) and saved as `graph_igv_bug.xml`. >> `graph_igv_fixed.xml` (123KB) - opened `graph_c2.xml` in IGV (with this fix) and saved as `graph_igv_fixed.xml`. >> >> As you can see, `graph_igv_fixed.xml` is almost twice as large as `graph_igv_bug.xml` because it contains the missing properties. But now the size reduction relative to the original `graph_c2.xml` is only ~2x. >> Therefore a new format for saving is added: graphs can now be saved and opened from IGV as `.igv`. This uses a compressed (ZIP) format. >> >> `graph.igv` (10KB) is the same graph as `graph_c2.xml` (230KB), but it uses difference graph compression and ZIP compression and is 23x smaller in total. >> >> >> >> E.g. the root in the last graph of difference_true.xml has far fewer properties than in difference_false.xml. > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Update src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/OutlineTopComponent.java > > Co-authored-by: Roberto Castañeda Lozano > > Just a general thought: Should we generally only save in .igv format and drop (explicit) saving in XML format or is there any benefit to be able to store in both formats? > > I find the explicit XML format convenient sometimes for debugging something or doing a quick plain-text search. I don't mind keeping both formats.
As a side note: unzipping graph.igv gives you `difference.xml` as well ------------- PR Comment: https://git.openjdk.org/jdk/pull/19104#issuecomment-2106991391 From gli at openjdk.org Mon May 13 09:03:16 2024 From: gli at openjdk.org (Guoxiong Li) Date: Mon, 13 May 2024 09:03:16 GMT Subject: RFR: 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag In-Reply-To: References: Message-ID: On Mon, 13 May 2024 00:59:18 GMT, xiaotaonan wrote: > @lgxbslgx The reviewers (maybe me) of the corresponding area will review your patch. So I don't think you need to CC me specifically. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19205#issuecomment-2107026125 From chagedorn at openjdk.org Mon May 13 09:18:15 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 May 2024 09:18:15 GMT Subject: RFR: 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag In-Reply-To: References: <_VqdctFR4arGmdWQk9opoNMe6h1Rwa0gKDWHEcHyO9Y=.2ea3c35b-660d-4431-bc47-e0a874c386ce@github.com> Message-ID: On Mon, 13 May 2024 07:03:44 GMT, xiaotaonan wrote: > > please first ask in JBS if you can take over RFEs/bugs that are already assigned like this one, especially if it has just been filed. This PR lacks the context of why this flag should be removed and what the pros/cons and trade-offs are. I planned to do some more offline discussions first before proposing the actual PR to remove this flag since it is now related to an otherwise hard-to-fix bug in Valhalla. > > OK. Thanks for your understanding and for letting me take this PR over - I will propose this change again later this week (we first also need to update some internal stress jobs that use this flag).
------------- PR Comment: https://git.openjdk.org/jdk/pull/19205#issuecomment-2107059329 From tholenstein at openjdk.org Mon May 13 09:18:10 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 13 May 2024 09:18:10 GMT Subject: RFR: 8330584: IGV: XML does not save all node properties [v2] In-Reply-To: References: Message-ID: On Fri, 10 May 2024 09:37:44 GMT, Christian Hagedorn wrote: >> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/OutlineTopComponent.java >> >> Co-authored-by: Roberto Castañeda Lozano > > Just a general thought: Should we generally only save in `.igv` format and drop (explicit) saving in XML format or is there any benefit to be able to store in both formats? Thanks for the reviews @chhagedorn and @robcasloz ------------- PR Comment: https://git.openjdk.org/jdk/pull/19104#issuecomment-2107054910 From tholenstein at openjdk.org Mon May 13 09:18:14 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 13 May 2024 09:18:14 GMT Subject: Integrated: 8330584: IGV: XML does not save all node properties In-Reply-To: References: Message-ID: On Mon, 6 May 2024 12:06:20 GMT, Tobias Holenstein wrote: > When C2 sends graphs over the network to IGV, each graph is sent separately. The same applies if C2 saves graphs to XML: each graph is saved with all its nodes as a separate `...` in the XML > > To save space, graphs that are saved from IGV only contain the incremental difference for each graph. This saves a lot of space (~5-10x). The logic happens in Printer.java -> `exportInputGraph(.., difference=true, ...)` Unfortunately, there is a bug in this logic: the properties of the nodes are not saved correctly.
> > [graphs.zip](https://github.com/openjdk/jdk/files/15220940/graphs.zip) contains 4 graphs: > > `graph_c2.xml` (230KB) - an XML saved from C2 > `graph_igv_bug.xml` (73KB) - opened `graph_c2.xml` in IGV (without this fix) and saved as `graph_igv_bug.xml`. > `graph_igv_fixed.xml` (123KB) - opened `graph_c2.xml` in IGV (with this fix) and saved as `graph_igv_fixed.xml`. > > As you can see, `graph_igv_fixed.xml` is almost twice as large as `graph_igv_bug.xml` because it contains the missing properties. But now the size reduction relative to the original `graph_c2.xml` is only ~2x. > Therefore a new format for saving is added: graphs can now be saved and opened from IGV as `.igv`. This uses a compressed (ZIP) format. > > `graph.igv` (10KB) is the same graph as `graph_c2.xml` (230KB), but it uses difference graph compression and ZIP compression and is 23x smaller in total. > > > > E.g. the root in the last graph of difference_true.xml has far fewer properties than in difference_false.xml. This pull request has now been integrated. Changeset: 391bbbc7 Author: Tobias Holenstein URL: https://git.openjdk.org/jdk/commit/391bbbc7d0fb95b0cd55e2f56c43bee019aeab7f Stats: 147 lines in 3 files changed: 79 ins; 16 del; 52 mod 8330584: IGV: XML does not save all node properties Reviewed-by: rcastanedalo, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/19104 From mli at openjdk.org Mon May 13 09:51:11 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 13 May 2024 09:51:11 GMT Subject: RFR: 8332130: RISC-V: remove wrong instructions of Vector Crypto Extension In-Reply-To: References: Message-ID: On Mon, 13 May 2024 08:40:08 GMT, Ludovic Henry wrote: > What do you mean by wrong? Happy to remove them but we should give some more context. Thanks, I updated the PR description to explain why they're wrong.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19211#issuecomment-2107128837 From bkilambi at openjdk.org Mon May 13 10:27:14 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 13 May 2024 10:27:14 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8] In-Reply-To: References: <8-_t7nWbR9gZ2_QkfFNuf5M0Q4PMkKJKgwS3ZbHcCxI=.32dc4f11-dec5-468d-afc8-3b4dae285dcb@github.com> Message-ID: <2y-Ag6MxVDJfYl6kM0FYjQA-kzSCekUgAMWAZmkECyQ=.2a2a0a8e-fc67-42a4-bd67-b4ae3b60bcea@github.com> On Mon, 13 May 2024 05:58:17 GMT, Emanuel Peter wrote: >>> I just realized that there is no regression test. And I think it would be nice to have one. >>> >>> Also, we should add some sort of message to the `dump` if the `ReductionNode` has the `requires_strict_order` on or off. I think that could be done in `dump_spec`. >>> >>> You could do it similar to: >>> >>> ``` >>> #ifndef PRODUCT >>> void VectorMaskCmpNode::dump_spec(outputStream *st) const { >>> st->print(" %d #", _predicate); _type->dump_on(st); >>> } >>> #endif // PRODUCT >>> ``` >>> >>> This would actually allow you to create an IR test! >>> >>> You would check that the AddReductionVNode is annotated correctly. You need some VectorAPI tests, and some SuperWord auto-vectorization tests. >>> >>> How does that sound? That would ensure that nobody can easily destroy your RFE, at least not in the IR. >> >> Hi @eme64, thanks for the suggestion. I can add the `dump_spec` as suggested (which would print if the `_requires_strict_order` flag is enabled/disabled) but I am not sure if I fully understand what's expected in the JTREG tests. Should I be verifying the `-XX:+PrintIdeal` output to make sure the correct message is being printed for the `ReductionV*` nodes? > >> I am not sure if I fully understand what's expected in the JTREG tests. Should I be verifying the -XX:+PrintIdeal output to make sure the correct message is being printed for the ReductionV* nodes?
> > Yes, the IR framework basically does regex matching against the PrintIdeal graph. For example: `counts = {IRNode.STORE_VECTOR, ">0"}` in the `@IR` rule executes the regex for the store vector, and checks if we find more than zero occurrences. > > Maybe you can just use a regex string directly for your special IR rule. Alternatively, you could have them in the `IRNode` class, but not sure that's worth it. @eme64 Thanks for the clarification. I understand the usage of `counts` in the IR tests. I just got a bit confused by some of your earlier statements. We do actually have a test to make sure AddReductionVF/VD and MulReductionVF/VD are not generated on aarch64 NEON machines - `test/hotspot/jtreg/compiler/c2/irTests/TestDisableAutoVectOpcodes.java`. I can modify this test to include the UseSVE > 0 case as well and will also add a separate JTREG test for the VectorAPI tests. Hope that's OK. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18034#issuecomment-2107199006 From yzheng at openjdk.org Mon May 13 10:32:51 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 13 May 2024 10:32:51 GMT Subject: RFR: 8331429: [JVMCI] Cleanup JVMCIRuntime allocation routines [v2] In-Reply-To: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> References: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> Message-ID: > This PR removes allocation routines that may throw exceptions from JVMCIRuntime. It also exports various symbols related to the hashed secondary supers table. Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: address comment.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/19176/files - new: https://git.openjdk.org/jdk/pull/19176/files/2c688ece..82f0e0d0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19176&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19176&range=00-01 Stats: 19 lines in 3 files changed: 2 ins; 6 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/19176.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19176/head:pull/19176 PR: https://git.openjdk.org/jdk/pull/19176 From yzheng at openjdk.org Mon May 13 10:32:51 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 13 May 2024 10:32:51 GMT Subject: RFR: 8331429: [JVMCI] Cleanup JVMCIRuntime allocation routines [v2] In-Reply-To: References: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> <9wsX9310p38cnuPHGU4xKirWfyfYR6cICO6iPhnDk5Y=.55d9503f-2cc8-4c26-b24f-2ced7f8f72f5@github.com> Message-ID: On Sat, 11 May 2024 09:06:20 GMT, Doug Simon wrote: >> Only for the HAS_PENDING_EXCEPTION case. What about the !h->is_initialized() case? > > Good observation - seems like this is an outstanding bug. Can you please address that Yudi. > In practice, I wonder how much this matters as Graal always [clears the object result](https://github.com/oracle/graal/blob/0b61d20b08b1af76bd35cfb673c7be8d33855f51/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/hotspot/stubs/ForeignCallSnippets.java#L127) after reading it. 
Good point ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19176#discussion_r1598245624 From epeter at openjdk.org Mon May 13 11:04:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 May 2024 11:04:18 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8] In-Reply-To: <2y-Ag6MxVDJfYl6kM0FYjQA-kzSCekUgAMWAZmkECyQ=.2a2a0a8e-fc67-42a4-bd67-b4ae3b60bcea@github.com> References: <8-_t7nWbR9gZ2_QkfFNuf5M0Q4PMkKJKgwS3ZbHcCxI=.32dc4f11-dec5-468d-afc8-3b4dae285dcb@github.com> <2y-Ag6MxVDJfYl6kM0FYjQA-kzSCekUgAMWAZmkECyQ=.2a2a0a8e-fc67-42a4-bd67-b4ae3b60bcea@github.com> Message-ID: On Mon, 13 May 2024 10:22:12 GMT, Bhavana Kilambi wrote: >>> I am not sure if I fully understand what's expected in the JTREG tests. Should I be verifying the -XX:+PrintIdeal output to make sure the correct message is being printed for the ReductionV* nodes? >> >> Yes, the IR framework basically does regex matching against the PrintIdeal graph. For example: `counts = {IRNode.STORE_VECTOR, ">0"}` in the `@IR` rule executes the regex for the store vector, and checks if we find more than zero occurrences. >> >> Maybe you can just use a regex string directly for your special IR rule. Alternatively, you could have them in the `IRNode` class, but not sure that's worth it. > > @eme64 Thanks for the clarification. I understand the usage of `counts` in the IR tests. I just got a bit confused by some of your earlier statements. We do actually have a test to make sure AddReductionVF/VD and MulReductionVF/VD are not generated on aarch64 NEON machines - `test/hotspot/jtreg/compiler/c2/irTests/TestDisableAutoVectOpcodes.java`. I can modify this test to include the UseSVE > 0 case as well and will also add a separate JTREG test for the VectorAPI tests. Hope that's OK. @Bhavana-Kilambi I know we have the tests in `test/hotspot/jtreg/compiler/c2/irTests/TestDisableAutoVectOpcodes.java`, and some other reduction tests.
But these do not do the specific thing I would like to see. I would like this: - Add `no_strict_order` vs `requires_strict_order` or similar to `dump_spec`. - IR match not just that there is the correct `ReductionNode`, but also that it has the `no_strict_order` or `requires_strict_order` in its dump. You can do that by using a custom regex string, rather than `IRNode.STORE_VECTOR` or similar. - Then, create different tests, some where we expect ordered and some where we expect unordered vectors. Use Vector API and SuperWord examples. Does that make sense? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18034#issuecomment-2107276021 From yzheng at openjdk.org Mon May 13 11:34:18 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 13 May 2024 11:34:18 GMT Subject: RFR: 8331429: [JVMCI] Cleanup JVMCIRuntime allocation routines [v3] In-Reply-To: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> References: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> Message-ID: > This PR removes allocation routines that may throw exceptions from JVMCIRuntime. It also exports various symbols related to the hashed secondary supers table.
Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: remove trailing white space ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19176/files - new: https://git.openjdk.org/jdk/pull/19176/files/82f0e0d0..0a638521 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19176&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19176&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19176.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19176/head:pull/19176 PR: https://git.openjdk.org/jdk/pull/19176 From bkilambi at openjdk.org Mon May 13 12:10:10 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 13 May 2024 12:10:10 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8] In-Reply-To: <2y-Ag6MxVDJfYl6kM0FYjQA-kzSCekUgAMWAZmkECyQ=.2a2a0a8e-fc67-42a4-bd67-b4ae3b60bcea@github.com> References: <8-_t7nWbR9gZ2_QkfFNuf5M0Q4PMkKJKgwS3ZbHcCxI=.32dc4f11-dec5-468d-afc8-3b4dae285dcb@github.com> <2y-Ag6MxVDJfYl6kM0FYjQA-kzSCekUgAMWAZmkECyQ=.2a2a0a8e-fc67-42a4-bd67-b4ae3b60bcea@github.com> Message-ID: <1G5vZYJlb_DYSjClQiGKulCfT-lk5wi3GXkoy1mBSh0=.f7004c63-b303-42bb-8104-40929931f4d6@github.com> On Mon, 13 May 2024 10:22:12 GMT, Bhavana Kilambi wrote: >>> I am not sure if I fully understand what's expected in the JTREG tests. Should I be verifying the -XX:+PrintIdeal output to make sure the correct message is being printed for the ReductionV* nodes? >> >> Yes, the IR framework basically does regex matching against the PrintIdeal graph. For example: `counts = {IRNode.STORE_VECTOR, ">0"}` in the `@IR` rule executes the regex for the store vector, and checks if we find more than zero occurrences. >> >> Maybe you can just use a regex string directly for your special IR rule. Alternatively, you could have them in the `IRNode` class, but not sure that's worth it.
> > @eme64 Thanks for the clarification. I understand the usage of `counts` in the IR tests. I just got a bit confused by some of your earlier statements. We do actually have a test to make sure AddReductionVF/VD and MulReductionVF/VD are not generated on aarch64 NEON machines - `test/hotspot/jtreg/compiler/c2/irTests/TestDisableAutoVectOpcodes.java`. I can modify this test to include the UseSVE > 0 case as well and will also add a separate JTREG test for the VectorAPI tests. Hope that's OK. > @Bhavana-Kilambi I know we have the tests in `test/hotspot/jtreg/compiler/c2/irTests/TestDisableAutoVectOpcodes.java`, and some other reduction tests. But these do not do the specific thing I would like to see. > > I would like this: > > * Add `no_strict_order` vs `requires_strict_order` or similar to `dump_spec`. > > * IR match not just that there is the correct `ReductionNode`, but also that it has the `no_strict_order` or `requires_strict_order` in its dump. You can do that by using a custom regex string, rather than `IRNode.STORE_VECTOR` or similar. > > * Then, create different tests, some where we expect ordered and some where we expect unordered vectors. Use Vector API and SuperWord examples. > > > Does that make sense? Yes, I am doing exactly that. It's just that for the SuperWord (auto-vectorization) case, I am only modifying the AddReduction-related tests in `TestDisableAutoVectOpcodes.java` to incorporate the UseSVE > 0 case as well, and to match the regex against the dump_spec output.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18034#issuecomment-2107401404 From liach at openjdk.org Mon May 13 12:15:10 2024 From: liach at openjdk.org (Chen Liang) Date: Mon, 13 May 2024 12:15:10 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v8] In-Reply-To: <8bkIrXCl7OsuLoMQi43faVELq0d1R-P60pSCGkxpwpU=.fe207403-8288-4f2d-ab7d-96fec5ba212e@github.com> References: <3eERzqYdCd4f9qn4KpzBA9ealaUTzC67wIhzB18ETTE=.f9d17a6f-1ca5-477f-8344-40c20abe7d7e@github.com> <8bkIrXCl7OsuLoMQi43faVELq0d1R-P60pSCGkxpwpU=.fe207403-8288-4f2d-ab7d-96fec5ba212e@github.com> Message-ID: On Mon, 13 May 2024 07:51:19 GMT, Adam Sotona wrote: >> src/java.base/share/classes/java/lang/classfile/Attributes.java line 153: >> >>> 151: >>> 152: /** >>> 153: * {@return Attribute mapper for the {@code AnnotationDefault} attribute} >> >> Just wondering, can we change `{@code AnnotationDefault}` to `{@value #NAME_ANNOTATION_DEFAULT}`, etc? This way, the names are still rendered as code in Javadoc HTML, but they are generated with links to the constants, and programmers will see these constants and prefer them over hardcoded values. > > On the other hand, it is questionable whether the attribute names should be exposed in the API. We provide corresponding mappers and attribute models. I don't see a case where a user would need to use the attribute names directly. Makes sense; we can always add these literals back if we do need them. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19006#discussion_r1598368707 From eastigeevich at openjdk.org Mon May 13 13:11:15 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 13 May 2024 13:11:15 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives Message-ID: Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271), which has known bugs, possible bugs and performance issues.
Found bugs: - When refreshing, `CompilerDirectivesAddDCmd::execute` calls `DirectivesStack::hasMatchingDirectives(mh, true)`, which only considers the compiler directive on top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected. - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. There are other concerns: bugs and performance issues. Possible bugs: - `has_matching_directives` might not be cleared. An nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark. - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again. - The JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. Performance issues: - Usually directives are updated for only a small number of Java methods. If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will traverse all of them, although most don't need recompilation. The backout is not clean because of the removal of `CompiledMethod`. Tested with release and fastdebug builds: tier1 and tier2 passed.
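The first found bug (only the top of the directives stack is consulted) can be illustrated with a toy C++ model. Everything here is hypothetical and simplified for illustration: `Directive`, the trailing-`*` pattern rule, `top_only_matches`, and `any_matches`. None of it is HotSpot's `DirectivesStack` API or its method-pattern syntax.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy directive (hypothetical): a pattern where a trailing '*' matches any
// suffix; otherwise the method name must match exactly.
struct Directive {
  std::string pattern;
};

bool matches(const Directive& d, const std::string& method) {
  const std::string& p = d.pattern;
  if (!p.empty() && p.back() == '*') {
    // Compare the method's prefix with the pattern minus the '*'.
    return method.compare(0, p.size() - 1, p, 0, p.size() - 1) == 0;
  }
  return method == p;
}

// Behavior described in the backout: only the directive on top of the
// stack (modeled here as the last element) is consulted.
bool top_only_matches(const std::vector<Directive>& stack, const std::string& method) {
  return !stack.empty() && matches(stack.back(), method);
}

// Alternative: consider every directive on the stack.
bool any_matches(const std::vector<Directive>& stack, const std::string& method) {
  for (const Directive& d : stack) {
    if (matches(d, method)) {
      return true;
    }
  }
  return false;
}
```

With two directives pushed, a method matched only by the older (lower) one is missed by `top_only_matches` but found by `any_matches`.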
------------- Commit messages: - 8332111: [BACKOUT] A way to align already compiled methods with compiler directives Changes: https://git.openjdk.org/jdk/pull/19215/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19215&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332111 Stats: 380 lines in 15 files changed: 3 ins; 347 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/19215.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19215/head:pull/19215 PR: https://git.openjdk.org/jdk/pull/19215 From shade at openjdk.org Mon May 13 13:21:05 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 13 May 2024 13:21:05 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: <_Hf9ur_fzBA6MysoCZHn7KAjJwC0ubP8v4SKBvethOw=.63d58c21-c8ef-4b5a-b878-7fd330e0d654@github.com> On Mon, 13 May 2024 13:03:26 GMT, Evgeny Astigeevich wrote: > Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues. > > Found bugs: > - When refreshing `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)` which only considers the compiler directive which is on the top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected. > - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. > > There are other concerns: bugs and performance issues. > > Possible bugs: > - `has_matching_directives` might not be cleared. A nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark. 
> - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again. > - JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. > > Performance issues: > - Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will be traversing nmethods most of which don't need recompilation. > > The backout is not clean because of removal of `CompiledMethod`. > > Tested with release and fastdebug builds: tier1 and tier2 passed. The reversal looks fine. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19215#pullrequestreview-2052683089 From roland at openjdk.org Mon May 13 13:23:46 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 May 2024 13:23:46 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop [v2] In-Reply-To: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> Message-ID: <_kbcMydcMPblcm_FDDuL5vWGT7q6iRoarmYsTlEA0hQ=.290c6744-211d-406d-8ed1-90e510051167@github.com> > In the test case: > > > long i; > for (; i > 0; i--) { > res += 42 / ((int) i); > > > The long counted loop phi has type `[1..100]`. As a consequence, the > `ConvL2I` also has type `[1..100]`. The `DivI` node that follows can't > fault: it is not guarded by a zero check and has no control set. 
> > The `ConvL2I` is split through phi and so is the `DivI` node: > `PhaseIdealLoop::cannot_split_division()` returns true because the > value coming from the backedge into the `DivI` (when it is about to be > split thru phi) is the result of the `ConvL2I` which has type > `[1..100]` so is not zero as far as the compiler can tell. > > On the last iteration of the loop, i is 1. Because the DivI was split > thru Phi, it computes the value for the following iteration, so for i > = 0. This causes a crash when the compiled code runs. > > The same problem can't happen with an int counted loop because logic > in `PhaseIdealLoop::split_thru_phi()` prevents a `ConvI2L` from being > split thru phi. I propose to fix this the same way: in the test case, > it's not true that once the `ConvL2I` is split thru phi it keeps type > `[1..100]`. The fix is fairly conservative because it's based on the > existing logic for `ConvI2L`: ideally we would only refuse to split a `ConvL2I` > at a counted loop phi, but I suppose the same is true for the `ConvI2L` > and I thought it would be best to revisit both together.
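The failure mode can be reproduced at the source level with a hand-written Java analogue. This is an assumption made for illustration, not what C2 actually emits, and the class and method names are hypothetical. Moving the division so that it computes the next iteration's quotient at the bottom of the current iteration makes the last iteration divide by `(int) 0`. In plain Java this surfaces as an `ArithmeticException`, whereas the compiled `DivI` described above has no zero check, so it crashes instead.

```java
// Source-level analogue of the hazard; class and method names are illustrative only.
final class SplitThruPhiHazard {
    // The original loop: the divisor always lies in [1..start].
    static int original(long start) {
        int res = 0;
        for (long i = start; i > 0; i--) {
            res += 42 / ((int) i);
        }
        return res;
    }

    // Division "split through the phi": the quotient for the next iteration is
    // computed at the bottom of the current one, before the exit test runs.
    static int splitThroughPhi(long start) {
        int res = 0;
        long i = start;
        if (i > 0) {
            int q = 42 / ((int) i);      // quotient for the first iteration
            do {
                res += q;
                i--;
                q = 42 / ((int) i);      // evaluated even when i has just reached 0
            } while (i > 0);
        }
        return res;
    }
}
```

Calling `original(100)` returns normally, while `splitThroughPhi(100)` throws on what would have been the exit of its final iteration.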
Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: - test case tweaks - fuzzer test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19086/files - new: https://git.openjdk.org/jdk/pull/19086/files/d48443c3..3c417dc2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19086&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19086&range=00-01 Stats: 63 lines in 2 files changed: 61 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19086.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19086/head:pull/19086 PR: https://git.openjdk.org/jdk/pull/19086 From roland at openjdk.org Mon May 13 13:23:46 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 May 2024 13:23:46 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop [v2] In-Reply-To: References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> Message-ID: On Mon, 6 May 2024 07:35:56 GMT, Christian Hagedorn wrote: > You could also add the regression tests from the duplicated issue [JDK-8298851](https://bugs.openjdk.org/browse/JDK-8298851). I added one of them because it doesn't seem to need `StressGCM`. Does it really make sense to add all of them? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19086#issuecomment-2107563511 From roland at openjdk.org Mon May 13 13:23:46 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 May 2024 13:23:46 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop [v2] In-Reply-To: References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> <2lreoMy7UKtgM_m8RCU68rp3FFkoU8zj3ckuTKzXqf0=.dc02a0d4-2671-4c70-a470-a64f28e38f2d@github.com> Message-ID: <2t3peiZ70K4xcs0LhocSx5jWPVlRns_dEp52j2uwJWk=.432a5285-8197-44c5-b308-9c9a2b602c79@github.com> On Wed, 8 May 2024 07:10:25 GMT, Christian Hagedorn wrote: >> test/hotspot/jtreg/compiler/splitif/TestLongCountedLoopConvL2I.java line 31: >> >>> 29: * -XX:+StressGCM -XX:StressSeed=92643864 TestLongCountedLoopConvL2I >>> 30: * @run main/othervm -XX:-BackgroundCompilation -XX:-TieredCompilation -XX:-UseOnStackReplacement >>> 31: * -XX:+StressGCM TestLongCountedLoopConvL2I >> >> Would it make sense to have a run that allows OSR? > > You should also add `-XX:+UnlockDiagnosticVMOptions` for the stress flag. Done in new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19086#discussion_r1598467494 From roland at openjdk.org Mon May 13 13:27:08 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 May 2024 13:27:08 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop [v2] In-Reply-To: <2lreoMy7UKtgM_m8RCU68rp3FFkoU8zj3ckuTKzXqf0=.dc02a0d4-2671-4c70-a470-a64f28e38f2d@github.com> References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> <2lreoMy7UKtgM_m8RCU68rp3FFkoU8zj3ckuTKzXqf0=.dc02a0d4-2671-4c70-a470-a64f28e38f2d@github.com> Message-ID: On Tue, 7 May 2024 17:05:45 GMT, Emanuel Peter wrote: > I guess the issue is that ConvL2I and ConvI2L are also type nodes, which can restrict their type, just like CastII nodes. 
And that restricting of the type is only true under a certain if-branch. That's not entirely true here. The `ConvL2I` captures the type of its input, not a narrower type. The problem is that the type is that of a `Phi` for a counted loop and once pushed through phi, the type captured by the `ConvI2L` becomes incorrect. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19086#issuecomment-2107569510 From roland at openjdk.org Mon May 13 13:27:09 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 May 2024 13:27:09 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop [v2] In-Reply-To: References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> Message-ID: On Tue, 7 May 2024 17:25:49 GMT, Christian Hagedorn wrote: > It also seems that it's only a problem with loop iv phis because we improve the iv type in such a way that some of the possible values of the backedge are excluded. So, maybe a first step could be to allow splitting the `Conv*` nodes through non-loop-iv phi nodes. However, there might also be other non-loop-iv phi problems I'm currently not aware of. Nevertheless, it might be worth investigating further in a separate RFE. I agree that it would be worth investigating further. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19086#discussion_r1598474092 From luhenry at openjdk.org Mon May 13 13:29:03 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 13 May 2024 13:29:03 GMT Subject: RFR: 8332130: RISC-V: remove wrong instructions of Vector Crypto Extension In-Reply-To: References: Message-ID: On Mon, 13 May 2024 08:14:43 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch to remove some wrong instructions on riscv? > These instructions are wrong in that e.g.
take `vror.vx` as an example, > * by the definition in the spec, it should be `vror.vx vd, vs2, *rs1*, vm` > * but the implementation here is `vror_vx(VectorRegister Vd, VectorRegister Vs2, *VectorRegister* Vs1, VectorMask vm = unmasked)` > > Thanks Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19211#pullrequestreview-2052703762 From roland at openjdk.org Mon May 13 13:40:17 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 May 2024 13:40:17 GMT Subject: RFR: 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode [v2] In-Reply-To: <7b3qt72dd5rV6nirPQILkqTMleDRMRYuXlKpqVVVpyo=.c2ed3889-cb43-4576-9d63-de133152b7fb@github.com> References: <_8csQpQVHlNpwenIT4H7OFkMSOaU6Fz-ZmJ0Yi6ArLU=.0b84b78d-4637-49ab-b43f-4c457498b0ce@github.com> <7b3qt72dd5rV6nirPQILkqTMleDRMRYuXlKpqVVVpyo=.c2ed3889-cb43-4576-9d63-de133152b7fb@github.com> Message-ID: On Tue, 7 May 2024 17:29:02 GMT, Christian Hagedorn wrote: > But conceptually, we want to get these nodes to be removed and the Initialized Assertion Predicates folded once we know that we no longer split loops (i.e. in post loop IGVN). I don't think that's quite correct. Any round of igvn could cause the bounds of a counted loop to change in a way that conflicts with the types captured in the `CastII`/`ConvI2L` nodes. I think that's true even after loop optimizations are over. As a consequence, we want the Assertion Predicates to fold as late as possible. That's poorly tested currently because we emit the predicates in compiled code for debug builds so, in practice, we never really remove them. As part of this change, I wouldn't change that behavior. That seems risky. >> src/hotspot/share/opto/opaquenode.hpp line 138: >> >>> 136: // to true. Therefore, we get rid of them in product builds as they are useless. In debug builds we keep them as >>> 137: // additional verification code (i.e.
removing this node and use the BoolNode input instead). >>> 138: class OpaqueInitializedAssertionPredicateNode : public Node { >> >> Shouldn't the new OpaqueInitializedAssertionPredicateNode be a subclass of Opaque4 or shouldn't both be a subclass of a common super type? Don't they share at least some logic or behavior? > > I first thought about reusing this class in some way. But the second input is actually not needed. We could move forward and just remove the second input for `Opaque4` nodes (it's always a true constant). But I still wanted to have an easy way to have a distinguishable node from the other uses of the `Opaque4` nodes in non-null checks. > > Furthermore, I think sub classing the `Opaque4` class can be problematic when doing `is_Opaque4()` since we sometimes expect an `Opaque4` only and sometimes an `OpaqueInitializedAssertionPredicate` only and sometimes both are fine. I think it's cleaner to have two separate classes instead of sub classing each other. > > What do you think? Fair enough. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18951#discussion_r1598493508 PR Review Comment: https://git.openjdk.org/jdk/pull/18951#discussion_r1598494163 From kxu at openjdk.org Mon May 13 13:46:36 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 13 May 2024 13:46:36 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v4] In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. > > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). 
Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision: - add more expressive comments and test cases - Merge branch 'master' into long-typed-parallel-iv - update comments to clarify on type casting - add pseudocode for subgraphs before/after the transformation - remove WIP support for long counted loops - Merge branch 'master' into long-typed-parallel-iv - update tests - update tests - update tests - clean up code for pr - ... and 12 more: https://git.openjdk.org/jdk/compare/1ecc282b...85820dee ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18489/files - new: https://git.openjdk.org/jdk/pull/18489/files/dcd55681..85820dee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=02-03 Stats: 122774 lines in 3145 files changed: 56800 ins; 49870 del; 16104 mod Patch: https://git.openjdk.org/jdk/pull/18489.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489 PR: https://git.openjdk.org/jdk/pull/18489 From kxu at openjdk.org Mon May 13 13:46:37 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 13 May 2024 13:46:37 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v3] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Thu, 18 Apr 2024 09:11:13 GMT, Emanuel Peter wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> update comments to clarify on type casting > > test/hotspot/jtreg/compiler/c2/irTests/TestCountedLoopIV.java line 24: > >> 22: 
*/ >> 23: >> 24: package compiler.c2.irTests; > > Putting IR tests into the `irTests` directory is what we did at the beginning, when we assumed IR tests would not be widely adopted. But now it makes more sense to put this test where it belongs "thematically". I suggest you put it under `compiler/loopopts`, or even in a new subdirectory: `compiler/loopopts/parallel_iv`. > > Also the name of this test could be more expressive: `TestLongParallelIvInIntCountedLoop.java` Renamed to `compiler.loopopts.parallel_iv.TestParallelIvInIntCountedLoop` Notice I chose *Test~Long~ParallelIvInIntCountedLoop* since it also tests int IVs. > test/hotspot/jtreg/compiler/c2/irTests/TestCountedLoopIV.java line 63: > >> 61: int a = 0; >> 62: for (int i = 0; i < stop; i++) { >> 63: a += 0; // we unfortunately have to repeat ourselves because the operand has to be a constant > > I don't understand your comment. Why is this test interesting? The IR framework can only test against static code, and the transformation relies on strides being constants to perform constant propagation. Therefore, we have no choice but repeating the same test case multiple times with different numbers. I added comments to clarify this. 
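A plain-Java sketch of the invariant this transformation relies on may help. The shape is an assumption and this is not the C2 code. Inside an int counted loop with a constant stride, a long parallel IV whose constant stride is an exact multiple of the loop stride can be derived from the loop IV alone. The hypothetical helper below checks this at every iteration. It computes the offset in long arithmetic, so the derived value still matches when the long IV wraps around.

```java
// Illustrative model; names are hypothetical. Assumes strideCon > 0, that
// strideCon2 is an exact multiple of strideCon, and that i + strideCon does not overflow.
final class LongParallelIv {
    static boolean derivedMatches(int init, int limit, int strideCon, long init2, long strideCon2) {
        long ratio = strideCon2 / strideCon;
        long a = init2;                                  // the long parallel IV
        for (int i = init; i < limit; i += strideCon) {  // the int counted loop IV
            // Value of `a` recomputed from `i` alone.
            long derived = init2 + ((long) i - init) * ratio;
            if (derived != a) {
                return false;
            }
            a += strideCon2;
        }
        return true;
    }
}
```

With strides 2 and 6, `derivedMatches(0, 1000, 2, 0L, 6L)` holds at every iteration. With strides 2 and 3 the truncated ratio is 1, so the derived value drifts from `a` after the first iteration.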
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1598496880 PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1598500281 From kxu at openjdk.org Mon May 13 13:46:37 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 13 May 2024 13:46:37 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v3] In-Reply-To: <0YMwCJtOCiJU6gDibC6awo-iowi3wFuOKPM32sHkGRA=.34e4fec1-ffb9-4ac8-ac2e-35a1c9494020@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> <8zWNeJWcumovt4jcMMCbbhfQJVKDypVM2nR6xRUGx3U=.760cf413-7503-4ab2-a2c2-955f430ee4b4@github.com> <0YMwCJtOCiJU6gDibC6awo-iowi3wFuOKPM32sHkGRA=.34e4fec1-ffb9-4ac8-ac2e-35a1c9494020@github.com> Message-ID: <9SZeJL0GoHL2XzCiyK_zNPTUT9az48DwBou9s5kFI2k=.e8d2d629-67db-4459-bf7e-d12e9435f043@github.com> On Thu, 18 Apr 2024 09:38:20 GMT, Emanuel Peter wrote: >> And why no IR rules for these? > > You definitely need more tests with IR rules. Those functions were only called in `testCorrectness()` and excluded from IR verifications. They are not included. >> Generally, it would be nice if you had more cases where we are checking overflows. > > And some with negative strides would be great too. More tests added. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1598500232 PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1598500700 From roland at openjdk.org Mon May 13 13:48:25 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 May 2024 13:48:25 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v16] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 05:04:38 GMT, Galder Zamarreño wrote: >> Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures.
>> >> The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64: >> >> >> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 3.476 ± 0.018 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 3.740 ± 0.017 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 7.124 ± 0.010 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 39.301 ± 0.106 ns/op >> ArrayClone.byteClone 0 avgt 15 3.478 ± 0.008 ns/op >> ArrayClone.byteClone 10 avgt 15 3.562 ± 0.007 ns/op >> ArrayClone.byteClone 100 avgt 15 5.888 ± 0.206 ns/op >> ArrayClone.byteClone 1000 avgt 15 25.762 ± 0.203 ns/op >> ArrayClone.intArraycopy 0 avgt 15 3.199 ± 0.016 ns/op >> ArrayClone.intArraycopy 10 avgt 15 4.521 ± 0.008 ns/op >> ArrayClone.intArraycopy 100 avgt 15 17.429 ± 0.039 ns/op >> ArrayClone.intArraycopy 1000 avgt 15 178.432 ± 0.777 ns/op >> ArrayClone.intClone 0 avgt 15 3.406 ± 0.016 ns/op >> ArrayClone.intClone 10 avgt 15 4.272 ± 0.006 ns/op >> ArrayClone.intClone 100 avgt 15 13.110 ± 0.122 ns/op >> ArrayClone.intClone 1000 avgt 15 113.196 ± 13.400 ns/op >> >> >> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this. >> >> I ran the hotspot compiler tests successfully, limiting them to C1 compilation, on darwin/aarch64, linux/x86_64 and linux/686. E.g. >> >> >> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" >> ...
>> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:hotspot_compiler 1234 1234 0 0 >> >> >> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts? >> >>... > > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/c1/c1_GraphBuilder.cpp > > Co-authored-by: Dean Long <17332032+dean-long at users.noreply.github.com> Otherwise, looks good to me. src/hotspot/share/c1/c1_GraphBuilder.cpp line 2031: > 2029: ciType* type = receiver->exact_type(); > 2030: if (type != nullptr && type->is_loaded()) { > 2031: assert(!type->is_instance_klass() || !type->as_instance_klass()->is_interface(), ""); Please add a message to the assert. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17667#pullrequestreview-2052754875 PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1598505645 From kxu at openjdk.org Mon May 13 13:54:05 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 13 May 2024 13:54:05 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v3] In-Reply-To: <0YMwCJtOCiJU6gDibC6awo-iowi3wFuOKPM32sHkGRA=.34e4fec1-ffb9-4ac8-ac2e-35a1c9494020@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> <8zWNeJWcumovt4jcMMCbbhfQJVKDypVM2nR6xRUGx3U=.760cf413-7503-4ab2-a2c2-955f430ee4b4@github.com> <0YMwCJtOCiJU6gDibC6awo-iowi3wFuOKPM32sHkGRA=.34e4fec1-ffb9-4ac8-ac2e-35a1c9494020@github.com> Message-ID: On Thu, 18 Apr 2024 09:32:19 GMT, Emanuel Peter wrote: >> Can you also be consistent with the names all the way through your comments? I suggest you just only use `stride_con`, and not `stride`.
You can use `i` and `a`, if you want. But then it would be helpful if you had two lines with identical expressions, but where you make the transition from `i` to `phi`. > > Ah. It seems that we require `stride2 / stride` to be a lossless division in the code. A comment about that limitation would be helpful. And I think you should also check if there are tests that cover cases where the division would be lossy. > ...be consistent with the names... I had this concern, especially `i` vs `phi`. I didn't think it was reasonable to call the iterator `phi` only because the optimization code calls it that, since it extracts the value from the phi node. I agree that things should be kept consistent, and the example is simple enough to understand either way. I updated the naming. > It seems that we require stride2 / stride to be a lossless division in the code. The division used truncates (it rounds towards zero), so on top of that the optimization requires the division to be exact, with no remainder. Checks are in place to make sure the optimization only happens if this condition is met: > `if ((ratio_con * stride_con) == stride_con2) { // Check for exact` I updated the comments to be more expressive regarding this.
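The exactness condition quoted above can be written in Java as follows. This is an illustrative helper with a hypothetical name, not the HotSpot code. Java's `/` truncates toward zero, so multiplying the quotient back and comparing it with the original value rejects any stride pair that leaves a remainder, including mixed-sign pairs.

```java
import java.util.OptionalLong;

// Hypothetical helper mirroring the check `(ratio_con * stride_con) == stride_con2`.
// strideCon is a loop stride and therefore assumed to be non-zero.
final class ExactRatio {
    static OptionalLong of(int strideCon, long strideCon2) {
        long ratioCon = strideCon2 / strideCon;   // truncates toward zero
        return ratioCon * strideCon == strideCon2
                ? OptionalLong.of(ratioCon)
                : OptionalLong.empty();           // remainder left: not applicable
    }
}
```

For example, `of(2, 6)` yields 3 and `of(-2, 6)` yields -3. `of(-2, 5)` is empty because 5 / -2 truncates to -2, and -2 * -2 is 4, not 5.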
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1598512900 From kxu at openjdk.org Mon May 13 13:54:07 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 13 May 2024 13:54:07 GMT Subject: RFR: 8327381: Refactor type-improving transformations in BoolNode::Ideal to BoolNode::Value [v8] In-Reply-To: <3zuVDnNd_9nUXHjG1TCWQjVVWuLcyCLAOEgJKeGnDL0=.996e0ab4-58d0-47c9-875b-26bcaae19887@github.com> References: <3zuVDnNd_9nUXHjG1TCWQjVVWuLcyCLAOEgJKeGnDL0=.996e0ab4-58d0-47c9-875b-26bcaae19887@github.com> Message-ID: On Fri, 5 Apr 2024 10:01:43 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/subnode.cpp line 1816: >> >>> 1814: // Change ((x & m) u<= m) or ((m & x) u<= m) to always true >>> 1815: // Same with ((x & m) u< m+1) and ((m & x) u< m+1) >>> 1816: if (cop == Op_CmpU && cmp1->Opcode() == Op_AndI) { >> >> You made this a bit more complicated than the original. Or was there a specific reason for the `is_Sub`? I'd do this: >> Suggestion: >> >> // Change ((x & m) u<= m) or ((m & x) u<= m) to always true >> // Same with ((x & m) u< m+1) and ((m & x) u< m+1) >> Node* cmp = in(1); >> if (cmp != nullptr && cmp->Opcode() == Op_CmpU) { >> Node* cmp1 = cmp->in(1); >> Node* cmp2 = cmp->in(2); >> if (cmp1->Opcode() == Op_AndI) { > > You could also move the whole code to its own method, and name it something like `BoolNode::Value_cmpu_and_mask`. Maybe you find an even more descriptive name. 
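The identity behind this fold can be checked with a small Java sketch using `Integer.compareUnsigned`. The helper names are hypothetical. `x & m` can only clear bits of `m`, so as an unsigned value it never exceeds `m`. The `u< m+1` form is equivalent only while `m + 1` does not wrap around, so it fails for `m == -1`, where `m + 1` becomes 0.

```java
// Illustrative check of the unsigned comparisons discussed above; not C2 code.
final class AndMaskCompare {
    // (x & m) u<= m: holds for every x and m.
    static boolean le(int x, int m) {
        return Integer.compareUnsigned(x & m, m) <= 0;
    }

    // (x & m) u< m + 1: holds unless m + 1 wraps to 0, i.e. m == -1.
    static boolean ltPlusOne(int x, int m) {
        return Integer.compareUnsigned(x & m, m + 1) < 0;
    }
}
```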
Cleaned up and moved to `::Value_cmpu_and_mask` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18198#discussion_r1598516418 From dchuyko at openjdk.org Mon May 13 13:55:10 2024 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Mon, 13 May 2024 13:55:10 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 13:03:26 GMT, Evgeny Astigeevich wrote: > Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues. REDO work is tracked by [JDK-8331749](https://bugs.openjdk.org/browse/JDK-8331749). > > Found bugs: > - When refreshing `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)` which only considers the compiler directive which is on the top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected. > - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. > > There are other concerns: bugs and performance issues. > > Possible bugs: > - `has_matching_directives` might not be cleared. A nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark. > - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again. > - JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. > > Performance issues: > - Usually directives are updated for a small number of Java methods. 
If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will be traversing nmethods most of which don't need recompilation. > > The backout is not clean because of removal of `CompiledMethod`. > > Tested with release and fastdebug builds: tier1 and tier2 passed. Are there any high severity problems caused by the original PR? Especially not in the new functionality. Minor issues could be probably addressed without backing out the entire functionality. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2107638223 From eastigeevich at openjdk.org Mon May 13 14:24:18 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 13 May 2024 14:24:18 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 13:52:17 GMT, Dmitry Chuyko wrote: > Are there any high severity problems caused by the original PR? Especially not in the new functionality. Minor issues could be probably addressed without backing out the entire functionality. Yes, there are: > 1. Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, CodeCache::recompile_marked_directives_matches will be traversing nmethods most of which don't need recompilation. > 2. has_matching_directives might not be cleared. > 3. A Java method is not recompiled as requested. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2107720199 From dchuyko at openjdk.org Mon May 13 14:37:03 2024 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Mon, 13 May 2024 14:37:03 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: <7kfgb5FXqda4SzqPO2XUXdx6CM_Z-G970nSpqvJVSYw=.b6b01073-66af-4c7d-8d7c-528a4f87707d@github.com> On Mon, 13 May 2024 14:21:35 GMT, Evgeny Astigeevich wrote: > > Are there any high severity problems caused by the original PR? Especially not in the new functionality. Minor issues could be probably addressed without backing out the entire functionality. > > > > Yes, there are: > > > > > 1. Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, CodeCache::recompile_marked_directives_matches will be traversing nmethods most of which don't need recompilation. > > > 2. has_matching_directives might not be cleared. > > > 3. A Java method is not recompiled as requested. > > So there are cases when new functionality doesn't work as expected (I don't see any other users impacted). Why not file bugs for those cases and estimate their impact? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2107777980 From stefank at openjdk.org Mon May 13 14:42:08 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 13 May 2024 14:42:08 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v10] In-Reply-To: References: Message-ID: <-RRlrDdRqiN1sxsQF7RYJIl8W6Z62LcAq8quEalrzjc=.f6ae63e5-92d9-41be-962b-e2741c676b32@github.com> On Fri, 10 May 2024 15:26:29 GMT, Andrew Haley wrote: >> At the present time, `assert_different_registers()` uses an O(N**2) algorithm in assert_different_registers(). We can utilize RegSet to do it in O(N) time. This would be a useful optimization for all builds with assertions enabled. 
>> >> In addition, it would be useful to be able to static_assert different registers. >> >> Also, I've taken the opportunity to expand the maximum size of a RegSet to 64 on 64-bit platforms. >> >> I also fixed a bug: sometimes `noreg` is passed to `assert_different_registers()`, but it may only be passed once or a spurious assertion is triggered. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Review feedback Approved. I've written some suggestions that I would prefer, but that are not strictly necessary before integration. src/hotspot/share/asm/register.hpp line 101: > 99: > 100: static constexpr int max_size() { > 101: return (int)(sizeof _bitset * CHAR_BIT); This makes me have to think about operator precedence and what CHAR_BIT is (not typically used in HotSpot). I'd prefer to see something like this: Suggestion: return (int)(sizeof(_bitset) * BitsPerByte); src/hotspot/share/asm/register.hpp line 263: > 261: template > 262: inline constexpr bool different_registers(AbstractRegSet allocated_regs, R first_register, Rx... more_registers) { > 263: if (allocated_regs.contains(first_register)) { FWIW, while first reading this I was looking for the base case of the recursion (the previous versions had some extra specializations). To me it looks like the base case is written in both this function and the function above. I would prefer to have the implementation inside one function only and change this function to use: if (!different_registers(allocated_regs, first_register)) { I think this could make it a bit clearer, but if you prefer the current style, I think that's fine as well. ------------- Marked as reviewed by stefank (Reviewer). 
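The linear-time idea can be sketched in Java with a 64-bit mask that holds one bit per register encoding. This is only a model under stated assumptions: encodings lie in 0..63, and `-1` stands in for `noreg`. That value is skipped, so it may appear any number of times. The sketch does not reproduce the HotSpot `AbstractRegSet` API.

```java
// Model only: register encodings in [0, 64); NOREG is ignored wherever it appears.
final class DistinctRegisters {
    static final int NOREG = -1;

    static boolean differentRegisters(int... encodings) {
        long seen = 0L;                      // one bit per encoding already seen
        for (int enc : encodings) {
            if (enc == NOREG) {
                continue;
            }
            if (enc < 0 || enc >= Long.SIZE) {
                throw new IllegalArgumentException("encoding out of range: " + enc);
            }
            long bit = 1L << enc;
            if ((seen & bit) != 0) {
                return false;                // duplicate register
            }
            seen |= bit;
        }
        return true;                         // single pass: O(N) rather than O(N**2)
    }
}
```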
PR Review: https://git.openjdk.org/jdk/pull/16617#pullrequestreview-2052702736 PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1598475883 PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1598591701 From eastigeevich at openjdk.org Mon May 13 14:45:02 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 13 May 2024 14:45:02 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: <7kfgb5FXqda4SzqPO2XUXdx6CM_Z-G970nSpqvJVSYw=.b6b01073-66af-4c7d-8d7c-528a4f87707d@github.com> References: <7kfgb5FXqda4SzqPO2XUXdx6CM_Z-G970nSpqvJVSYw=.b6b01073-66af-4c7d-8d7c-528a4f87707d@github.com> Message-ID: On Mon, 13 May 2024 14:34:50 GMT, Dmitry Chuyko wrote: > So there are cases when new functionality doesn't work as expected (I don't see any other users impacted). Why not file bugs for those cases and estimate their impact? Do you know any users using the new functionality? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2107799744 From eastigeevich at openjdk.org Mon May 13 14:45:03 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 13 May 2024 14:45:03 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 13:03:26 GMT, Evgeny Astigeevich wrote: > Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues. REDO work is tracked by [JDK-8331749](https://bugs.openjdk.org/browse/JDK-8331749). > > Found bugs: > - When refreshing `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)` which only considers the compiler directive which is on the top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected. 
> - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. > > There are other concerns: bugs and performance issues. > > Possible bugs: > - `has_matching_directives` might not be cleared. A nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark. > - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again. > - JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. > > Performance issues: > - Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will be traversing nmethods most of which don't need recompilation. > > The backout is not clean because of removal of `CompiledMethod`. > > Tested with release and fastdebug builds: tier1 and tier2 passed. IMO if nobody uses it and the amount of code is small, it is better to back out it and to reimplement it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2107809381 From dfenacci at openjdk.org Mon May 13 15:50:21 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 13 May 2024 15:50:21 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v10] In-Reply-To: References: Message-ID: > # Issue > When loading multiple vectors using offsets or masks (e.g. 
`LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0)` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. > > # Causes > On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. > The same is true for `StoreVector`s. > When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store of the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`). Here we call > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 > > where we do an extra check for types if we deal with vectors but we don't make sure that neither masks nor offsets interfere. > Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > but we don't make sure that there are no masks or offsets. > A few lines below, we check if there are 2 stores for the same value in a row. We need to check for masks and offsets here too but in this case we can include these cases if the masks and offsets of the vector stores are equivalent. > > # Solution > To avoid folding `Load`- and `StoreVector`s with masks and offsets we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns its own (to avoid ever being the same as a store node).
In this way, the checks in `MemNode::can_see_stored_value` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 > > and `StoreNode::Identity` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > will fail if masks or offsets are used. > For 2 stores of the same value we instead check for mask and offset equality. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been add... Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: - JDK-8325520: add extra tests - JDK-8325520: more tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18347/files - new: https://git.openjdk.org/jdk/pull/18347/files/9b742109..777bf562 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=08-09 Stats: 484 lines in 1 file changed: 483 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18347/head:pull/18347 PR: https://git.openjdk.org/jdk/pull/18347 From kvn at openjdk.org Mon May 13 16:32:15 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 13 May 2024 16:32:15 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 14:42:26 GMT, Evgeny Astigeevich wrote: >> Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues. REDO work is tracked by [JDK-8331749](https://bugs.openjdk.org/browse/JDK-8331749). 
>> >> Found bugs: >> - When refreshing `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)` which only considers the compiler directive which is on the top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected. >> - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. >> >> There are other concerns: bugs and performance issues. >> >> Possible bugs: >> - `has_matching_directives` might not be cleared. A nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark. >> - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again. >> - JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. >> >> Performance issues: >> - Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will be traversing nmethods most of which don't need recompilation. >> >> The backout is not clean because of removal of `CompiledMethod`. >> >> Tested with release and fastdebug builds: tier1 and tier2 passed. > > IMO if nobody uses it and the amount of code is small, it is better to back out it and to reimplement it. @eastig do you have tests which shows issues you listed in description? I don't see any reference to them in this sub-task and in [REDO] bug [JDK-8331749](https://bugs.openjdk.org/browse/JDK-8331749). How you found these issues? 
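A deliberately simplified model can illustrate the queue-related bug listed in the description. It is not HotSpot's compile broker, and `Task`, `ToyCompileQueue`, `enqueue` and `drain` are hypothetical. The model makes two assumptions. First, a request for a method that already has a pending task is dropped. Second, a task keeps the directive that was in effect when it was enqueued. Under those assumptions, a recompile request issued after a directive refresh has no effect while the old task is still waiting.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <set>
#include <string>

// Hypothetical compile task: the method plus the directive captured at enqueue time.
struct Task {
  std::string method;
  std::string directive;
};

// Toy queue that drops a request for a method which already has a pending task.
class ToyCompileQueue {
  std::deque<Task> _queue;
  std::set<std::string> _pending;  // methods with a task waiting in _queue

 public:
  // Returns false when the request is dropped because the method is already queued.
  bool enqueue(const std::string& method, const std::string& current_directive) {
    if (_pending.count(method) != 0) {
      return false;
    }
    _pending.insert(method);
    _queue.push_back(Task{method, current_directive});
    return true;
  }

  // "Compiles" every pending task; reports the directive each method was compiled with.
  std::map<std::string, std::string> drain() {
    std::map<std::string, std::string> compiled;
    while (!_queue.empty()) {
      Task t = _queue.front();
      _queue.pop_front();
      _pending.erase(t.method);
      compiled[t.method] = t.directive;
    }
    return compiled;
  }
};
```

Suppose `callable` is enqueued with `"old"` while compilation is locked. A second request with `"new"` after the refresh then returns `false`, and `drain()` reports `"old"` for `callable`.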
------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2108154151 From epeter at openjdk.org Mon May 13 17:10:04 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 May 2024 17:10:04 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop [v2] In-Reply-To: References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> <2lreoMy7UKtgM_m8RCU68rp3FFkoU8zj3ckuTKzXqf0=.dc02a0d4-2671-4c70-a470-a64f28e38f2d@github.com> Message-ID: On Mon, 13 May 2024 13:23:07 GMT, Roland Westrelin wrote: > > I guess the issue is that ConvL2I and ConvI2L are also type nodes, which can restrict their type, just like CastII nodes. And that restricting of the type is only true under a certain if-branch. > > That's not entirely true here. The `ConvL2I` captures the type of its input so not a narrower type. The problem is that the type is that of a `Phi` for a counted loop and once pushed through phi, the type captured by the `ConvI2L` becomes incorrect. So what exactly is it that guarantees the correctness of the `phi` range under the counted loop that is not true when you push it back? I mean I would assume the `phi` can only have values that its inputs actually produce, so its inputs cannot have wildly different ranges, right? At some point, this range must be established by some control flow, at which point we can do the "type restriction". I would now have to dive into the code and debug if the "type restriction" for counted loop phi happens purely because of the input values, or because of explicitly restricting the type of the `ConvI2L`. But I do see that there is some `new ConvI2LNode(input, type)` cases where we do restrict the type of a `ConvI2L`.
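A toy interval model may make the point about the counted-loop `Phi` more concrete. It is only a sketch under stated assumptions, not C2's type system. The loop is assumed to be `for (long i = 0; i < 100; i++)`. The counted-loop `Phi` is assumed to take its range [0, 99] from the loop's init and limit, not from the union of its inputs. The back-edge input `i + 1`, typed on its own, is [1, 100]. A type captured at the `Phi` is therefore valid for the `Phi`, but not for a copy of the conversion pushed onto the back-edge input. `Range`, `phi_type`, `entry_input_type`, `backedge_input_type` and `captured_type_valid_for` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Closed interval [lo, hi]: a stand-in for a C2 long type.
struct Range {
  std::int64_t lo;
  std::int64_t hi;
  bool contains(const Range& other) const { return lo <= other.lo && other.hi <= hi; }
};

// Assumed loop: for (long i = kInit; i < kLimit; i++)
const std::int64_t kInit = 0;
const std::int64_t kLimit = 100;

// Assumption: the counted-loop Phi is typed from init and limit, i.e. [init, limit - 1].
Range phi_type() { return Range{kInit, kLimit - 1}; }

// The Phi's inputs, each typed on its own.
Range entry_input_type() { return Range{kInit, kInit}; }
Range backedge_input_type() {
  Range p = phi_type();  // the back-edge value is i + 1 for i in the Phi's range
  return Range{p.lo + 1, p.hi + 1};
}

// A type captured at the Phi is only safe on a clone if it covers that clone's input.
bool captured_type_valid_for(const Range& captured, const Range& input) {
  return captured.contains(input);
}
```

In this model the captured range [0, 99] covers the entry value 0 but not the back-edge value 100.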
------------- PR Comment: https://git.openjdk.org/jdk/pull/19086#issuecomment-2108260349 From galder at openjdk.org Mon May 13 17:40:54 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Mon, 13 May 2024 17:40:54 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v17] In-Reply-To: References: Message-ID: > Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures. > > The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64: > >
> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
> Benchmark                (size)  Mode  Cnt    Score    Error  Units
> ArrayClone.byteArraycopy      0  avgt   15    3.476 ±  0.018  ns/op
> ArrayClone.byteArraycopy     10  avgt   15    3.740 ±  0.017  ns/op
> ArrayClone.byteArraycopy    100  avgt   15    7.124 ±  0.010  ns/op
> ArrayClone.byteArraycopy   1000  avgt   15   39.301 ±  0.106  ns/op
> ArrayClone.byteClone          0  avgt   15    3.478 ±  0.008  ns/op
> ArrayClone.byteClone         10  avgt   15    3.562 ±  0.007  ns/op
> ArrayClone.byteClone        100  avgt   15    5.888 ±  0.206  ns/op
> ArrayClone.byteClone       1000  avgt   15   25.762 ±  0.203  ns/op
> ArrayClone.intArraycopy       0  avgt   15    3.199 ±  0.016  ns/op
> ArrayClone.intArraycopy      10  avgt   15    4.521 ±  0.008  ns/op
> ArrayClone.intArraycopy     100  avgt   15   17.429 ±  0.039  ns/op
> ArrayClone.intArraycopy    1000  avgt   15  178.432 ±  0.777  ns/op
> ArrayClone.intClone           0  avgt   15    3.406 ±  0.016  ns/op
> ArrayClone.intClone          10  avgt   15    4.272 ±  0.006  ns/op
> ArrayClone.intClone         100  avgt   15   13.110 ±  0.122  ns/op
> ArrayClone.intClone        1000  avgt   15  113.196 ± 13.400  ns/op
> >
> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this.
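The zeroing point has a rough analogue in plain C++. It is given here as an illustration only: it is not C1 or HotSpot code, and `clone_zeroed` and `clone_uninitialized` are hypothetical names. Value-initializing the new array zero-fills it, and the copy then overwrites every element anyway. Default-initializing an array of a trivially copyable type leaves it unfilled. That is safe only because the copy writes every element right away. Whether the fill is actually skipped depends on the compiler and allocator, so this shows the reasoning and makes no performance claim.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>

// Allocate zero-filled storage, then overwrite it: two passes over the memory.
template <typename T>
std::unique_ptr<T[]> clone_zeroed(const T* src, std::size_t n) {
  std::unique_ptr<T[]> dst(new T[n]());  // "()" value-initializes, i.e. zero fill
  std::copy(src, src + n, dst.get());
  return dst;
}

// Skip the fill: valid only because every element is written immediately.
template <typename T>
std::unique_ptr<T[]> clone_uninitialized(const T* src, std::size_t n) {
  static_assert(std::is_trivially_copyable<T>::value, "primitive-like element types only");
  std::unique_ptr<T[]> dst(new T[n]);  // default-init: no zero fill for trivial T
  std::copy(src, src + n, dst.get());
  return dst;
}
```

Both functions return equal copies. They differ only in whether the fresh array is filled before the copy.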
> > I run hotspot compiler tests successfully limiting them to C1 compilation darwin/aarch64, linux/x86_64 and linux/686. E.g. > >
> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
> ...
> TEST                                      TOTAL  PASS  FAIL  ERROR
> jtreg:test/hotspot/jtreg:hotspot_compiler  1234  1234     0      0
> >
> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts? > > Thanks @rwestrel for his help shaping this up :) Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: Add assert message ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17667/files - new: https://git.openjdk.org/jdk/pull/17667/files/c3b7fa47..09408587 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=15-16 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17667.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17667/head:pull/17667 PR: https://git.openjdk.org/jdk/pull/17667 From galder at openjdk.org Mon May 13 17:40:54 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Mon, 13 May 2024 17:40:54 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v16] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 13:44:42 GMT, Roland Westrelin wrote: >> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/c1/c1_GraphBuilder.cpp >> >> Co-authored-by: Dean Long <17332032+dean-long at users.noreply.github.com> > > src/hotspot/share/c1/c1_GraphBuilder.cpp line 2031: > >> 2029: ciType* type = receiver->exact_type(); >> 2030:
if (type != nullptr && type->is_loaded()) { >> 2031: assert(!type->is_instance_klass() || !type->as_instance_klass()->is_interface(), ""); > > Please add a message to the assert. Added, is that ok? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1598823687 From dlong at openjdk.org Mon May 13 19:46:03 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 13 May 2024 19:46:03 GMT Subject: RFR: 8331429: [JVMCI] Cleanup JVMCIRuntime allocation routines [v3] In-Reply-To: References: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> Message-ID: On Mon, 13 May 2024 11:34:18 GMT, Yudi Zheng wrote: >> This PR removes allocation routines that may throw exception from JVMCIRuntime. It also exports various symbols related to the hashed secondary supers table. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > remove trailing white space Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19176#pullrequestreview-2053628324 From eastigeevich at openjdk.org Mon May 13 20:37:40 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 13 May 2024 20:37:40 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 16:29:35 GMT, Vladimir Kozlov wrote: > do you have tests which shows issues you listed in description? 
Here is a jtreg test:

- `refresh_control.02.txt`

[
  {
    match: "serviceability.dcmd.compiler.DirectivesRefreshTest::callable",
    c2: {
      PrintOptoAssembly: true
    }
  }
]

- `DirectivesRefreshTest02.java`

/**
 * @test DirectivesRefreshTest02
 * @summary Test of forced recompile after compiler directives changes by diagnostic command
 * @requires vm.compiler1.enabled & vm.compiler2.enabled
 * @library /test/lib /
 * @modules java.base/jdk.internal.misc
 *
 * @build jdk.test.whitebox.WhiteBox
 * @run driver jdk.test.lib.helpers.ClassFileInstaller jdk.test.whitebox.WhiteBox
 *
 * @run main/othervm -Xbootclasspath/a:. -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI
 *      -XX:+BackgroundCompilation -Xlog:codecache=trace -XX:-Inline -XX:+TieredCompilation -XX:CICompilerCount=2
 *      -XX:+UnlockDiagnosticVMOptions
 *      serviceability.dcmd.compiler.DirectivesRefreshTest02
 */

package serviceability.dcmd.compiler;

import jdk.test.whitebox.WhiteBox;
import jdk.test.lib.process.OutputAnalyzer;
import jdk.test.lib.dcmd.CommandExecutor;
import jdk.test.lib.dcmd.JMXExecutor;

import java.nio.file.Path;
import java.nio.file.Paths;
import java.lang.reflect.Method;
import java.util.Random;

import static jdk.test.lib.Asserts.assertEQ;
import static compiler.whitebox.CompilerWhiteBoxTest.COMP_LEVEL_NONE;
import static compiler.whitebox.CompilerWhiteBoxTest.COMP_LEVEL_SIMPLE;
import static compiler.whitebox.CompilerWhiteBoxTest.COMP_LEVEL_FULL_OPTIMIZATION;

public class DirectivesRefreshTest02 {

    static Path cmdPath = Paths.get(System.getProperty("test.src", "."), "refresh_control.02.txt");
    static WhiteBox wb = WhiteBox.getWhiteBox();
    static Random random = new Random();
    static Method method;
    static CommandExecutor executor;

    static int callable() {
        int result = 0;
        for (int i = 0; i < 100; i++) {
            result += random.nextInt(100);
        }
        return result;
    }

    static void setup() throws Exception {
        method = DirectivesRefreshTest.class.getDeclaredMethod("callable");
        executor = new JMXExecutor();

        wb.enqueueMethodForCompilation(method, COMP_LEVEL_SIMPLE);
        while (wb.isMethodQueuedForCompilation(method)) {
            Thread.onSpinWait();
        }

        wb.lockCompilation();
        boolean r = wb.enqueueMethodForCompilation(method, COMP_LEVEL_FULL_OPTIMIZATION);
        System.out.println("Method enqueued: " + r);
    }

    static void testDirectivesAddRefresh() {
        var output = executor.execute("Compiler.directives_add -r " + cmdPath.toString());
        output.stderrShouldBeEmpty().shouldContain("1 compiler directives added");
        System.out.println("Method enqueued: " + wb.isMethodQueuedForCompilation(method));
        wb.unlockCompilation();
        wb.enqueueMethodForCompilation(method, COMP_LEVEL_FULL_OPTIMIZATION);
        while (wb.isMethodQueuedForCompilation(method)) {
            Thread.onSpinWait();
        }
        System.out.println("Method compilation level: " + wb.getMethodCompilationLevel(method));
        assertEQ(true, false, "Stop here");
    }

    public static void main(String[] args) throws Exception {
        setup();
        testDirectivesAddRefresh();
    }
}

------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2108744800 From dnsimon at openjdk.org Mon May 13 20:37:43 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 13 May 2024 20:37:43 GMT Subject: RFR: 8331429: [JVMCI] Cleanup JVMCIRuntime allocation routines [v3] In-Reply-To: References: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> Message-ID: On Mon, 13 May 2024 11:34:18 GMT, Yudi Zheng wrote: >> This PR removes allocation routines that may throw exception from JVMCIRuntime. It also exports various symbols related to the hashed secondary supers table. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > remove trailing white space Marked as reviewed by dnsimon (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/19176#pullrequestreview-2053738498 From dfenacci at openjdk.org Mon May 13 20:38:56 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 13 May 2024 20:38:56 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v11] In-Reply-To: References: Message-ID: > # Issue > When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0)` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. > > # Causes > On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. > The same is true for `StoreVector`s. > When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store of the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`). Here we call > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 > > where we do an extra check for types if we deal with vectors but we don't make sure that neither masks nor offsets interfere. > Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > but we don't make sure that there are no masks or offsets. > A few lines below, we check if there are 2 stores for the same value in a row.
We need to check for masks and offsets here too but in this case we can include these cases if the masks and offsets of the vector stores are equivalent. > > # Solution > To avoid folding `Load`- and `StoreVector`s with masks and offsets we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns its own (to avoid ever being the same as a store node). In this way, the checks in `MemNode::can_see_stored_value` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 > > and `StoreNode::Identity` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > will fail if masks or offsets are used. > For 2 stores of the same value we instead check for mask and offset equality. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been add...
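A few toy types can model the shape of the fix. This is a hypothetical sketch, not C2's `MemNode` hierarchy. The `Op_*` values, `ToyNode`, `ToyLoad`, `ToyStore`, `can_fold_load`, `same_optional_input` and `second_store_redundant` are invented here. The real checks also compare addresses, memory state and vector types. A plain vector load reports the opcode of its matching store, so a load after that store can be folded. A masked load reports its own opcode, so the comparison never matches any store. For two stores of the same value, a missing mask or offsets input is modelled as `nullptr`. The inputs must then be either both absent, or both present and equivalent.

```cpp
#include <cassert>

// Hypothetical opcodes; only their identity matters here.
enum Opcode { Op_LoadVector, Op_StoreVector, Op_LoadVectorMasked, Op_StoreVectorMasked };

struct ToyNode {
  int id;  // nodes with equal ids are treated as equivalent
};

struct ToyLoad {
  Opcode op;
  // Opcode of the store whose value this load may reuse.
  Opcode store_opcode() const {
    // A plain vector load pairs with a plain vector store; a masked load
    // returns its own opcode, which no store ever has.
    return op == Op_LoadVector ? Op_StoreVector : op;
  }
};

struct ToyStore {
  Opcode op;
  const ToyNode* mask;     // nullptr when the store has no mask
  const ToyNode* offsets;  // nullptr when the store has no offsets
};

// Load after store (same address and type assumed): fold only on matching opcodes.
bool can_fold_load(const ToyLoad& load, const ToyStore& store) {
  return store.op == load.store_opcode();
}

// Both inputs absent, or both present and equivalent.
bool same_optional_input(const ToyNode* a, const ToyNode* b) {
  if (a == nullptr || b == nullptr) {
    return a == b;
  }
  return a->id == b->id;
}

// Store after store of the same value (same address assumed): the second
// store is redundant only if opcode, mask and offsets all match.
bool second_store_redundant(const ToyStore& first, const ToyStore& second) {
  return first.op == second.op &&
         same_optional_input(first.mask, second.mask) &&
         same_optional_input(first.offsets, second.offsets);
}
```

Returning the load's own opcode means no extra branch is needed at the comparison site. The existing opcode equality test simply never succeeds for masked or gather loads.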
Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8325520: update match condition ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18347/files - new: https://git.openjdk.org/jdk/pull/18347/files/777bf562..e676bcb1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=09-10 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18347/head:pull/18347 PR: https://git.openjdk.org/jdk/pull/18347 From eastigeevich at openjdk.org Mon May 13 20:40:49 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 13 May 2024 20:40:49 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 13:03:26 GMT, Evgeny Astigeevich wrote: > Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues. REDO work is tracked by [JDK-8331749](https://bugs.openjdk.org/browse/JDK-8331749). > > Found bugs: > - When refreshing `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)` which only considers the compiler directive which is on the top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected. > - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. > > There are other concerns: bugs and performance issues. > > Possible bugs: > - `has_matching_directives` might not be cleared. A nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. 
If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark. > - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again. > - JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. > > Performance issues: > - Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will be traversing nmethods most of which don't need recompilation. > > The backout is not clean because of removal of `CompiledMethod`. > > Tested with release and fastdebug builds: tier1 and tier2 passed. There is no `PrintOptoAssembly` in output. I use `lockCompilation()`/`unlockCompilation()` to simulate: > A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. I think using them we can also simulate, though it would not be easy to write a test: > JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2108759073 From eastigeevich at openjdk.org Mon May 13 20:50:01 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 13 May 2024 20:50:01 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: <43tyZlzDKG1-M3YMBjjSKx2R3OosZuyfQySaBuV_KTc=.45597f64-6ff7-4d83-8416-aa29154d92df@github.com> On Mon, 13 May 2024 16:29:35 GMT, Vladimir Kozlov wrote: > How you found these issues? I've been backporting JDK-8309271 to downstream 17 and 21. 
As compilations happens in background but a test from JDK-8309271 runs with background compilation off, I asked myself what might happen with background compilation. I have a patch fixing the test above. I don't think it is a complete fix. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2108770472 From dfenacci at openjdk.org Mon May 13 21:03:13 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 13 May 2024 21:03:13 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v6] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 15:29:11 GMT, Damon Fenacci wrote: >> src/hotspot/share/opto/memnode.cpp line 3554: >> >>> 3552: } >>> 3553: } >>> 3554: } >> >> I think the code is now correct. >> But I find the nested if-elseif-elseif-else ... structure a bit hard to read. And there is quite some code duplication (e.g. `result = mem` and all the `eqv_uncast` checks). >> >> You could either do something like this: >> >> if (!is_StoreVector() || >> as_StoreVector()->has_same_vect_type_and_offsets_and_mask(mem->as_StoreVector())) { >> result = mem; >> } >> >> >> Sketch: >> >> has_same_vect_type_and_offsets_and_mask: >> >> different vect_type -> return false >> ... >> >> >> Or maybe it would be better to define virtual functions to get the `mask` and `offsets` from a `StoreVector`? If it has none, just return `nullptr`. Sometimes people worry about virtual methods, but we already use them extensively for the node Value/Ideal anyway. >> >> Then, you can do: >> >> if (!is_StoreVector()) { >> result = mem; >> } else { >> const Node* offsets1 = as_StoreVector()->get_offsets(); >> const Node* offsets2 = mem->as_StoreVector()->get_offsets(); >> const Node* mask1 = as_StoreVector()->get_mask(); >> const Node* mask2 = mem->as_StoreVector()->get_mask(); >> if (offsets1->eqv_uncast(offsets2) && mask1->eqv_uncast(mask2)) { >> result = mem; >> } >> } >> >> I think that would be the cleanest and most readable way.
>> >> What do you think? > > I agree that it is quite convoluted probably also because I've put `if (!is_StoreVector())` (which is redundant) at the beginning to get the most common case out of the way but still... > At first I thought that multiple inheritance would be a good solution (masks and offsets could be inherited by the corresponding nodes) but the "HotSpot Coding Style" clearly says to avoid it... > So, I think in the end your second suggestion is the cleanest. Changing it... I've updated it. The condition unfortunately doesn't look as clean as the one above as we need to check for `nullptr` (either both or none and `eqv_uncast`). I've tried to make it as concise as possible (we could have made `mask` and `offsets` return a _unique_ node instead, so as to avoid the `nullptr`, but I had the impression it would just make everything less clear). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1599077017 From dfenacci at openjdk.org Mon May 13 21:09:12 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 13 May 2024 21:09:12 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v4] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 12:56:55 GMT, Emanuel Peter wrote: > * No mixed type test for load-store: Use MemorySegment `from/intoMemorySegment`. Try something like store a int-vector, and load a float-vector. It looks as if load/stores that use `from`/`intoMemorySegment` with different types apparently don't create `LoadVector` nodes. It seems that `fromMemorySegment` tries to inline the `VectorSupport::load` intrinsic, but fails as the type of the vector and the inferred type of the underlying memory segment differ: https://github.com/openjdk/jdk/blob/9b742109b196d79cbf712ffd3f64edd1d6497114/src/hotspot/share/opto/vectorIntrinsics.cpp#L1055-L1064 > * Mismatched vector length: store a vector of length 4, and load one of length 8.
I've added tests that store and load with different species (`SPECIES_64`). > * Do some store-store and store-load cases where you the first and second are different loads/stores, i.e. one with and one without mask/offsets. E.g. `StoreVectorMasked` and `StoreVectorScatter` in a store-store test. Doing the total cross-product is probably too much, but a few examples would be a good start. You're right, there were just very few of them. Added many more. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2108799572 From eastigeevich at openjdk.org Mon May 13 21:11:02 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 13 May 2024 21:11:02 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 13:03:26 GMT, Evgeny Astigeevich wrote: > Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues. REDO work is tracked by [JDK-8331749](https://bugs.openjdk.org/browse/JDK-8331749). > > Found bugs: > - When refreshing `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)` which only considers the compiler directive which is on the top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected. > - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. > > There are other concerns: bugs and performance issues. > > Possible bugs: > - `has_matching_directives` might not be cleared. A nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark.
> - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again. > - The JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. > > Performance issues: > - Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will traverse nmethods, most of which don't need recompilation. > > The backout is not clean because of the removal of `CompiledMethod`. > > Tested with release and fastdebug builds: tier1 and tier2 passed. What if, instead of backing out, we use an experimental JVM flag: `-XX:+CompilerDirectivesRefreshSupport`? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2108802569 From cslucas at openjdk.org Mon May 13 22:09:23 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 13 May 2024 22:09:23 GMT Subject: RFR: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers [v3] In-Reply-To: References: Message-ID: <3KQPqbFAyVDkPx28d8DN8Y1_zrJ6LwX6eOEOqxe8mvs=.4ec47e90-e516-4960-96c7-8f0cdbc8b29b@github.com> > The `assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type` failure was caused by the fact that we didn't have a "zero value" for the type T_METADATA. The RAM patch uses that data when it creates a Phi node merging Klass loads while UseCompressedClassPointers is disabled. > > Tested with JTREG tier1-4 on Linux x86_64 & ARM64. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Addressing feedback: more tests. Reverting previous change.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/19148/files - new: https://git.openjdk.org/jdk/pull/19148/files/91fc61de..bb632c27 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19148&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19148&range=01-02 Stats: 79 lines in 4 files changed: 54 ins; 3 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/19148.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19148/head:pull/19148 PR: https://git.openjdk.org/jdk/pull/19148 From cslucas at openjdk.org Mon May 13 22:11:05 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 13 May 2024 22:11:05 GMT Subject: RFR: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers In-Reply-To: References: Message-ID: <_-fPO2t3GrRZjX1m0Z8kaH9k6rwSAKm0vZ0tWPFgoVc=.5c2489d5-6d48-4913-b3ac-bd1544dfdf07@github.com> On Thu, 9 May 2024 01:46:45 GMT, Vladimir Kozlov wrote: >> @JohnTortugo, thank you for adding a new test. But it would be nice to also add an additional run with `-XX:+IgnoreUnrecognizedVMOptions -XX:-UseCompressedClassPointers` to the failed test `test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java` >> >> Also, why do you require running the test only with compressed oops on? >> >> * @requires vm.debug == true & vm.bits == 64 & vm.compiler2.enabled & vm.opt.final.UseCompressedOops & vm.opt.final.EliminateAllocations > >> @JohnTortugo, thank you for adding a new test. But it would be nice to also add an additional run with `-XX:+IgnoreUnrecognizedVMOptions -XX:-UseCompressedClassPointers` to the failed test `test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java` > > Actually `-XX:+IgnoreUnrecognizedVMOptions` is not needed because you require `vm.bits == 64` in the test. @vnkozlov - I updated the patch by adding new tests with CompressedOops/CompressedClassPointers enabled and disabled.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19148#issuecomment-2108886624 From kvn at openjdk.org Mon May 13 22:46:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 13 May 2024 22:46:02 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: <43tyZlzDKG1-M3YMBjjSKx2R3OosZuyfQySaBuV_KTc=.45597f64-6ff7-4d83-8416-aa29154d92df@github.com> References: <43tyZlzDKG1-M3YMBjjSKx2R3OosZuyfQySaBuV_KTc=.45597f64-6ff7-4d83-8416-aa29154d92df@github.com> Message-ID: On Mon, 13 May 2024 20:46:06 GMT, Evgeny Astigeevich wrote: > There is a race among a thread updating directives, compiler threads and CodeCache cleaning threads. We don't properly lock the directives stack, the compile queue and CodeCache to manage the race. This is indeed concerning. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2108925371 From kvn at openjdk.org Mon May 13 22:46:03 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 13 May 2024 22:46:03 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 21:08:08 GMT, Evgeny Astigeevich wrote: > What if, instead of backing out, we use an experimental JVM flag: `-XX:+CompilerDirectivesRefreshSupport`? I don't think this is the correct way to fix the bug. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2108926307 From kvn at openjdk.org Mon May 13 22:52:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 13 May 2024 22:52:05 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 13:03:26 GMT, Evgeny Astigeevich wrote: > Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271), which has known bugs, possible bugs and performance issues.
REDO work is tracked by [JDK-8331749](https://bugs.openjdk.org/browse/JDK-8331749). > > Found bugs: > - When refreshing, `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)`, which only considers the compiler directive on the top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected. > - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. > > There are other concerns: bugs and performance issues. > > Possible bugs: > - `has_matching_directives` might not be cleared. An nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark. > - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again. > - The JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. > > Performance issues: > - Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will traverse nmethods, most of which don't need recompilation. > > The backout is not clean because of the removal of `CompiledMethod`. > > Tested with release and fastdebug builds: tier1 and tier2 passed. I agree with this backout. Thank you @eastig for explaining your point. We have about 3 weeks before RDP1, and it is better to have fewer issues before then. Let's redo the implementation in the next release, taking into account the issues you found, so that we have more time for testing. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19215#pullrequestreview-2053940066 From sviswanathan at openjdk.org Mon May 13 23:15:07 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 13 May 2024 23:15:07 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers:
>>
>> Benchmark                              Score      Latest     Latest/Score
>> StringIndexOf.advancedWithMediumSub    343.573    317.934    0.925375393x
>> StringIndexOf.advancedWithShortSub1    1039.081   1053.96    1.014319384x
>> StringIndexOf.advancedWithShortSub2    55.828     110.541    1.980027943x
>> StringIndexOf.constantPattern          9.361      11.906     1.271872663x
>> StringIndexOf.searchCharLongSuccess    4.216      4.218      1.000474383x
>> StringIndexOf.searchCharMediumSuccess  3.133      3.216      1.02649218x
>> StringIndexOf.searchCharShortSuccess   3.76       3.761      1.000265957x
>> StringIndexOf.success                  9.186      9.713      1.057369911x
>> StringIndexOf.successBig               14.341     46.343     3.231504079x
>> StringIndexOfChar.latin1_AVX2_String   6220.918   12154.52   1.953814533x
>> StringIndexOfChar.latin1_AVX2_char     5503.556   5540.044   1.006629895x
>> StringIndexOfChar.latin1_SSE4_String   6978.854   6818.689   0.977049957x
>> StringIndexOfChar.latin1_SSE4_char     5657.499   5474.624   0.967675646x
>> StringIndexOfChar.latin1_Short_String  7132.541   6863.359   0.962260014x
>> StringIndexOfChar.latin1_Short_char    16013.389  16162.437  1.009307711x
>> StringIndexOfChar.latin1_mixed_String  7386.123   14771.622  1.999915517x
>> StringIndexOfChar.latin1_mixed_char    9901.671   9782.245   0.987938803x
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Rearrange; add lambdas for clarity src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1054: > 1052: } else if (isUL) { > 1053: __ movzbl(rTmp,
Address(needle, 2)); > 1054: __ movdl(byte_1, rTmp); Should be: __ movdl(byte_2, rTmp); src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1056: > 1054: __ movdl(byte_1, rTmp); > 1055: // 1st byte of needle in words > 1056: __ vpbroadcastw(byte_1, byte_1, Assembler::AVX_256bit); Should be: __ vpbroadcastw(byte_2, byte_2, Assembler::AVX_256bit); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599194092 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599194375 From kvn at openjdk.org Mon May 13 23:23:01 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 13 May 2024 23:23:01 GMT Subject: RFR: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers In-Reply-To: <_-fPO2t3GrRZjX1m0Z8kaH9k6rwSAKm0vZ0tWPFgoVc=.5c2489d5-6d48-4913-b3ac-bd1544dfdf07@github.com> References: <_-fPO2t3GrRZjX1m0Z8kaH9k6rwSAKm0vZ0tWPFgoVc=.5c2489d5-6d48-4913-b3ac-bd1544dfdf07@github.com> Message-ID: On Mon, 13 May 2024 22:08:44 GMT, Cesar Soares Lucas wrote: >>> @JohnTortugo, thank you for adding a new test. But it would be nice to also add an additional run with `-XX:+IgnoreUnrecognizedVMOptions -XX:-UseCompressedClassPointers` to the failed test `test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java` >> >> Actually `-XX:+IgnoreUnrecognizedVMOptions` is not needed because you require `vm.bits == 64` in the test. > > @vnkozlov - I updated the patch by adding new tests with CompressedOops/CompressedClassPointers enabled and disabled. @JohnTortugo This looks reasonable. Can you explain more why having klass field load is bad for your code? Is it because you need klass as a constant for deoptimization? Is it possible to handle such a case (loading klass) as a separate RFE later?
------------- PR Comment: https://git.openjdk.org/jdk/pull/19148#issuecomment-2108978102 From duke at openjdk.org Mon May 13 23:54:09 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Mon, 13 May 2024 23:54:09 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers:
>>
>> Benchmark                              Score      Latest     Latest/Score
>> StringIndexOf.advancedWithMediumSub    343.573    317.934    0.925375393x
>> StringIndexOf.advancedWithShortSub1    1039.081   1053.96    1.014319384x
>> StringIndexOf.advancedWithShortSub2    55.828     110.541    1.980027943x
>> StringIndexOf.constantPattern          9.361      11.906     1.271872663x
>> StringIndexOf.searchCharLongSuccess    4.216      4.218      1.000474383x
>> StringIndexOf.searchCharMediumSuccess  3.133      3.216      1.02649218x
>> StringIndexOf.searchCharShortSuccess   3.76       3.761      1.000265957x
>> StringIndexOf.success                  9.186      9.713      1.057369911x
>> StringIndexOf.successBig               14.341     46.343     3.231504079x
>> StringIndexOfChar.latin1_AVX2_String   6220.918   12154.52   1.953814533x
>> StringIndexOfChar.latin1_AVX2_char     5503.556   5540.044   1.006629895x
>> StringIndexOfChar.latin1_SSE4_String   6978.854   6818.689   0.977049957x
>> StringIndexOfChar.latin1_SSE4_char     5657.499   5474.624   0.967675646x
>> StringIndexOfChar.latin1_Short_String  7132.541   6863.359   0.962260014x
>> StringIndexOfChar.latin1_Short_char    16013.389  16162.437  1.009307711x
>> StringIndexOfChar.latin1_mixed_String  7386.123   14771.622  1.999915517x
>> StringIndexOfChar.latin1_mixed_char    9901.671   9782.245   0.987938803x
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Rearrange; add lambdas for clarity src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4492: > 4490: > 4491: // Compare char[] or byte[] arrays aligned to 4
bytes or substrings. > 4492: void C2_MacroAssembler::arrays_equals(bool is_array_equ, Register ary1, I liked the old style better: fewer, longer lines. Same for the rest of the changes in this file. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4594: > 4592: #endif //_LP64 > 4593: bind(COMPARE_WIDE_VECTORS); > 4594: vmovdqu(vec1, Address(ary1, limit, Create a local `scale` variable instead of repeating the ternary operator; it is used several times. src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4250: > 4248: generate_chacha_stubs(); > 4249: > 4250: if ((UseAVX == 2) && EnableX86ECoreOpts && VM_Version::supports_avx2()) { Just `if (EnableX86ECoreOpts)`? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 391: > 389: } > 390: > 391: __ cmpq(needle_len, isU ? 2 : 1); Can we remove this comparison? I.e. either broadcast the first and last characters unconditionally (for a one-character needle they are the same character), or move the broadcasts 'down' into the individual cases. There is already specialized code to handle a needle of size 1, and this comparison adds extra path length. (Will we actually call this intrinsic for needle_size==1? Can we assume length>=2?) src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1365: > 1363: // Compare first byte of needle to haystack > 1364: vpcmpeq(cmp_0, byte_0, Address(haystack, 0), Assembler::AVX_256bit); > 1365: if (size != (isU ? 2 : 1)) { `if (size != scale)`, though in this case `elem_size` might carry more meaning. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1372: > 1370: > 1371: if (bytesToCompare > 2) { > 1372: if (size > (isU ? 4 : 2)) { `if (size > 2*scale)`? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1373: > 1371: if (bytesToCompare > 2) { > 1372: if (size > (isU ? 4 : 2)) { > 1373: if (doEarlyBailout) { Is there a big perf difference when `doEarlyBailout` is enabled, either overall or just for this function? (Removing `doEarlyBailout` in this function would mean a shorter path length; a few extra vpands should be cheap enough.)
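The "broadcast first and last character" approach raised in the review can be modelled in scalar Java to show why no separate length-1 path is needed for the filter. This is a hypothetical sketch, not the stub's actual algorithm: the class and method names are illustrative, and a vector implementation would evaluate both filter comparisons for many haystack positions per instruction (broadcast, compare, AND of the two masks) before verifying candidates. Because `first` and `last` are the same element when the needle has length 1, the filter itself works unchanged.

```java
import java.util.Arrays;

// Hypothetical scalar model of a first/last-element filter for indexOf.
// A SIMD version compares `first` and `last` against many haystack
// positions at once and ANDs the two match masks.
final class FirstLastIndexOf {
    static int indexOf(byte[] haystack, byte[] needle) {
        int n = haystack.length;
        int k = needle.length;
        if (k == 0) {
            return 0;                  // same convention as String.indexOf("")
        }
        if (k > n) {
            return -1;
        }
        byte first = needle[0];        // "broadcast" of the first element
        byte last = needle[k - 1];     // "broadcast" of the last element (== first when k == 1)
        for (int i = 0; i <= n - k; i++) {
            // Cheap filter: both ends must match before the full comparison.
            if (haystack[i] != first || haystack[i + k - 1] != last) {
                continue;
            }
            // Verify the interior; it is empty when k <= 2.
            if (k <= 2 || Arrays.equals(haystack, i + 1, i + k - 1, needle, 1, k - 1)) {
                return i;
            }
        }
        return -1;
    }
}
```

The filter only rejects positions cheaply; every candidate that passes is still verified, so results match `String.indexOf` for Latin-1 input.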
src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1469: > 1467: > 1468: if (isU && (size & 1)) { > 1469: __ emit_int8(0xcc); This should also be an `assert()` to catch this at compile-time. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1633: > 1631: if (isU) { > 1632: if ((size & 1) != 0) { > 1633: __ emit_int8(0xcc); Compile-time assert to ensure this code is never called instead? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1889: > 1887: // r13 = (needle length - 1) > 1888: // r14 = &needle > 1889: // r15 = unused There is quite a bit of redundancy in register usage. It's not incorrect, but it looks odd. It's not clear whether this duplication can easily be removed (or if/why it is needed).

    // rbx = &haystack
    // rdi = &haystack
    // rdx = &needle
    // r14 = &needle
    // rcx = haystack length
    // rsi = haystack length
    // r12 = needle length
    // r13 = (needle length - 1)
    // r10 = hs_len - needle len
    // rbp = -1
    // rax = unused
    // r11 = unused
    // r8 = unused
    // r9 = unused
    // r15 = unused

(Could this comment be out-of-sync with the code?
It looks like only rbx, r14 and temporaries taken from the unused registers are used a few lines down.) src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1950: > 1948: // r13 = (needle length - 1) > 1949: // r14 = &needle > 1950: // r15 = unused Same as for the small case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592834449 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592838385 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592831339 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599131482 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599146451 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599144855 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599143784 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599151000 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599204083 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599209564 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599213635 From sviswanathan at openjdk.org Tue May 14 00:51:08 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 14 May 2024 00:51:08 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
The benchmark numbers:
>>
>> Benchmark                              Score      Latest     Latest/Score
>> StringIndexOf.advancedWithMediumSub    343.573    317.934    0.925375393x
>> StringIndexOf.advancedWithShortSub1    1039.081   1053.96    1.014319384x
>> StringIndexOf.advancedWithShortSub2    55.828     110.541    1.980027943x
>> StringIndexOf.constantPattern          9.361      11.906     1.271872663x
>> StringIndexOf.searchCharLongSuccess    4.216      4.218      1.000474383x
>> StringIndexOf.searchCharMediumSuccess  3.133      3.216      1.02649218x
>> StringIndexOf.searchCharShortSuccess   3.76       3.761      1.000265957x
>> StringIndexOf.success                  9.186      9.713      1.057369911x
>> StringIndexOf.successBig               14.341     46.343     3.231504079x
>> StringIndexOfChar.latin1_AVX2_String   6220.918   12154.52   1.953814533x
>> StringIndexOfChar.latin1_AVX2_char     5503.556   5540.044   1.006629895x
>> StringIndexOfChar.latin1_SSE4_String   6978.854   6818.689   0.977049957x
>> StringIndexOfChar.latin1_SSE4_char     5657.499   5474.624   0.967675646x
>> StringIndexOfChar.latin1_Short_String  7132.541   6863.359   0.962260014x
>> StringIndexOfChar.latin1_Short_char    16013.389  16162.437  1.009307711x
>> StringIndexOfChar.latin1_mixed_String  7386.123   14771.622  1.999915517x
>> StringIndexOfChar.latin1_mixed_char    9901.671   9782.245   0.987938803x
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Rearrange; add lambdas for clarity src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1083: > 1081: // haystack - the address of the first byte of the haystack > 1082: // hsLen - the sizeof the haystack > 1083: // isU - true if argument encoding is either UU or UL Do we need to list needleLen here as well? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1096: > 1094: MacroAssembler *_masm) { > 1095: > 1096: assert_different_registers(eq_mask, haystack, needleLen, rTmp, hsLen, r10); r10 kind of stands out here. You could use nMinusK in this assert. The assert following this one checks for nMinusK==r10, so that should suffice.
BTW, I didn't see anything in the code below that needs nMinusK to be r10. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1120: > 1118: #define cmp_0 XMM_TMP3 > 1119: #undef cmp_k > 1120: #define cmp_k XMM_TMP4 XMM_TMP4 is not reused, so cmp_k could be declared as const. In general, limiting the undef/define pairs to reused registers only would make the review easier. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1125: > 1123: #undef lastMask > 1124: > 1125: int sizeIncr = isU ? 2 : 1; sizeIncr and scale seem to be the same; we could just use one of them in this function. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1178: > 1176: __ andq(eq_mask, lastMask); > 1177: if (needToSaveRCX) { > 1178: __ movdq(rcx, saveRCX); movdq is an expensive instruction (about 3 cycles). If we have another GPR temporary available here for shiftVal, then we don't need to save/restore rcx. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1183: > 1181: > 1182: if (bytesToCompare > 2) { > 1183: if (size > (isU ?
4 : 2)) { This and other usages could be simplified to `size > 2 * scale`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599201163 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599203881 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599211645 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599202848 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599242323 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599228299 From cslucas at openjdk.org Tue May 14 02:51:01 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 14 May 2024 02:51:01 GMT Subject: RFR: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers In-Reply-To: References: <_-fPO2t3GrRZjX1m0Z8kaH9k6rwSAKm0vZ0tWPFgoVc=.5c2489d5-6d48-4913-b3ac-bd1544dfdf07@github.com> Message-ID: On Mon, 13 May 2024 23:20:12 GMT, Vladimir Kozlov wrote: > Can you explain more why having klass field load is bad for your code? The issue involves LoadNKlass, DecodeNKlass and NULL NKlass. It happens when splitting a LoadNKlass through a nullable Phi. In that process, another "nullable" Phi of type TypeNarrowKlass may be created, merging the "Klass'es" of the original Phi inputs. A NULL NarrowKlass does not seem to be well defined: for instance, there is no definition of "_zero_type" for T_METADATA, which is the basic type of TypeNarrowKlass. The first commit in this PR added this definition. However, I think a better approach than the one in the first commit may be to create a Phi of type TypePtr that merges the DecodeNKlass nodes, instead of a Phi of type NarrowKlass. By doing so, I won't need to create a Phi with a NULL **NKlass**, so the original patch isn't necessary. However, in my opinion, doing that is better left for a separate RFE + PR.
> Is it possible to handle such a case (loading klass) as a separate RFE later? Yes, I think we can do it as a separate RFE. However, in my experiments, klass field loading doesn't appear very often in the benchmarks and therefore may not be worth the added complication. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19148#issuecomment-2109175690 From kvn at openjdk.org Tue May 14 03:54:01 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 14 May 2024 03:54:01 GMT Subject: RFR: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers In-Reply-To: References: <_-fPO2t3GrRZjX1m0Z8kaH9k6rwSAKm0vZ0tWPFgoVc=.5c2489d5-6d48-4913-b3ac-bd1544dfdf07@github.com> Message-ID: On Tue, 14 May 2024 02:48:44 GMT, Cesar Soares Lucas wrote: > However, in my experiments, klass field loading doesn't appear very often in the benchmarks and therefore may not be worth the added complication. That may be true for code with new allocations, but in the general case, when an object is passed as an argument or loaded from a field, klass loading is common. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19148#issuecomment-2109226957 From kvn at openjdk.org Tue May 14 03:54:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 14 May 2024 03:54:02 GMT Subject: RFR: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers [v3] In-Reply-To: <3KQPqbFAyVDkPx28d8DN8Y1_zrJ6LwX6eOEOqxe8mvs=.4ec47e90-e516-4960-96c7-8f0cdbc8b29b@github.com> References: <3KQPqbFAyVDkPx28d8DN8Y1_zrJ6LwX6eOEOqxe8mvs=.4ec47e90-e516-4960-96c7-8f0cdbc8b29b@github.com> Message-ID: On Mon, 13 May 2024 22:09:23 GMT, Cesar Soares Lucas wrote: >> The `assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type` failure was caused by the fact that we didn't have a "zero value" for the type T_METADATA.
The RAM patch uses that data when it creates a Phi node merging Klass loads while UseCompressedClassPointers is disabled. >> >> Tested with JTREG tier1-4 on Linux x86_64 & ARM64. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Addressing feedback: more tests. Reverting previous change. Thank you for explaining the issue you have with klass loading. I will run our testing with your current version. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19148#issuecomment-2109228049 From fyang at openjdk.org Tue May 14 04:31:01 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 14 May 2024 04:31:01 GMT Subject: RFR: 8332130: RISC-V: remove wrong instructions of Vector Crypto Extension In-Reply-To: References: Message-ID: On Mon, 13 May 2024 08:14:43 GMT, Hamlin Li wrote: > Hi, > Can you help review this simple patch, which removes some wrong instructions on RISC-V? > These instructions are wrong; take `vror.vx` as an example: > * by the definition in the spec, it should be `vror.vx vd, vs2, *rs1*, vm` > * the implementation here is actually `vror_vx(VectorRegister Vd, VectorRegister Vs2, *VectorRegister* Vs1, VectorMask vm = unmasked)` > > Thanks I think you mean the `funct3` (`OPIVV` vs `OPIVX`) encoding is wrong? ------------- PR Review: https://git.openjdk.org/jdk/pull/19211#pullrequestreview-2054252934 From mli at openjdk.org Tue May 14 06:15:01 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 14 May 2024 06:15:01 GMT Subject: RFR: 8332130: RISC-V: remove wrong instructions of Vector Crypto Extension In-Reply-To: References: Message-ID: On Tue, 14 May 2024 04:28:40 GMT, Fei Yang wrote: > I think you mean the `funct3` (`OPIVV` vs `OPIVX`) encoding is wrong?
Yes ------------- PR Comment: https://git.openjdk.org/jdk/pull/19211#issuecomment-2109364170 From roland at openjdk.org Tue May 14 07:35:01 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 14 May 2024 07:35:01 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop [v2] In-Reply-To: <_kbcMydcMPblcm_FDDuL5vWGT7q6iRoarmYsTlEA0hQ=.290c6744-211d-406d-8ed1-90e510051167@github.com> References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> <_kbcMydcMPblcm_FDDuL5vWGT7q6iRoarmYsTlEA0hQ=.290c6744-211d-406d-8ed1-90e510051167@github.com> Message-ID: On Mon, 13 May 2024 13:23:46 GMT, Roland Westrelin wrote: >> In the test case:
>>
>>     long i;
>>     for (; i > 0; i--) {
>>         res += 42 / ((int) i);
>>
>> The long counted loop phi has type `[1..100]`. As a consequence, the >> `ConvL2I` also has type `[1..100]`. The `DivI` node that follows can't >> fault: it is not guarded by a zero check and has no control set. >> >> The `ConvL2I` is split through phi and so is the `DivI` node: >> `PhaseIdealLoop::cannot_split_division()` returns true because the >> value coming from the backedge into the `DivI` (when it is about to be >> split thru phi) is the result of the `ConvL2I` which has type >> `[1..100]` so is not zero as far as the compiler can tell. >> >> On the last iteration of the loop, i is 1. Because the DivI was split >> thru Phi, it computes the value for the following iteration, so for i >> = 0. This causes a crash when the compiled code runs. >> >> The same problem can't happen with an int counted loop because logic >> in `PhaseIdealLoop::split_thru_phi()` prevents a `ConvI2L` from being >> split thru phi. I propose to fix this the same way: in the test case, >> it's not true that once the `ConvL2I` is split thru phi it keeps type >> `[1..100]`.
The fix is fairly conservative because it's based on the >> existing logic for `ConvI2L`: ideally we would avoid splitting a `ConvL2I` >> only at a counted loop. I suppose the same is true for the `ConvI2L` >> and I thought it would be best to revisit both together. > > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - test case tweaks > - fuzzer test

Before split if:

    long i = 100;
    for (; i > 0;) {
        // i here is 1..100
        int j = (int)i; // ConvL2I type is 1..100, same as loop phi
        int k = 42 / j;
        i--;
    }

after split if:

    long i = 100;
    int j = 100;
    int k = 0;
    for (; i > 0;) {
        // i here is 1..100
        i--;
        // i here is 0..99
        j = (int)i; // ConvL2I type is still 1..100 which is not correct
        k = 42 / j;
    }

------------- PR Comment: https://git.openjdk.org/jdk/pull/19086#issuecomment-2109483191 From fyang at openjdk.org Tue May 14 07:41:03 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 14 May 2024 07:41:03 GMT Subject: RFR: 8332130: RISC-V: remove wrong instructions of Vector Crypto Extension In-Reply-To: References: Message-ID: On Tue, 14 May 2024 06:11:57 GMT, Hamlin Li wrote: > > I think you mean the `funct3` (`OPIVV` vs `OPIVX`) encoding is wrong? > > Yes From the RVV spec [1], the `funct3` encoding for `OPIVX` is 0b100, which is also reflected in the instruction encoding. So why do you think it's wrong? Did I miss anything?
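To make the `funct3` question concrete, the sketch below packs the OP-V vector-arithmetic format described in the RVV spec's "Vector Arithmetic Instruction Formats" section: funct6, vm, vs2, vs1/rs1, funct3, vd, opcode. It is a hypothetical helper, not HotSpot's RISC-V assembler. It shows that the `.vv` and `.vx` forms differ only in `funct3` (OPIVV = 0b000, OPIVX = 0b100), and that bits 19:15 name a vector register for `.vv` but an integer `x` register for `.vx`, so passing a `VectorRegister` where `rs1` is expected still yields a well-formed bit pattern whose operand is interpreted as an integer register. The funct6 value 0b010100 in the usage note is assumed to be that of `vror` (Zvbb); check it against the specification before relying on it.

```java
// Hypothetical helper: packs an RVV OP-V arithmetic instruction.
// Layout: funct6[31:26] vm[25] vs2[24:20] vs1/rs1[19:15] funct3[14:12] vd[11:7] opcode[6:0]
final class RvvEncoding {
    static final int OP_V = 0b1010111;  // major opcode shared by vector arithmetic instructions
    static final int OPIVV = 0b000;     // vector-vector: bits 19:15 name a vector register vs1
    static final int OPIVX = 0b100;     // vector-scalar: bits 19:15 name an integer register rs1

    static int encode(int funct6, boolean unmasked, int vs2, int src, int funct3, int vd) {
        checkField(funct6, 6);
        checkField(vs2, 5);
        checkField(src, 5);
        checkField(funct3, 3);
        checkField(vd, 5);
        return (funct6 << 26)
                | ((unmasked ? 1 : 0) << 25)   // vm = 1 means "not masked"
                | (vs2 << 20)
                | (src << 15)                  // vs1 or rs1, depending on funct3
                | (funct3 << 12)
                | (vd << 7)
                | OP_V;
    }

    private static void checkField(int value, int bits) {
        if (value < 0 || value >= (1 << bits)) {
            throw new IllegalArgumentException(value + " does not fit in " + bits + " bits");
        }
    }
}
```

For example, `encode(0b010100, true, 2, 3, OPIVV, 1)` (a `vror.vv v1, v2, v3` under the funct6 assumption above) yields `0x522180D7`, and the `OPIVX` form with the same fields differs only in bit 14, giving `0x5221C0D7`.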
[1] https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-arithmetic-instruction-formats ------------- PR Comment: https://git.openjdk.org/jdk/pull/19211#issuecomment-2109491672 From epeter at openjdk.org Tue May 14 08:02:02 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 May 2024 08:02:02 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop [v2] In-Reply-To: References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> <_kbcMydcMPblcm_FDDuL5vWGT7q6iRoarmYsTlEA0hQ=.290c6744-211d-406d-8ed1-90e510051167@github.com> Message-ID: On Tue, 14 May 2024 07:32:26 GMT, Roland Westrelin wrote: >> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: >> >> - test case tweaks >> - fuzzer test
>
> Before split if:
>
>     long i = 100;
>     for (; i > 0;) {
>         // i here is 1..100
>         int j = (int)i; // ConvL2I type is 1..100, same as loop phi
>         int k = 42 / j;
>         i--;
>     }
>
> after split if:
>
>     long i = 100;
>     int j = 100;
>     int k = 0;
>     for (; i > 0;) {
>         // i here is 1..100
>         i--;
>         // i here is 0..99
>         j = (int)i; // ConvL2I type is still 1..100 which is not correct
>         k = 42 / j;
>     }

@rwestrel which "split_if" optimization was applied in your example? Split the ConvI2L through the phi? If so, the problem seems to be that the ConvI2L floats by the exit-check, right? after split if:

    long i = 100;
    int j = 100;
    int k = 0;
    for (; i > 0;) {
        // i here is 1..100
        i--;
        // i here is 0..99
        exit check
        // i here is 1..99
        j = (int)i; // ConvL2I type is still 1..100 which is not correct
        k = 42 / j;
    }

I guess the issue is that the `ConvL2I` was somehow pinned inside the loop, after the `CountedLoop`, by the `phi`. But when the `ConvL2I` is split into the backedge, it does not stay in the backedge but floats further, passes by the exit-check and goes into the last iteration -> BOOM. How exactly did we narrow the type to `1...100`?
I guess that that is some smart logic in the trip count `Phi` node, right? If instead we had a `CastLL` for the exit check that narrows the type, then the `CastLL` would remain after the split-if, and the split `ConvL2I` could not float from the backedge into the loop body of the last iteration.

So I guess that is really a limitation: a trip count `Phi` specifically does the narrowing, and so you cannot just split past it. The question is if that is really nice, or if we could do it differently, e.g. via a `CastLL/CastII` on the exit-check?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19086#issuecomment-2109526424

From roland at openjdk.org Tue May 14 08:11:02 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Tue, 14 May 2024 08:11:02 GMT
Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop [v2]
In-Reply-To:
References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> <_kbcMydcMPblcm_FDDuL5vWGT7q6iRoarmYsTlEA0hQ=.290c6744-211d-406d-8ed1-90e510051167@github.com>
Message-ID:

On Tue, 14 May 2024 07:32:26 GMT, Roland Westrelin wrote:

> Before split if:
> ...

> @rwestrel which "split_if" optimization was applied in your example? Split the ConvI2L through the phi? If so, the problem seems to be that the ConvI2L floats by the exit-check, right?

Yes.
> So I guess that is really a limitation: a trip count `Phi` specifically does the narrowing, and so you cannot just split past it. The question is if that is really nice, or if we could do it differently, e.g. via a `CastLL/CastII` on the exit-check?

The issue involves conv nodes when split thru phi at a counted loop. That's a narrow corner case. I think fixing it by addressing the corner case where it occurs, as proposed, is simpler than trying a more general fix, which can have hard-to-anticipate consequences.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19086#issuecomment-2109544930

From redestad at openjdk.org Tue May 14 08:26:05 2024
From: redestad at openjdk.org (Claes Redestad)
Date: Tue, 14 May 2024 08:26:05 GMT
Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v8]
In-Reply-To:
References:
Message-ID:

On Mon, 6 May 2024 18:24:25 GMT, Adam Sotona wrote:

>> Hi,
>> During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers.
>> One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers.
>>
>> I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class.
>>
>> Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23.
>>
>> Thank you,
>> Adam
>
> Adam Sotona has updated the pull request incrementally with one additional commit since the last revision:
>
>   fixed tests

Thank you for this!

-------------

Marked as reviewed by redestad (Reviewer).
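One common way to turn an eagerly initialized static field into a lazily initialized static method, as proposed for `Attributes`, is the initialization-on-demand holder idiom. The sketch below illustrates that general technique only. It is not the code in PR 19006, and `LazyStaticDemo`, `Mapper`, `build` and `codeMapper()` are hypothetical names. The JVM runs a nested holder class's static initializer only the first time the holder is accessed, and class initialization is thread-safe, so no explicit locking is needed.

```java
import java.util.concurrent.atomic.AtomicInteger;

class LazyStaticDemo {
    // Counts how often the "expensive" value is built (for demonstration only).
    static final AtomicInteger BUILDS = new AtomicInteger();

    // Hypothetical stand-in for an attribute mapper.
    record Mapper(String name) {}

    private static Mapper build(String name) {
        BUILDS.incrementAndGet();
        return new Mapper(name);
    }

    // Holder class: its static initializer runs only when Holder.CODE is first read.
    private static final class Holder {
        static final Mapper CODE = build("Code");
    }

    // Lazily initialized accessor that replaces a public static final field.
    public static Mapper codeMapper() {
        return Holder.CODE;
    }

    public static void main(String[] args) {
        System.out.println(BUILDS.get()); // 0: initializing LazyStaticDemo did not build the mapper
        Mapper m1 = codeMapper();
        Mapper m2 = codeMapper();
        System.out.println(BUILDS.get() + " " + (m1 == m2)); // 1 true
    }
}
```

The cost of building each mapper is paid only by code that actually asks for it, which is the effect the proposal aims for when loading `Attributes`.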
PR Review: https://git.openjdk.org/jdk/pull/19006#pullrequestreview-2054638558

From epeter at openjdk.org Tue May 14 08:26:05 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 14 May 2024 08:26:05 GMT
Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop [v2]
In-Reply-To:
References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> <_kbcMydcMPblcm_FDDuL5vWGT7q6iRoarmYsTlEA0hQ=.290c6744-211d-406d-8ed1-90e510051167@github.com>
Message-ID:

On Tue, 14 May 2024 08:08:08 GMT, Roland Westrelin wrote:

>> Before split if:
>> ...
>
> The issue involves conv nodes when split thru phi at a counted loop. That's a narrow corner case. I think fixing it by addressing the corner case where it occurs as proposed is simpler than trying a most general fix which can have hard to anticipate consequences.

@rwestrel Yes, I'm totally fine with the fix. It simply applies the `int` case to `long`.

In a future RFE, we could at least restrict the "bailout" to trip-count Phis, and not all Phis.
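To see why the stale `1..100` type on the split `ConvL2I` is wrong, Roland's before/after pseudocode can be transcribed into plain Java. This is a hand transcription for illustration only: `SplitIfDemo`, `beforeSplit` and `afterSplit` are made-up names, and C2 does not literally produce this source. In the original shape the divisor is never 0. In the split shape the last iteration converts `i == 0`, so any optimization that trusted the `1..100` type (for instance, treating the divisor as provably non-zero) would be unsound there.

```java
class SplitIfDemo {
    // Original shape: the conversion happens before the decrement,
    // so j takes the values 100..1 and 42 / j never divides by zero.
    static int beforeSplit() {
        long i = 100;
        int k = 0;
        for (; i > 0;) {
            int j = (int) i;
            k = 42 / j;
            i--;
        }
        return k; // value from the last iteration, where j == 1
    }

    // Shape after the ConvL2I is split through the loop phi: the conversion
    // now sees the decremented value, so j takes the values 99..0.
    static int afterSplit() {
        long i = 100;
        int j = 100;
        int k = 0;
        for (; i > 0;) {
            i--;
            j = (int) i;
            k = 42 / j; // last iteration: j == 0, throws ArithmeticException
        }
        return k;
    }

    public static void main(String[] args) {
        System.out.println(beforeSplit()); // 42
        try {
            afterSplit();
            System.out.println("no exception");
        } catch (ArithmeticException e) {
            System.out.println("ArithmeticException in the last iteration");
        }
    }
}
```

The interpreter and a correct JIT both throw in `afterSplit`. The crash in the bug comes from C2 believing the divisor's type excludes 0 when it does not.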
In even further RFEs, we could consider doing the type narrowing not in the trip-count phi, but via casts at the checks. That would be a more unified solution.

Generally, I feel like we are struggling way too much with all the different ways one can pin and narrow types: it is all mixed into trip-count phis, Casts, Convs etc. Who really can understand all the complicated interactions? It seems we keep piling on special-case logic, but it is an endless whack-a-mole game. Every fix is "simple" but the sum of all those fixes is far from "simple" ;)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19086#issuecomment-2109575708

From amitkumar at openjdk.org Tue May 14 08:31:13 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Tue, 14 May 2024 08:31:13 GMT
Subject: RFR: 8331934: [s390x] Add support for primitive array C1 clone intrinsic
Message-ID:

Adds JDK-8302850 Port for s390x.

Testing:

    make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"

    ==============================
    Test summary
    ==============================
       TEST                                          TOTAL  PASS  FAIL ERROR
       jtreg:test/hotspot/jtreg:hotspot_compiler      1166  1166     0     0
    ==============================
    TEST SUCCESS

* Tier1 Test with Fast debug build.

Benchmarking:

Without Patch:

    Benchmark                 (size)  Mode  Cnt     Score    Error  Units
    ArrayClone.byteArraycopy       0  avgt   15    10.838 ±  0.461  ns/op
    ArrayClone.byteArraycopy      10  avgt   15    28.919 ±  1.695  ns/op
    ArrayClone.byteArraycopy     100  avgt   15    48.815 ±  0.901  ns/op
    ArrayClone.byteArraycopy    1000  avgt   15   256.357 ±  7.901  ns/op
    ArrayClone.byteClone           0  avgt   15    90.398 ±  3.119  ns/op
    ArrayClone.byteClone          10  avgt   15   103.774 ±  4.468  ns/op
    ArrayClone.byteClone         100  avgt   15   126.628 ±  6.952  ns/op
    ArrayClone.byteClone        1000  avgt   15   326.409 ± 31.635  ns/op
    ArrayClone.intArraycopy        0  avgt   15    10.450 ±  0.509  ns/op
    ArrayClone.intArraycopy       10  avgt   15    36.903 ±  0.753  ns/op
    ArrayClone.intArraycopy      100  avgt   15    85.964 ±  1.806  ns/op
    ArrayClone.intArraycopy     1000  avgt   15   841.512 ± 40.335  ns/op
    ArrayClone.intClone            0  avgt   15    89.332 ±  3.695  ns/op
    ArrayClone.intClone           10  avgt   15   110.639 ±  2.476  ns/op
    ArrayClone.intClone          100  avgt   15   195.781 ±  8.622  ns/op
    ArrayClone.intClone         1000  avgt   15  1058.479 ± 92.468  ns/op
    Finished running test 'micro:java.lang.ArrayClone'

With patch:

    Benchmark                 (size)  Mode  Cnt     Score    Error  Units
    ArrayClone.byteArraycopy       0  avgt   15    10.526 ±  0.289  ns/op
    ArrayClone.byteArraycopy      10  avgt   15    27.110 ±  0.656  ns/op
    ArrayClone.byteArraycopy     100  avgt   15    49.872 ±  1.562  ns/op
    ArrayClone.byteArraycopy    1000  avgt   15   269.518 ±  4.567  ns/op
    ArrayClone.byteClone           0  avgt   15    10.766 ±  0.899  ns/op
    ArrayClone.byteClone          10  avgt   15    18.341 ±  0.394  ns/op
    ArrayClone.byteClone         100  avgt   15    40.986 ±  0.674  ns/op
    ArrayClone.byteClone        1000  avgt   15   227.512 ±  7.643  ns/op
    ArrayClone.intArraycopy        0  avgt   15    10.320 ±  0.294  ns/op
    ArrayClone.intArraycopy       10  avgt   15    36.557 ±  0.860  ns/op
    ArrayClone.intArraycopy      100  avgt   15    89.837 ±  2.364  ns/op
    ArrayClone.intArraycopy     1000  avgt   15   836.678 ± 27.920  ns/op
    ArrayClone.intClone            0  avgt   15    10.043 ±  0.216  ns/op
    ArrayClone.intClone           10  avgt   15    29.149 ±  0.723  ns/op
    ArrayClone.intClone          100  avgt   15    88.046 ±  2.211  ns/op
    ArrayClone.intClone         1000  avgt   15   840.163 ± 58.748  ns/op
    Finished running test 'micro:java.lang.ArrayClone'

-------------

Depends on: https://git.openjdk.org/jdk/pull/17667

Commit messages:
 - s390x Port

Changes: https://git.openjdk.org/jdk/pull/19220/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19220&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8331934
  Stats: 47 lines in 6 files changed: 23 ins; 2 del; 22 mod
  Patch: https://git.openjdk.org/jdk/pull/19220.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19220/head:pull/19220

PR: https://git.openjdk.org/jdk/pull/19220

From amitkumar at openjdk.org Tue May 14 08:35:05 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Tue, 14 May 2024 08:35:05 GMT
Subject: RFR: 8331934: [s390x] Add support for primitive array C1 clone intrinsic
In-Reply-To:
References:
Message-ID:

On Mon, 13 May 2024 17:08:03 GMT, Amit Kumar wrote:

> Adds JDK-8302850 Port for s390x.
> ...

@RealLucy @TheRealMDoerr Would you please review this one? :-) Testing seems clean on s390x. I have posted benchmark results as well. Please let me know if any further testing is required.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19220#issuecomment-2109594090

From luhenry at openjdk.org Tue May 14 08:46:04 2024
From: luhenry at openjdk.org (Ludovic Henry)
Date: Tue, 14 May 2024 08:46:04 GMT
Subject: RFR: 8332130: RISC-V: remove wrong instructions of Vector Crypto Extension
In-Reply-To:
References:
Message-ID:

On Tue, 14 May 2024 07:37:39 GMT, Fei Yang wrote:

>>> I think you mean the `funct3` (`OPIVV` vs `OPIVX`) encoding is wrong?
>>
>> Yes
>
> From the RVV spec [1], the `funct3` encoding for `OPIVX` is 0b100, which is also reflected on the instruction encoding.
> So why would you think it's wrong? Anything I missed?
>
> [1] https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-arithmetic-instruction-formats

@RealFYang the `.vx` variant expects a **scalar** register while our `vandn_vx` takes a **vector** register.
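The `funct3` point can be made concrete with the OP-V arithmetic layout from the RVV spec's "Vector Arithmetic Instruction Formats" section: `funct6` in bits 31:26, `vm` in bit 25, `vs2` in 24:20, `vs1`/`rs1` in 19:15, `funct3` in 14:12, `vd` in 11:7, and the `OP-V` major opcode (0b1010111) in 6:0. The Java sketch below is illustrative only. It is not HotSpot's `patch_VArith`, and `RvvEncodingDemo`, `encode` and `describeSrc1` are hypothetical names. It shows that `OPIVV` and `OPIVX` share the 5-bit field at 19:15 and differ only in `funct3`. The same register number therefore means vector register `v3` under `OPIVV` but scalar register `x3` under `OPIVX`, which is why a `.vx` helper that takes a `VectorRegister` encodes the wrong kind of operand.

```java
class RvvEncodingDemo {
    // funct3 values of the OP-V major opcode (RVV spec, vector arithmetic formats).
    static final int OPIVV = 0b000; // vector-vector: bits 19:15 name vector register vs1
    static final int OPIVX = 0b100; // vector-scalar: bits 19:15 name scalar register rs1
    static final int OP_V  = 0b1010111;

    // Packs an OP-V arithmetic instruction; vm = 1 means unmasked.
    static int encode(int funct6, int vm, int vs2, int src1, int funct3, int vd) {
        return (funct6 & 0x3F) << 26
             | (vm & 0x1) << 25
             | (vs2 & 0x1F) << 20
             | (src1 & 0x1F) << 15
             | (funct3 & 0x7) << 12
             | (vd & 0x1F) << 7
             | OP_V;
    }

    // Shows how the register field at bits 19:15 is interpreted for the two variants.
    static String describeSrc1(int insn) {
        int funct3 = (insn >>> 12) & 0x7;
        int reg = (insn >>> 15) & 0x1F;
        return switch (funct3) {
            case OPIVV -> "v" + reg;
            case OPIVX -> "x" + reg;
            default -> "other(" + reg + ")";
        };
    }

    public static void main(String[] args) {
        // Same register number (3) in bits 19:15, different funct3.
        int vv = encode(0, 1, 2, 3, OPIVV, 1);
        int vx = encode(0, 1, 2, 3, OPIVX, 1);
        System.out.println(describeSrc1(vv)); // v3
        System.out.println(describeSrc1(vx)); // x3
    }
}
```

A correct `.vx` helper would take a scalar `Register` for that field, as in the `INSN` variant with `Register Rs1` that Ludovic describes.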
If we had a use for `vandn_vx` (or any of the other removed instructions), we would need to add another section with

    #define INSN(NAME, op, funct3, funct6)                                                          \
      void NAME(VectorRegister Vd, VectorRegister Vs2, Register Rs1, VectorMask vm = unmasked) { \
        patch_VArith(op, Vd, funct3, Rs1->raw_encoding(), Vs2, vm, funct6);                      \
      }

But given we have no use for these instructions, I'm ok with removing them.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19211#issuecomment-2109617260

From fyang at openjdk.org Tue May 14 10:04:02 2024
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 14 May 2024 10:04:02 GMT
Subject: RFR: 8332130: RISC-V: remove wrong instructions of Vector Crypto Extension
In-Reply-To:
References:
Message-ID:

On Mon, 13 May 2024 08:14:43 GMT, Hamlin Li wrote:

> Hi,
> Can you help to review this simple patch to remove some wrong instructions on riscv?
> These instructions are wrong in that, e.g. taking `vror.vx` as an example,
> * by definition of spec, it should be `vror.vx vd, vs2, *rs1*, vm`
> * the implementation here, it is indeed `vror_vx(VectorRegister Vd, VectorRegister Vs2, *VectorRegister* Vs1, VectorMask vm = unmasked)`
>
> Thanks

Marked as reviewed by fyang (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/19211#pullrequestreview-2054887463

From fyang at openjdk.org Tue May 14 10:04:03 2024
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 14 May 2024 10:04:03 GMT
Subject: RFR: 8332130: RISC-V: remove wrong instructions of Vector Crypto Extension
In-Reply-To:
References:
Message-ID:

On Tue, 14 May 2024 07:37:39 GMT, Fei Yang wrote:

> From the RVV spec [1], the `funct3` encoding for `OPIVX` is 0b100, which is also reflected on the instruction encoding.
> ...

> @RealFYang the `.vx` variant expects a **scalar** register while our `vandn_vx` takes a **vector** register. ...

Ah, I see. Looks good. Thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19211#issuecomment-2109789006

From dchuyko at openjdk.org Tue May 14 10:48:04 2024
From: dchuyko at openjdk.org (Dmitry Chuyko)
Date: Tue, 14 May 2024 10:48:04 GMT
Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives
In-Reply-To:
References:
Message-ID:

On Mon, 13 May 2024 21:08:08 GMT, Evgeny Astigeevich wrote:

>> Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues. REDO work is tracked by [JDK-8331749](https://bugs.openjdk.org/browse/JDK-8331749).
>>
>> Found bugs:
>> - When refreshing, `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)`, which only considers the compiler directive on the top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected.
>> - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored.
>>
>> There are other concerns: bugs and performance issues.
>>
>> Possible bugs:
>> - `has_matching_directives` might not be cleared. An nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark.
>> - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again.
>> - JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored.
>>
>> Performance issues:
>> - Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will be traversing nmethods, most of which don't need recompilation.
>>
>> The backout is not clean because of the removal of `CompiledMethod`.
>>
>> Tested with release and fastdebug builds: tier1 and tier2 passed.
>
> What if, instead of backing out, we use an experimental JVM flag: `-XX:+CompilerDirectivesRefreshSupport`?

> I agree with this backout. Thank you @eastig for explaining your point. We have about 3 weeks before RDP1 and it is better we have fewer issues before that. Let's redo the implementation in the next release, taking into account the issues you found, and have more time for testing.

OK. I hope it takes less time to get back into the source tree than it did initially.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2109874596

From mli at openjdk.org Tue May 14 11:30:08 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 14 May 2024 11:30:08 GMT
Subject: RFR: 8332130: RISC-V: remove wrong instructions of Vector Crypto Extension
In-Reply-To:
References:
Message-ID:

On Mon, 13 May 2024 08:14:43 GMT, Hamlin Li wrote:

> Hi,
> Can you help to review this simple patch to remove some wrong instructions on riscv?
> ...

Sorry for misleading. Thanks @luhenry @RealFYang for reviewing.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19211#issuecomment-2109955129

From mli at openjdk.org Tue May 14 11:30:09 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 14 May 2024 11:30:09 GMT
Subject: Integrated: 8332130: RISC-V: remove wrong instructions of Vector Crypto Extension
In-Reply-To:
References:
Message-ID: <4MfiGaorr01EQssf26w0dXY6brY2JZ5RDOAbQ3Kzwds=.67b89bbd-19cb-452b-96ea-138e1a1995ab@github.com>

On Mon, 13 May 2024 08:14:43 GMT, Hamlin Li wrote:

> Hi,
> Can you help to review this simple patch to remove some wrong instructions on riscv?
> ...

This pull request has now been integrated.
Changeset: 7ce4a13c
Author:    Hamlin Li
URL:       https://git.openjdk.org/jdk/commit/7ce4a13c0a891e606480e138f4025ffa328a18b3
Stats:     5 lines in 1 file changed: 0 ins; 5 del; 0 mod

8332130: RISC-V: remove wrong instructions of Vector Crypto Extension

Reviewed-by: luhenry, fyang

-------------

PR: https://git.openjdk.org/jdk/pull/19211

From amitkumar at openjdk.org Tue May 14 13:02:11 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Tue, 14 May 2024 13:02:11 GMT
Subject: RFR: 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store
In-Reply-To:
References:
Message-ID:

On Mon, 13 May 2024 15:58:31 GMT, Richard Reingruber wrote:

>> This pr adds a few tweaks to [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) which allows enabling it also on big endian platforms (e.g. AIX, S390). JDK-8318446 introduced a C2 optimization to replace consecutive stores to a primitive array with just one store.
>> ...
>
> @offamitkumar you can put this through your testing if you like. It should solve the issues with test/hotspot/jtreg/compiler/c2/TestMergeStores.java also for s390.

@reinrich The test is passing on s390x with your change. tier1 tests are in progress.

Update: tier1 tests are also clean on s390x.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19218#issuecomment-2108186692

From rrich at openjdk.org Tue May 14 13:02:11 2024
From: rrich at openjdk.org (Richard Reingruber)
Date: Tue, 14 May 2024 13:02:11 GMT
Subject: RFR: 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store
Message-ID:

This PR adds a few tweaks to [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) that allow enabling it also on big endian platforms (e.g. AIX, S390). JDK-8318446 introduced a C2 optimization to replace consecutive stores to a primitive array with just one store.
By example (from `TestMergeStores.java`):

    static Object[] test2a(byte[] a, int offset, long v) {
        if (IS_BIG_ENDIAN) {
            a[offset + 0] = (byte)(v >> 56);
            a[offset + 1] = (byte)(v >> 48);
            a[offset + 2] = (byte)(v >> 40);
            a[offset + 3] = (byte)(v >> 32);
            a[offset + 4] = (byte)(v >> 24);
            a[offset + 5] = (byte)(v >> 16);
            a[offset + 6] = (byte)(v >> 8);
            a[offset + 7] = (byte)(v >> 0);
        } else {
            a[offset + 0] = (byte)(v >> 0);
            a[offset + 1] = (byte)(v >> 8);
            a[offset + 2] = (byte)(v >> 16);
            a[offset + 3] = (byte)(v >> 24);
            a[offset + 4] = (byte)(v >> 32);
            a[offset + 5] = (byte)(v >> 40);
            a[offset + 6] = (byte)(v >> 48);
            a[offset + 7] = (byte)(v >> 56);
        }
        return new Object[]{ a };
    }

Depending on the endianness, 8 bytes are stored into an array. The order of the stores is the same as the order of an 8-byte store; therefore, eight 1-byte stores can be replaced with just one 8-byte store (if there aren't too many range checks).

Additionally, I've fixed a few comments and a test bug.

The optimization seems to be a little bit more effective on big endian platforms.

Again by example:

    static Object[] test800a(byte[] a, int offset, long v) {
        if (IS_BIG_ENDIAN) {
            a[offset + 0] = (byte)(v >> 40); // Removed from candidate list
            a[offset + 1] = (byte)(v >> 32); // Removed from candidate list
            a[offset + 2] = (byte)(v >> 24); // Merged
            a[offset + 3] = (byte)(v >> 16); // Merged
            a[offset + 4] = (byte)(v >> 8);  // Merged
            a[offset + 5] = (byte)(v >> 0);  // Merged
        } else {
            a[offset + 0] = (byte)(v >> 0);  // Removed from candidate list
            a[offset + 1] = (byte)(v >> 8);  // Removed from candidate list
            a[offset + 2] = (byte)(v >> 16); // Not merged
            a[offset + 3] = (byte)(v >> 24); // Not merged
            a[offset + 4] = (byte)(v >> 32); // Not merged
            a[offset + 5] = (byte)(v >> 40); // Not merged
        }
        return new Object[]{ a };
    }

The sequence of candidate stores begins at the lowest store (in Memory def-use order) and is trimmed to a power of 2, removing higher stores if necessary.
On little endian platforms this removes the stores of the least significant bytes. Therefore the remaining stores cannot be merged, since this would require a right shift. On big endian platforms the stores of the more significant bytes are removed and the remaining stores can be merged.

I introduced new platform attributes `little-endian` and `big-endian` to the IR testing framework to be able to adapt IR matching rules to this difference.

Testing: `TestMergeStores.java` on AIX and S390.

JTReg tests: tier1-4 of hotspot and jdk. All of Langtools and jaxp. JCK, SPECjvm2008, SPECjbb2015, Renaissance Suite, and SAP-specific tests. Testing was done with fastdebug builds on the main platforms and also on Linux/PPC64le and AIX.

-------------

Commit messages:
 - Improve comment
 - Add bug id
 - Typo
 - 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store

Changes: https://git.openjdk.org/jdk/pull/19218/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19218&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8331311
  Stats: 572 lines in 3 files changed: 378 ins; 3 del; 191 mod
  Patch: https://git.openjdk.org/jdk/pull/19218.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19218/head:pull/19218

PR: https://git.openjdk.org/jdk/pull/19218

From rrich at openjdk.org Tue May 14 13:02:11 2024
From: rrich at openjdk.org (Richard Reingruber)
Date: Tue, 14 May 2024 13:02:11 GMT
Subject: RFR: 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store
In-Reply-To:
References:
Message-ID:

On Mon, 13 May 2024 15:53:52 GMT, Richard Reingruber wrote:

> This pr adds a few tweaks to [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) which allows enabling it also on big endian platforms (e.g. AIX, S390). JDK-8318446 introduced a C2 optimization to replace consecutive stores to a primitive array with just one store.
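The endianness argument in the 8331311 description can be checked in plain Java with byte-array view `VarHandle`s (`MethodHandles.byteArrayViewVarHandle`). This sketch is not C2 code, and `MergeStoresDemo` and its methods are hypothetical names. It replays the big-endian branch of `test2a` and the four stores of `test800a` that survive trimming (offsets 2..5), then compares the bytes with a single wide store. On big endian the surviving window equals one 4-byte store of `(int) v`. On little endian it equals a 4-byte store of `(int) (v >> 16)`, which would need the extra right shift the description mentions.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

class MergeStoresDemo {
    static final VarHandle LONG_BE = MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.BIG_ENDIAN);
    static final VarHandle INT_BE = MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);
    static final VarHandle INT_LE = MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Eight 1-byte stores in big-endian order (the IS_BIG_ENDIAN branch of test2a).
    static byte[] eightByteStoresBE(long v) {
        byte[] a = new byte[8];
        for (int k = 0; k < 8; k++) {
            a[k] = (byte) (v >> (56 - 8 * k));
        }
        return a;
    }

    // One 8-byte big-endian store of the same value.
    static byte[] oneLongStoreBE(long v) {
        byte[] a = new byte[8];
        LONG_BE.set(a, 0, v);
        return a;
    }

    // Stores at offsets 2..5 of test800a, big endian branch.
    static byte[] window800aBE(long v) {
        return new byte[] { (byte) (v >> 24), (byte) (v >> 16), (byte) (v >> 8), (byte) v };
    }

    // Stores at offsets 2..5 of test800a, little endian branch.
    static byte[] window800aLE(long v) {
        return new byte[] { (byte) (v >> 16), (byte) (v >> 24), (byte) (v >> 32), (byte) (v >> 40) };
    }

    // A single 4-byte store of x through the given byte-order view.
    static byte[] intStore(VarHandle view, int x) {
        byte[] a = new byte[4];
        view.set(a, 0, x);
        return a;
    }

    public static void main(String[] args) {
        long v = 0x0102030405060708L;
        System.out.println(Arrays.equals(eightByteStoresBE(v), oneLongStoreBE(v)));            // true
        System.out.println(Arrays.equals(window800aBE(v), intStore(INT_BE, (int) v)));          // true: no shift
        System.out.println(Arrays.equals(window800aLE(v), intStore(INT_LE, (int) v)));          // false
        System.out.println(Arrays.equals(window800aLE(v), intStore(INT_LE, (int) (v >> 16))));  // true: needs v >> 16
    }
}
```

The third comparison is false because the little-endian window starts at bits 16..23 of `v`, so the merged store would have to write a shifted value.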
>
> By example (from `TestMergeStores.java`):
> ...

@offamitkumar you can put this through your testing if you like.
It should solve the issues with test/hotspot/jtreg/compiler/c2/TestMergeStores.java also for s390.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19218#issuecomment-2108093968

From pminborg at openjdk.org Tue May 14 14:14:17 2024
From: pminborg at openjdk.org (Per Minborg)
Date: Tue, 14 May 2024 14:14:17 GMT
Subject: RFR: 8330465: Stable Values and Collections (Internal)
Message-ID: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com>

# Stable Values & Collections (Internal)

## Summary
This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization.

## Goals
* Provide an easy and intuitive API to describe value holders that can change at most once.
* Decouple declaration from initialization without significant footprint or performance penalties.
* Reduce the amount of static initializer and/or field initialization code.
* Uphold integrity and consistency, even in a multi-threaded environment.

For more details, see the draft JEP: https://openjdk.org/jeps/8312611

## Performance
Performance compared to instance variables using an `AtomicReference` and one protected by double-checked locking under concurrent access by 8 threads:

    Benchmark                       Mode  Cnt  Score   Error  Units
    StableBenchmark.instanceAtomic  avgt   10  1.576 ± 0.052  ns/op
    StableBenchmark.instanceDCL     avgt   10  1.608 ± 0.059  ns/op
    StableBenchmark.instanceStable  avgt   10  0.979 ± 0.023  ns/op <- StableValue (~40% faster than DCL)

Performance compared to static variables protected by `AtomicReference`, the class-holder idiom, and double-checked locking (8 threads):

    Benchmark                     Mode  Cnt  Score   Error  Units
    StableBenchmark.staticAtomic  avgt   10  1.335 ± 0.056  ns/op
    StableBenchmark.staticCHI     avgt   10  0.623 ± 0.086  ns/op
    StableBenchmark.staticDCL     avgt   10  1.418 ± 0.171  ns/op
    StableBenchmark.staticList    avgt   10  0.617 ± 0.024  ns/op
    StableBenchmark.staticStable  avgt   10  0.604 ± 0.022  ns/op <- StableValue (> 2x faster than `AtomicReference` and DCL)

Performance for stable lists in both instance and static contexts, whereby the sum of random contents is calculated for stable lists (which are thread-safe) compared to `ArrayList` instances (which are not thread-safe), under single-thread access:

    Benchmark                                 Mode  Cnt  Score   Error  Units
    StableListSumBenchmark.instanceArrayList  avgt   10  0.356 ± 0.005  ns/op
    StableListSumBenchmark.instanceList       avgt   10  0.373 ± 0.017  ns/op <- Stable list
    StableListSumBenchmark.staticArrayList    avgt   10  0.352 ± 0.002  ns/op
    StableListSumBenchmark.staticList         avgt   10  0.356 ± 0.003  ns/op <- Stable list

Performance for stable maps in a static context compared to a `ConcurrentHashMap` (under single-thread access):

    Benchmark                         Mode  Cnt  Score   Error  Units
    StablePropertiesBenchmark.chmRaw  avgt   10  3.416 ± 0.031  ns/op
    StablePropertiesBenchmark.mapRaw  avgt   10  2.105 ± 0.012  ns/op <- Stable map (~40% faster)

All figures above are from local tests on a Mac M1 laptop and should only be construed as indicative figures.

## Implementation details
There are some noteworthy implementation details in this PR:

* A field is _trusted_ if it is _declared_ as a `final StableValue`. Previously, the determination of trustworthiness was connected to the _class in which it was declared_ (e.g. is it a `record` or a hidden class). In order to grant such trust, there are extra restrictions imposed on reflection and `sun.misc.Unsafe` usage for such declared `StableValue` fields. This is similar to how `record` classes are handled.
* To allow plain memory semantics for read operations across threads (rather than `volatile` semantics, which are slower and normally required for double-checked-locking access), we perform a _freeze_ operation before an object becomes visible to other threads. This prevents store-store reordering, so complete objects are guaranteed to be seen even under plain memory semantics.
* In collections with `StableValue` elements/values, a transient `StableValue` view backed by internal arrays is created upon read operations. This improves initialization time, reduces storage requirements, and improves access performance, as the C2 compiler eliminates these transient objects.

-------------

Commit messages:
 - Merge branch 'master' into stable-value
 - Rework the creation of StableEnumMaps
 - Update sun.misc.Unsafe
 - Fix error in hash code
 - Add methods to create generic arrays
 - Change class types
 - Add a marker interface TrustedFieldType
 - Improve array test
 - Clean up tests
 - Add tests
 - ... and 162 more: https://git.openjdk.org/jdk/compare/4ba74475...5d5dcced

Changes: https://git.openjdk.org/jdk/pull/18794/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8330465
Stats: 5733 lines in 39 files changed: 5708 ins; 13 del; 12 mod
Patch: https://git.openjdk.org/jdk/pull/18794.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/18794/head:pull/18794

PR: https://git.openjdk.org/jdk/pull/18794

From liach at openjdk.org  Tue May 14 14:14:21 2024
From: liach at openjdk.org (Chen Liang)
Date: Tue, 14 May 2024 14:14:21 GMT
Subject: RFR: 8330465: Stable Values and Collections (Internal)
In-Reply-To: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com>
References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com>
Message-ID:

On Tue, 16 Apr 2024 11:47:23 GMT, Per Minborg wrote:

> # Stable Values & Collections (Internal)
>
> ## Summary
> This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization.

Glad to see this! Some API design remarks.

Also, I want to mention a few important differences between `@Stable` and Stable Values:

Patterns:
1. Benign race (does not exist in StableValue API): multiple threads can create an instance and upload, any non-null instance is functionally equivalent so race is ok (seen in most of JDK)
2. compareAndSet (setIfUnset): multiple threads can create instance, only one will succeed in uploading; usually for when the instance computation is cheap but we want single witness.
3. atomic computation (computeIfUnset): only one thread can create instance which will be witnessed by other threads; this pattern ensures correctness and prevents wasteful computation by other threads at the cost of locking and lambda creation.

Allocation in objects: `@Stable` field is local to an object but `StableValue` is another object; thus sharing strategy may differ, as stable fields are copied over but StableValue uses a shared cache (which is even better for avoiding redundant computation)

Question:
1. Will we ever try to expose the stable benign race model to users?
2. Will we ever try to inline the stable values in object layout like a stable field?

Just curious, can you test other samples, like `StableValue>` where the contained `List` is an immutable list from `List.of` factories? I think that would be a meaningful case too.

Also on a side note, I just realized there's no equivalent of `@Stable int[]` etc. stable primitive arrays exposed, yet immutable arrays will be useful. Is the Frozen Arrays JEP still active, or will Stable Values consider exposing stable primitive arrays?

src/java.base/share/classes/java/lang/reflect/Field.java line 179:

> 177:         AccessibleObject.checkPermission();
> 178:         // Always check if the field is a final StableValue
> 179:         if (StableValue.class.isAssignableFrom(type) && Modifier.isFinal(modifiers)) {

This doesn't protect the Stable Collections.

src/java.base/share/classes/java/util/ImmutableCollections.java line 173:

> 171:                 .map(Objects::requireNonNull)
> 172:                 .toArray();
> 173:         return keys instanceof EnumSet

We can move this instanceof check before the stream call.
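The three initialization patterns listed above are benign race, compareAndSet (`setIfUnset`) and atomic computation (`computeIfUnset`). They can be illustrated with plain `java.util.concurrent` primitives. What follows is a minimal sketch, not the PR's `StableValue` implementation. The class `InitPatterns` and its methods are hypothetical, an `AtomicReference` stands in for the stable slot, and suppliers are assumed never to return `null`.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical helpers illustrating the three patterns; not the PR's StableValue code.
// Assumption: every supplier returns a non-null value.
final class InitPatterns {
    private InitPatterns() {}

    // 1. Benign race: any racing thread may compute and store a value.
    //    Only correct if all candidate values are functionally equivalent.
    static <T> T benignRace(AtomicReference<T> ref, Supplier<? extends T> supplier) {
        T v = ref.get();
        if (v == null) {
            v = supplier.get();
            ref.set(v); // a later racer may overwrite this with an equivalent value
        }
        return v;
    }

    // 2. compareAndSet (setIfUnset): several threads may compute a candidate,
    //    but only the first successful CAS is ever witnessed.
    static <T> T setIfUnset(AtomicReference<T> ref, Supplier<? extends T> supplier) {
        T v = ref.get();
        if (v != null) {
            return v;
        }
        T candidate = supplier.get();
        return ref.compareAndSet(null, candidate) ? candidate : ref.get();
    }

    // 3. Atomic computation (computeIfUnset): provided every caller uses the same
    //    lock, the supplier runs at most once, at the cost of locking on the slow path.
    static <T> T computeIfUnset(AtomicReference<T> ref, Object lock, Supplier<? extends T> supplier) {
        T v = ref.get();
        if (v != null) {
            return v;
        }
        synchronized (lock) {
            v = ref.get();
            if (v == null) {
                v = supplier.get();
                ref.set(v);
            }
            return v;
        }
    }
}
```

Pattern 2 may run the supplier on several threads but publishes a single winner. Pattern 3 never runs it twice but pays for a lock, which matches the trade-off described in the review.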
src/java.base/share/classes/java/util/ImmutableCollections.java line 1457:

> 1455:         private final V[] elements;
> 1456:         @Stable
> 1457:         private final AuxiliaryArrays aux;

Is java.util not a trusted package, so we need `@Stable`?

src/java.base/share/classes/java/util/ImmutableCollections.java line 1519:

> 1517:     // Internal interface used to indicate the presence of
> 1518:     // the computeIfUnset method that is unique to StableMap and StableEnumMap
> 1519:     interface HasComputeIfUnset {

Suggestion:

    interface HasComputeIfUnset extends Map> {

So maybe we can use pattern matching like:

    Map> map = ...
    if (map instanceof HasComputeIfUnset hciu) {
        // stuff
    }

src/java.base/share/classes/java/util/ImmutableCollections.java line 1668:

> 1666:         @Override
> 1667:         public Set>> entrySet() {
> 1668:             return new AbstractSet<>() {

Maybe we want to do `AbstractImmutableSet` like in #18522.

src/java.base/share/classes/java/util/ImmutableCollections.java line 1677:

> 1675:     static final class StableEnumMap, V>
> 1676:             extends AbstractImmutableMap>
> 1677:             implements Map>, HasComputeIfUnset {

Note that this might be a navigable map, as enums are comparable.

src/java.base/share/classes/java/util/ImmutableCollections.java line 1855:

> 1853:             @Override
> 1854:             public boolean equals(Object o) {
> 1855:                 return o == this;

These implementations are violations of the Set contract; Set's hash code should be its elements' sum (thus an entry set's hash code is equivalent to its map's hash) and equals should check if all elements are present. This also makes two entry sets from two `entrySet()` calls not equal (at least before Valhalla).

src/java.base/share/classes/jdk/internal/lang/StableValue.java line 223:

> 221:     /**
> 222:      * {@return an unmodifiable, shallowly immutable, thread-safe, value-stable,
> 223:      * {@linkplain Map } where the {@linkplain java.util.Map#keySet() keys}

Suggestion:

     * {@linkplain Map} where the {@linkplain java.util.Map#keySet() keys}

src/java.base/share/classes/jdk/internal/lang/StableValue.java line 279:

> 277:     static V computeIfUnset(List> list,
> 278:                             int index,
> 279:                             IntFunction mapper) {

Hmm, these APIs seem unintuitive and error-prone to users. Have you studied the use case where for one stable list/map, there are distinct initialization logics for different indices/keys so that they support different mappers for the same list/map? I cannot recall one off the top of my head.

If we drop said ability and restrict mappers to the list/map creation, the whole thing will be much cleaner, and it's a better way to avoid capturing lambdas as well. Users can still go to individual stable values and use functional creation if they really, really want that functionality.

src/java.base/share/classes/jdk/internal/lang/stable/StableUtil.java line 90:

> 88:         // to provide protection against store/store reordering.
> 89:         // See VarHandle::releaseFence
> 90:         UNSAFE.storeFence();

Can we use a storeStoreFence like in #18505?
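The `StableUtil` excerpt above calls `UNSAFE.storeFence()`, and its comment points to `VarHandle::releaseFence`. The review asks whether a store-store fence would be enough. As a hedged illustration only: the public counterpart is `VarHandle.storeStoreFence()` (Java 9+). The sketch below shows a writer that initializes an object, fences, and then publishes the reference to a plain field. `FencedPublisher` and `Payload` are hypothetical names, and the sketch makes no claim about what #18505 changes.

```java
import java.lang.invoke.VarHandle;

// Hypothetical "freeze, then publish" sketch; not the PR's StableUtil code.
final class FencedPublisher {
    static final class Payload {
        int a;
        int b;
    }

    // Plain (non-volatile) field, written once by publish().
    private Payload published;

    void publish(int a, int b) {
        Payload p = new Payload();
        p.a = a;
        p.b = b;
        // Stores before the fence (p.a, p.b) are not reordered with stores after it
        // (the reference store below). Unlike releaseFence(), this does not also
        // order earlier loads against later stores.
        VarHandle.storeStoreFence();
        published = p;
    }

    Payload read() {
        // Plain read. Under the formal Java Memory Model, a portable reader would
        // need acquire semantics (for example VarHandle.getAcquire) to be guaranteed
        // to see p.a and p.b; the writer-side fence alone does not promise that.
        return published;
    }
}
```

The PR description states that readers use plain memory semantics. Portable code written strictly against the Java Memory Model would normally pair this writer with an acquire read.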
-------------

PR Review: https://git.openjdk.org/jdk/pull/18794#pullrequestreview-2004581002
PR Comment: https://git.openjdk.org/jdk/pull/18794#issuecomment-2061350087
PR Comment: https://git.openjdk.org/jdk/pull/18794#issuecomment-2075216073
PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1581828312
PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1581828721
PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1598676330
PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1581829255
PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1581829489
PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1567930612
PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1599908641
PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1581830287
PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1567944454
PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1581830884

From duke at openjdk.org  Tue May 14 14:14:23 2024
From: duke at openjdk.org (ExE Boss)
Date: Tue, 14 May 2024 14:14:23 GMT
Subject: RFR: 8330465: Stable Values and Collections (Internal)
In-Reply-To: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com>
References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com>
Message-ID:

On Tue, 16 Apr 2024 11:47:23 GMT, Per Minborg wrote:

> # Stable Values & Collections (Internal)
>
> ## Summary
> This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization.
**Nit:** Inconsistent whitespace:

src/java.base/share/classes/java/lang/reflect/AccessibleObject.java line 393:

> 391:     }
> 392:
> 393:     InaccessibleObjectException newInaccessibleObjectException(String msg) {

This internal helper method can be `static`:

Suggestion:

    static InaccessibleObjectException newInaccessibleObjectException(String msg) {

src/java.base/share/classes/jdk/internal/lang/stable/StableValueElement.java line 57:

> 55:         return switch (stateVolatile()) {
> 56:             case UNSET -> throw new NoSuchElementException(); // No value was set
> 57:             case NON_NULL -> orThrowVolatile(); // Race: another thread has set a value

It should be safe to avoid self-recursion here:

Suggestion:

            case NON_NULL -> elements[index]; // Race: another thread has set a value

or:

Suggestion:

            case NON_NULL -> {
                v = elements[index]; // Race: another thread has set a value
                if (v != null) {
                    yield v;
                }
                throw shouldNotReachHere();
            }

src/java.base/share/classes/jdk/internal/lang/stable/StableValueElement.java line 63:

> 61:         // more compact byte code.
> 62:         switch (stateVolatile()) {
> 63:             case UNSET: { throw StableUtil.notSet();}

Suggestion:

            case UNSET: { throw StableUtil.notSet(); }

src/java.base/share/classes/jdk/internal/lang/stable/StableValueElement.java line 116:

> 114:     public V computeIfUnset(Supplier supplier) {
> 115:         // Todo: This creates a lambda
> 116:         return computeIfUnsetShared(supplier, Supplier::get);

Suggestion:

        return computeIfUnsetShared(supplier, supplierExtractor());

src/java.base/share/classes/jdk/internal/lang/stable/StableValueElement.java line 144:

> 142:         // more compact byte code.
> 143:         switch (stateVolatile()) {
> 144:             case UNSET: { return computeIfUnsetVolatile0(provider, key);}

Suggestion:

            case UNSET: { return computeIfUnsetVolatile0(provider, key); }

src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 116:

> 114:         // more compact byte code.
> 115:         switch (stateVolatile()) {
> 116:             case UNSET: { throw StableUtil.notSet();}

Suggestion:

            case UNSET: { throw StableUtil.notSet(); }

src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 181:

> 179:         // more compact byte code.
> 180:         switch (stateVolatile()) {
> 181:             case UNSET: { return computeIfUnsetVolatile0(supplier);}

Suggestion:

            case UNSET: { return computeIfUnsetVolatile0(supplier); }

-------------

PR Review: https://git.openjdk.org/jdk/pull/18794#pullrequestreview-2029063641
PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1570620018
PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1580882515
PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1583418107
PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1572101643
PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1583418492
PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1583419719
PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1583420000

From pminborg at openjdk.org  Tue May 14 14:14:23 2024
From: pminborg at openjdk.org (Per Minborg)
Date: Tue, 14 May 2024 14:14:23 GMT
Subject: RFR: 8330465: Stable Values and Collections (Internal)
In-Reply-To:
References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com>
Message-ID: <_mwOb-4riQczNG6UoDy6S4ti7I68dYvcLAY3dzLNZaQ=.71a67ede-0f64-452e-bfec-4db95b25338b@github.com>

On Wed, 17 Apr 2024 14:07:05 GMT, Chen Liang wrote:

> Question:
>
> 1. Will we ever try to expose the stable benign race model to users?
> 2. Will we ever try to inline the stable values in object layout like a stable field?

1. I think there is little or no upside in exposing the benign race. It would also be difficult to assert the invariant, competing objects are functionally equivalent. So, I think no.

2. In a static context, the stable value will be inlined (or rather constant-folded). So we are partly already there. For pure instance contexts, I have some ideas about this but it is non-trivial.

> Just curious, can you test other samples, like `StableValue>` where the contained `List` is an immutable list from `List.of` factories? I think that would be a meaningful case too.

Good suggestion. I've added such a test. It turns out the performance is great there too.

> Also on a side note, I just realized there's no equivalent of `@Stable int[]` etc. stable primitive arrays exposed, yet immutable arrays will be useful. Is the Frozen Arrays JEP still active, or will Stable Values consider exposing stable primitive arrays?

Good question. In one of the previous prototypes, we accepted a class literal that would enable the use of primitive arrays. We now think that we can achieve the same thing once Valhalla is integrated. This will allow not just `StableValue` to use primitive flattened arrays but also a large number of other constructs like `ArrayList`. One thing we are considering is adding support for stable multi-dimensional reference arrays. For an overwhelming majority of the use cases, we would be able to eliminate the second layer of indirection that is there for arrays of rank > 1.

> src/java.base/share/classes/java/lang/reflect/Field.java line 179:
>
>> 177:         AccessibleObject.checkPermission();
>> 178:         // Always check if the field is a final StableValue
>> 179:         if (StableValue.class.isAssignableFrom(type) && Modifier.isFinal(modifiers)) {
>
> This doesn't protect the Stable Collections.

I will take a look at having an interface that signals this.

> src/java.base/share/classes/java/util/ImmutableCollections.java line 173:
>
>> 171:                 .map(Objects::requireNonNull)
>> 172:                 .toArray();
>> 173:         return keys instanceof EnumSet
>
> We can move this instanceof check before the stream call.
As we need the array in both cases, what would such a solution look like without duplicating code?

> src/java.base/share/classes/java/util/ImmutableCollections.java line 1457:
>
>> 1455:         private final V[] elements;
>> 1456:         @Stable
>> 1457:         private final AuxiliaryArrays aux;
>
> Is java.util not a trusted package, so we need `@Stable`?

That is correct. Hence, there are many @Stable annotations already in this class.

> src/java.base/share/classes/java/util/ImmutableCollections.java line 1677:
>
>> 1675:     static final class StableEnumMap, V>
>> 1676:             extends AbstractImmutableMap>
>> 1677:             implements Map>, HasComputeIfUnset {
>
> Note that this might be a navigable map, as enums are comparable.

While that is true, no other immutable collection implements a navigable map. The way the API is currently wired, it always returns a `Map`. If we go down this route, it would incidentally return a `NavigableMap` if presented with an `EnumMap` or, we could have a separate factory for enums that states it returns a `NavigableMap`. I think creating all the required views would increase complexity significantly and I am not sure it would be used that much. That said, let us keep this open for the future.

> src/java.base/share/classes/java/util/ImmutableCollections.java line 1855:
>
>> 1853:             @Override
>> 1854:             public boolean equals(Object o) {
>> 1855:                 return o == this;
>
> These implementations are violations of the Set contract; Set's hash code should be its elements' sum (thus an entry set's hash code is equivalent to its map's hash) and equals should check if all elements are present. This also makes two entry sets from two `entrySet()` calls not equal (at least before Valhalla).

Good catch. Thank you for finding this!

> src/java.base/share/classes/jdk/internal/lang/StableValue.java line 279:
>
>> 277:     static V computeIfUnset(List> list,
>> 278:                             int index,
>> 279:                             IntFunction mapper) {
>
> Hmm, these APIs seem unintuitive and error-prone to users. Have you studied the use case where for one stable list/map, there are distinct initialization logics for different indices/keys so that they support different mappers for the same list/map? I cannot recall one off the top of my head.
>
> If we drop said ability and restrict mappers to the list/map creation, the whole thing will be much cleaner, and it's a better way to avoid capturing lambdas as well. Users can still go to individual stable values and use functional creation if they really, really want that functionality.

I see what you mean with distinct initialization logic. This is not the intended use though.

The reason these methods exist is to avoid lambda capturing. Let's say we have a `Function` we want to apply to a `Map>`. Then, retrieving a `stable = StableValue` and applying `stable.computeIfUnset(() -> function.apply(key))` would capture a new `Supplier`. Another alternative would be to write imperative code similar to what is already in these methods.

What we could do is provide factories for memoized functions (the latter is described at the end of the draft JEP, https://openjdk.org/jeps/8312611), even though these are easy to write.

I think what you are proposing is something like this?

    Map> map = StableValue.ofMap(keys, k -> createV(k));

or perhaps even:

    Map map = StableValue.ofMap(keys, k -> createV(k));

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18794#issuecomment-2061521478
PR Comment: https://git.openjdk.org/jdk/pull/18794#issuecomment-2076524102
PR Comment: https://git.openjdk.org/jdk/pull/18794#issuecomment-2076528648
PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1591463541
PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1598111344
PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1599938827
PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1568366235
PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1599960795
PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1568387185

From liach at openjdk.org  Tue May 14 14:14:23 2024
From: liach at openjdk.org (Chen Liang)
Date: Tue, 14 May 2024 14:14:23 GMT
Subject: RFR: 8330465: Stable Values and Collections (Internal)
In-Reply-To: <_mwOb-4riQczNG6UoDy6S4ti7I68dYvcLAY3dzLNZaQ=.71a67ede-0f64-452e-bfec-4db95b25338b@github.com>
References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> <_mwOb-4riQczNG6UoDy6S4ti7I68dYvcLAY3dzLNZaQ=.71a67ede-0f64-452e-bfec-4db95b25338b@github.com>
Message-ID:

On Wed, 17 Apr 2024 15:17:52 GMT, Per Minborg wrote:

>> Also, I want to mention a few important differences between `@Stable` and Stable Values:
>>
>> Patterns:
>> 1. Benign race (does not exist in StableValue API): multiple threads can create an instance and upload, any non-null instance is functionally equivalent so race is ok (seen in most of JDK)
>> 2. compareAndSet (setIfUnset): multiple threads can create instance, only one will succeed in uploading; usually for when the instance computation is cheap but we want single witness.
>> 3. atomic computation (computeIfUnset): only one thread can create instance which will be witnessed by other threads; this pattern ensures correctness and prevents wasteful computation by other threads at the cost of locking and lambda creation.
>>
>> Allocation in objects: `@Stable` field is local to an object but `StableValue` is another object; thus sharing strategy may differ, as stable fields are copied over but StableValue uses a shared cache (which is even better for avoiding redundant computation)
>>
>> Question:
>> 1. Will we ever try to expose the stable benign race model to users?
>> 2. Will we ever try to inline the stable values in object layout like a stable field?
>
>> Question:
>>
>> 1. Will we ever try to expose the stable benign race model to users?
>> 2. Will we ever try to inline the stable values in object layout like a stable field?
>
> 1. I think there is little or no upside in exposing the benign race. It would also be difficult to assert the invariant, competing objects are functionally equivalent. So, I think no.
>
> 2. In a static context, the stable value will be inlined (or rather constant-folded). So we are partly already there. For pure instance contexts, I have some ideas about this but it is non-trivial.

@minborg Just curious, why are you adding holder-in-holder benchmark cases?

>> src/java.base/share/classes/java/util/ImmutableCollections.java line 173:
>>
>>> 171:                 .map(Objects::requireNonNull)
>>> 172:                 .toArray();
>>> 173:         return keys instanceof EnumSet
>>
>> We can move this instanceof check before the stream call.
>
> As we need the array in both cases, what would such a solution look like without duplicating code?

I was thinking about changing the StableEnumMap factory to directly take an EnumSet/BitSet indicating the indices without conversions to full objects and arrays.
>> src/java.base/share/classes/java/util/ImmutableCollections.java line 1677:
>>
>>> 1675:     static final class StableEnumMap, V>
>>> 1676:             extends AbstractImmutableMap>
>>> 1677:             implements Map>, HasComputeIfUnset {
>>
>> Note that this might be a navigable map, as enums are comparable.
>
> While that is true, no other immutable collection implements a navigable map. The way the API is currently wired, it always returns a `Map`. If we go down this route, it would incidentally return a `NavigableMap` if presented with an `EnumMap` or, we could have a separate factory for enums that states it returns a `NavigableMap`. I think creating all the required views would increase complexity significantly and I am not sure it would be used that much. That said, let us keep this open for the future.

Fair enough, `Collections` APIs like `unmodifiableSortedMap` explicitly avoid implementing too many interfaces.

>> src/java.base/share/classes/jdk/internal/lang/StableValue.java line 279:
>>
>>> 277:     static V computeIfUnset(List> list,
>>> 278:                             int index,
>>> 279:                             IntFunction mapper) {
>>
>> Hmm, these APIs seem unintuitive and error-prone to users. Have you studied the use case where for one stable list/map, there are distinct initialization logics for different indices/keys so that they support different mappers for the same list/map? I cannot recall one off the top of my head.
>>
>> If we drop said ability and restrict mappers to the list/map creation, the whole thing will be much cleaner, and it's a better way to avoid capturing lambdas as well. Users can still go to individual stable values and use functional creation if they really, really want that functionality.
>
> I see what you mean with distinct initialization logic. This is not the intended use though.
>
> The reason these methods exist is to avoid lambda capturing. Let's say we have a `Function` we want to apply to a `Map>`. Then, retrieving a `stable = StableValue` and applying `stable.computeIfUnset(() -> function.apply(key))` would capture a new `Supplier`. Another alternative would be to write imperative code similar to what is already in these methods.
>
> What we could do is provide factories for memoized functions (the latter is described at the end of the draft JEP, https://openjdk.org/jeps/8312611), even though these are easy to write.
>
> I think what you are proposing is something like this?
>
>     Map> map = StableValue.ofMap(keys, k -> createV(k));
>
> or perhaps even:
>
>     Map map = StableValue.ofMap(keys, k -> createV(k));

Yes, consider the 3 capture scenarios:

| API | Capture frequency | Capture Impact | Code Convenience | Flexibility |
|-----|-------------------|----------------|------------------|-------------|
| `StableValue.ofMap(map, k -> ...)` | By accident | single capture is reused | OK | One generator for all keys |
| `StableValue.computeIfUnset(map, key, k -> ...)` | By accident | capture happens for all access sites | somewhat ugly | Different generator for different keys |
| `map.get(k).computeIfUnset(() -> ...)` | Always | capture happens for all access sites | OK | Different generator for different keys |

Notice the `ofMap` factory is the most tolerant to faulty captures: even if it captures, the single capturing lambda is reused for all map stables, avoiding capture overheads at access sites. Given that the Java compiler doesn't tell the user anything about captures during compilation, I think `ofMap` is a better factory to avoid accidentally writing capturing lambdas.
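The capture table above favors supplying the mapper once, when the map is created, so that access sites never allocate a capturing lambda. Below is a minimal sketch of that shape. It assumes a fixed key set and uses `ConcurrentHashMap.computeIfAbsent` as a stand-in for the PR's stable storage. `MemoMap` is a hypothetical class, not the proposed `StableValue.ofMap`.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of the "mapper fixed at creation" shape; not StableValue.ofMap.
final class MemoMap<K, V> {
    private final Set<K> keys;                             // fixed key set
    private final Function<? super K, ? extends V> mapper; // captured once, here
    private final ConcurrentHashMap<K, V> values = new ConcurrentHashMap<>();

    MemoMap(Set<? extends K> keys, Function<? super K, ? extends V> mapper) {
        this.keys = Set.copyOf(keys);
        this.mapper = mapper;
    }

    V get(K key) {
        if (!keys.contains(key)) {
            throw new IllegalArgumentException("Unknown key: " + key);
        }
        // The access site passes no lambda of its own, so nothing is captured per call.
        // ConcurrentHashMap applies the mapper at most once per key, provided it
        // returns non-null (a null result is not stored).
        return values.computeIfAbsent(key, mapper);
    }
}
```

Because the mapper is stored in a field, a call like `memo.get(key)` passes no lambda at all. Any capture happens once, in the constructor call.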
-------------

PR Comment: https://git.openjdk.org/jdk/pull/18794#issuecomment-2075071225
PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1598376355
PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1568472583
PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1568491254

From pminborg at openjdk.org  Tue May 14 14:14:23 2024
From: pminborg at openjdk.org (Per Minborg)
Date: Tue, 14 May 2024 14:14:23 GMT
Subject: RFR: 8330465: Stable Values and Collections (Internal)
In-Reply-To: <_mwOb-4riQczNG6UoDy6S4ti7I68dYvcLAY3dzLNZaQ=.71a67ede-0f64-452e-bfec-4db95b25338b@github.com>
References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> <_mwOb-4riQczNG6UoDy6S4ti7I68dYvcLAY3dzLNZaQ=.71a67ede-0f64-452e-bfec-4db95b25338b@github.com>
Message-ID:

On Wed, 17 Apr 2024 15:17:52 GMT, Per Minborg wrote:

>> Also, I want to mention a few important differences between `@Stable` and Stable Values:
>>
>> Patterns:
>> 1. Benign race (does not exist in StableValue API): multiple threads can create an instance and upload, any non-null instance is functionally equivalent so race is ok (seen in most of JDK)
>> 2. compareAndSet (setIfUnset): multiple threads can create instance, only one will succeed in uploading; usually for when the instance computation is cheap but we want single witness.
>> 3. atomic computation (computeIfUnset): only one thread can create instance which will be witnessed by other threads; this pattern ensures correctness and prevents wasteful computation by other threads at the cost of locking and lambda creation.
>>
>> Allocation in objects: `@Stable` field is local to an object but `StableValue` is another object; thus sharing strategy may differ, as stable fields are copied over but StableValue uses a shared cache (which is even better for avoiding redundant computation)
>>
>> Question:
>> 1. Will we ever try to expose the stable benign race model to users?
>> 2. Will we ever try to inline the stable values in object layout like a stable field?
>
>> Question:
>>
>> 1. Will we ever try to expose the stable benign race model to users?
>> 2. Will we ever try to inline the stable values in object layout like a stable field?
>
> 1. I think there is little or no upside in exposing the benign race. It would also be difficult to assert the invariant, competing objects are functionally equivalent. So, I think no.
>
> 2. In a static context, the stable value will be inlined (or rather constant-folded). So we are partly already there. For pure instance contexts, I have some ideas about this but it is non-trivial.

> @minborg Just curious, why are you adding holder-in-holder benchmark cases?

I'd like to test the transitive constant folding capabilities.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18794#issuecomment-2075119450

From pminborg at openjdk.org  Tue May 14 14:14:23 2024
From: pminborg at openjdk.org (Per Minborg)
Date: Tue, 14 May 2024 14:14:23 GMT
Subject: RFR: 8330465: Stable Values and Collections (Internal)
In-Reply-To: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com>
References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com>
Message-ID:

On Tue, 16 Apr 2024 11:47:23 GMT, Per Minborg wrote:

> # Stable Values & Collections (Internal)
>
> ## Summary
> This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization.
> * Reduce the amount of static initializer and/or field initialization code. > * Uphold integrity and consistency, even in a multi-threaded environment. > > For more details, see the draft JEP: https://openjdk.org/jeps/8312611 > > ## Performance > Performance compared to instance variables using an `AtomicReference` and one protected by double-checked locking under concurrent access by 8 threads: > > > Benchmark Mode Cnt Score Error Units > StableBenchmark.instanceAtomic avgt 10 1.576 ± 0.052 ns/op > StableBenchmark.instanceDCL avgt 10 1.608 ± 0.059 ns/op > StableBenchmark.instanceStable avgt 10 0.979 ± 0.023 ns/op <- StableValue (~40% faster than DCL) > > > Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (8 threads): > > > Benchmark Mode Cnt Score Error Units > StableBenchmark.staticAtomic avgt 10 1.335 ± 0.056 ns/op > StableBenchmark.staticCHI avgt 10 0.623 ± 0.086 ns/op > StableBenchmark.staticDCL avgt 10 1.418 ± 0.171 ns/op > StableBenchmark.staticList avgt 10 0.617 ± 0.024 ns/op > StableBenchmark.staticStable avgt 10 0.604 ± 0.022 ns/op <- StableValue ( > 2x faster than `AtomicInteger` and DCL) > > > Performance for stable lists in both instance and static contexts whereby the sum of random contents is calculated for stable lists (which are thread-safe) compared to `ArrayList` instances (which are not thread-safe) (under single thread access): > > > Benchmark Mode Cnt Score Error Units > StableListSumBenchmark.instanceArrayList avgt 10 0.356 ± 0.005 ns/op > StableListSumBenchmark.instanceList avgt 10 0.373 ± 0.017 ns/op <- Stable list > StableListSumBenchmark.staticArrayList avgt 10 0.352 ± 0.002 ns/op > StableListSumBenchmark.staticList avgt 10 0.356 ± 0.00...
I've run some benchmarks on various platforms for static fields (higher is better): [image] Here are some figures for various platforms where we compare `AtomicReference`, double-checked locking holder, and `StableValue` using instance variables and where we iterate and sum 20 values from said constructs: [image] Note: The figures should be taken with a grain of salt pending a deeper analysis. src/hotspot/share/ci/ciField.cpp line 262: > 260: const char* stable_array3d_klass_name = "jdk/internal/lang/StableArray3D"; > 261: > 262: static bool trust_final_non_static_fields_of_type(Symbol* signature) { Is there a better way of doing this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18794#issuecomment-2075121059 PR Comment: https://git.openjdk.org/jdk/pull/18794#issuecomment-2076586960 PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1599826991 From pminborg at openjdk.org Tue May 14 14:14:23 2024 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 14 May 2024 14:14:23 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Tue, 14 May 2024 11:07:00 GMT, Per Minborg wrote: >> # Stable Values & Collections (Internal) >> >> ## Summary >> This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. >> >> ## Goals >> * Provide an easy and intuitive API to describe value holders that can change at most once. >> * Decouple declaration from initialization without significant footprint or performance penalties. >> * Reduce the amount of static initializer and/or field initialization code.
>> * Uphold integrity and consistency, even in a multi-threaded environment. >> >> For more details, see the draft JEP: https://openjdk.org/jeps/8312611 >> >> ## Performance >> Performance compared to instance variables using an `AtomicReference` and one protected by double-checked locking under concurrent access by 8 threads: >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.instanceAtomic avgt 10 1.576 ± 0.052 ns/op >> StableBenchmark.instanceDCL avgt 10 1.608 ± 0.059 ns/op >> StableBenchmark.instanceStable avgt 10 0.979 ± 0.023 ns/op <- StableValue (~40% faster than DCL) >> >> >> Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (8 threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.staticAtomic avgt 10 1.335 ± 0.056 ns/op >> StableBenchmark.staticCHI avgt 10 0.623 ± 0.086 ns/op >> StableBenchmark.staticDCL avgt 10 1.418 ± 0.171 ns/op >> StableBenchmark.staticList avgt 10 0.617 ± 0.024 ns/op >> StableBenchmark.staticStable avgt 10 0.604 ± 0.022 ns/op <- StableValue ( > 2x faster than `AtomicInteger` and DCL) >> >> >> Performance for stable lists in both instance and static contexts whereby the sum of random contents is calculated for stable lists (which are thread-safe) compared to `ArrayList` instances (which are not thread-safe) (under single thread access): >> >> >> Benchmark Mode Cnt Score Error Units >> StableListSumBenchmark.instanceArrayList avgt 10 0.356 ± 0.005 ns/op >> StableListSumBenchmark.instanceList avgt 10 0.373 ± 0.017 ns/op <- Stable list >> StableListSumBenchmark.staticArrayList avgt 10 0.352 ± ... > > src/hotspot/share/ci/ciField.cpp line 262: > >> 260: const char* stable_array3d_klass_name = "jdk/internal/lang/StableArray3D"; >> 261: >> 262: static bool trust_final_non_static_fields_of_type(Symbol* signature) { > > Is there a better way of doing this? How do we check if the type implements `TrustedFieldType` in C?
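The benchmark baselines quoted in the PR description (an `AtomicReference`, double-checked locking, and the class-holder idiom) are standard lazy-initialization idioms. For readers who want to see what `StableValue` is being compared against, here is a minimal sketch of the three; the names `LazyInitBaselines` and `compute()` are hypothetical, and this is not the benchmark code from the PR.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration of the three baseline idioms; not the PR's benchmark code.
final class LazyInitBaselines {

    // Stand-in for an expensive computation (hypothetical).
    static String compute() {
        return "value";
    }

    // AtomicReference: racing threads may each call compute(), but
    // compareAndSet publishes exactly one result and every caller sees it.
    static final class AtomicHolder {
        private final AtomicReference<String> ref = new AtomicReference<>();

        String get() {
            String v = ref.get();
            if (v == null) {
                ref.compareAndSet(null, compute());
                v = ref.get();
            }
            return v;
        }
    }

    // Double-checked locking: a volatile read on the fast path; the lock
    // is taken only while the value is still unset.
    static final class DclHolder {
        private volatile String value;

        String get() {
            String v = value;
            if (v == null) {
                synchronized (this) {
                    v = value;
                    if (v == null) {
                        v = compute();
                        value = v;
                    }
                }
            }
            return v;
        }
    }

    // Class-holder idiom (static contexts only): the JVM initializes Holder
    // at most once, on the first read of Holder.VALUE.
    static final class ChiHolder {
        private static final class Holder {
            static final String VALUE = compute();
        }

        static String get() {
            return Holder.VALUE;
        }
    }

    public static void main(String[] args) {
        System.out.println(new AtomicHolder().get()); // prints value
        System.out.println(new DclHolder().get());    // prints value
        System.out.println(ChiHolder.get());          // prints value
    }
}
```

Only the `AtomicReference` variant can invoke `compute()` more than once under contention; the other two guarantee a single invocation, which is the property a stable value also aims to provide while remaining eligible for constant folding.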
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1599836419 From heidinga at openjdk.org Tue May 14 14:14:24 2024 From: heidinga at openjdk.org (Dan Heidinga) Date: Tue, 14 May 2024 14:14:24 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) In-Reply-To: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Tue, 16 Apr 2024 11:47:23 GMT, Per Minborg wrote: > # Stable Values & Collections (Internal) > > ## Summary > This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. > > ## Goals > * Provide an easy and intuitive API to describe value holders that can change at most once. > * Decouple declaration from initialization without significant footprint or performance penalties. > * Reduce the amount of static initializer and/or field initialization code. > * Uphold integrity and consistency, even in a multi-threaded environment. > > For more details, see the draft JEP: https://openjdk.org/jeps/8312611 > > ## Performance > Performance compared to instance variables using an `AtomicReference` and one protected by double-checked locking under concurrent access by 8 threads: > > > Benchmark Mode Cnt Score Error Units > StableBenchmark.instanceAtomic avgt 10 1.576 ± 0.052 ns/op > StableBenchmark.instanceDCL avgt 10 1.608 ± 0.059 ns/op > StableBenchmark.instanceStable avgt 10 0.979 ±
0.023 ns/op <- StableValue (~40% faster than DCL) > > > Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (8 threads): > > > Benchmark Mode Cnt Score Error Units > StableBenchmark.staticAtomic avgt 10 1.335 ± 0.056 ns/op > StableBenchmark.staticCHI avgt 10 0.623 ± 0.086 ns/op > StableBenchmark.staticDCL avgt 10 1.418 ± 0.171 ns/op > StableBenchmark.staticList avgt 10 0.617 ± 0.024 ns/op > StableBenchmark.staticStable avgt 10 0.604 ± 0.022 ns/op <- StableValue ( > 2x faster than `AtomicInteger` and DCL) > > > Performance for stable lists in both instance and static contexts whereby the sum of random contents is calculated for stable lists (which are thread-safe) compared to `ArrayList` instances (which are not thread-safe) (under single thread access): > > > Benchmark Mode Cnt Score Error Units > StableListSumBenchmark.instanceArrayList avgt 10 0.356 ± 0.005 ns/op > StableListSumBenchmark.instanceList avgt 10 0.373 ± 0.017 ns/op <- Stable list > StableListSumBenchmark.staticArrayList avgt 10 0.352 ± 0.002 ns/op > StableListSumBenchmark.staticList avgt 10 0.356 ± 0.00... src/java.base/share/classes/java/lang/reflect/AccessibleObject.java line 193: > 191: *
<li>final fields declared in a {@linkplain Class#isHidden() hidden class}</li>
> 192: * <li>final fields declared in a {@linkplain Class#isRecord() record}</li>
> 193: * <li>final fields of type {@linkplain StableValue StableValue}</li>
In Valhalla, we've been looking at adding "strict" final fields to support value classes (which must be strongly immutable) which are fields that are unmodifiable. Most of the existing unmodifiable field cases can be covered by "strict" fields. This one can't, though, so I'm a little saddened to see this list grow. src/java.base/share/classes/java/lang/reflect/Field.java line 179: > 177: AccessibleObject.checkPermission(); > 178: if (flag) { > 179: if (StableValue.class.isAssignableFrom(type) && Modifier.isFinal(modifiers)) { Should this check be done regardless of the value of "flag"? If it always failed when calling ::setAccessible on a StableValue, we'd make it easier to find bugs and the contract for users would be clearer. src/java.base/share/classes/java/lang/reflect/Field.java line 181: > 179: if (StableValue.class.isAssignableFrom(type) && Modifier.isFinal(modifiers)) { > 180: throw newInaccessibleObjectException( > 181: "Unable to make field " + this + " accessable: " + Suggestion: "Unable to make field " + this + " accessible: " + src/java.base/share/classes/java/util/ImmutableCollections.java line 183: > 181: K key, > 182: Function mapper) { > 183: if (map instanceof HasComputeIfUnset) { Can we use pattern matching instanceof here? if (map instanceof HasComputeIfUnset uc) { src/java.base/share/classes/jdk/internal/lang/StableArray.java line 25: > 23: * @since 23 > 24: */ > 25: public sealed interface StableArray Do we have a use case for StableArray beyond those of StableList? src/java.base/share/classes/jdk/internal/lang/StableValue.java line 47: > 45: * An atomic, thread-safe, stable value holder for which the value can be set at most once. > 46: * <p>
> 47: * Stable values are eligible for constant folding and other optimizations by the JVM. Other values are also eligible for constant folding. Trying to spec in terms of the optimizations that the JVM may do is usually an unstable state. Better to spec in terms of what the user observable behaviour is and leave it at something like "unlocks further JVM optimizations". src/java.base/share/classes/jdk/internal/lang/StableValue.java line 130: > 128: * } else { > 129: * V newValue = supplier.get(); > 130: * stable.setOrThrow(newValue); If ::computeIfUnset allows racy sets, then it isn't equivalent to this code as ::setOrThrow will throw on a race, correct? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1575048707 PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1575005661 PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1574997174 PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1575014234 PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1584931870 PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1575025174 PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1575030995 From pminborg at openjdk.org Tue May 14 14:14:24 2024 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 14 May 2024 14:14:24 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Mon, 22 Apr 2024 16:31:15 GMT, Dan Heidinga wrote: >> # Stable Values & Collections (Internal) >> >> ## Summary >> This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_.
Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. >> >> ## Goals >> * Provide an easy and intuitive API to describe value holders that can change at most once. >> * Decouple declaration from initialization without significant footprint or performance penalties. >> * Reduce the amount of static initializer and/or field initialization code. >> * Uphold integrity and consistency, even in a multi-threaded environment. >> >> For more details, see the draft JEP: https://openjdk.org/jeps/8312611 >> >> ## Performance >> Performance compared to instance variables using an `AtomicReference` and one protected by double-checked locking under concurrent access by 8 threads: >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.instanceAtomic avgt 10 1.576 ± 0.052 ns/op >> StableBenchmark.instanceDCL avgt 10 1.608 ± 0.059 ns/op >> StableBenchmark.instanceStable avgt 10 0.979 ± 0.023 ns/op <- StableValue (~40% faster than DCL) >> >> >> Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (8 threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.staticAtomic avgt 10 1.335 ± 0.056 ns/op >> StableBenchmark.staticCHI avgt 10 0.623 ± 0.086 ns/op >> StableBenchmark.staticDCL avgt 10 1.418 ± 0.171 ns/op >> StableBenchmark.staticList avgt 10 0.617 ± 0.024 ns/op >> StableBenchmark.staticStable avgt 10 0.604 ± 0.022 ns/op <- StableValue ( > 2x faster than `AtomicInteger` and DCL) >> >> >> Performance for stable lists in both instance and static contexts whereby the sum of random contents is calculated for stable lists (which are thread-safe) compared to `ArrayList` instances (which are not thread-safe) (under single thread access): >> >> >> Benchmark Mode Cnt Score Error Units >> StableListSumBenchmark.instanceArrayList avgt 10 0.356 ±
0.005 ns/op >> StableListSumBenchmark.instanceList avgt 10 0.373 ± 0.017 ns/op <- Stable list >> StableListSumBenchmark.staticArrayList avgt 10 0.352 ± ... > > src/java.base/share/classes/java/lang/reflect/AccessibleObject.java line 193: > >> 191: *

<li>final fields declared in a {@linkplain Class#isHidden() hidden class}</li>
>> 192: * <li>final fields declared in a {@linkplain Class#isRecord() record}</li>
>> 193: * <li>final fields of type {@linkplain StableValue StableValue}</li>
> > In Valhalla, we've been looking at adding "strict" final fields to support value classes (which must be strongly immutable) which are fields that are unmodifiable. Most of the existing unmodifiable field cases can be covered by "strict" fields. This one can't, though, so I'm a little saddened to see this list grow. Maybe we could introduce a special marker interface (e.g. `TrustedFieldType`) that signals this behavior. This might only take effect if loaded via the boot loader. > src/java.base/share/classes/java/util/ImmutableCollections.java line 183: > >> 181: K key, >> 182: Function mapper) { >> 183: if (map instanceof HasComputeIfUnset) { > > Can we use pattern matching instanceof here? > > if (map instanceof HasComputeIfUnset uc) { Good idea. > src/java.base/share/classes/jdk/internal/lang/StableArray.java line 25: > >> 23: * @since 23 >> 24: */ >> 25: public sealed interface StableArray > > Do we have a use case for StableArray beyond those of StableList? I am trying to model multi-dimensional arrays that also provide flattening. Let's see if it becomes useful. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1576158929 PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1575929756 PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1591465182 From heidinga at openjdk.org Tue May 14 14:14:24 2024 From: heidinga at openjdk.org (Dan Heidinga) Date: Tue, 14 May 2024 14:14:24 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Tue, 23 Apr 2024 12:22:25 GMT, Per Minborg wrote: >> src/java.base/share/classes/java/lang/reflect/AccessibleObject.java line 193: >> >>> 191: *
<li>final fields declared in a {@linkplain Class#isHidden() hidden class}</li>
>>> 192: * <li>final fields declared in a {@linkplain Class#isRecord() record}</li>
>>> 193: * <li>final fields of type {@linkplain StableValue StableValue}</li>
>> >> In Valhalla, we've been looking at adding "strict" final fields to support value classes (which must be strongly immutable) which are fields that are unmodifiable. Most of the existing unmodifiable field cases can be covered by "strict" fields. This one can't, though, so I'm a little saddened to see this list grow. > > Maybe we could introduce a special marker interface (e.g. `TrustedFieldType`) that signals this behavior. This might only take effect if loaded via the boot loader. Thinking on this more, hidden classes & records & value classes can all be dealt with by the introduction of strict fields. Adding a new type (TrustedFieldType) when we'll eventually only have one type here (StableValue) seems like an unnecessary tradeoff. If we ever have to add a second type here, then it's probably worth revisiting this idea. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1584921294 From pminborg at openjdk.org Tue May 14 14:14:24 2024 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 14 May 2024 14:14:24 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> <_mwOb-4riQczNG6UoDy6S4ti7I68dYvcLAY3dzLNZaQ=.71a67ede-0f64-452e-bfec-4db95b25338b@github.com> Message-ID: On Mon, 13 May 2024 12:18:28 GMT, Chen Liang wrote: >> As we need the array in both cases, what would such a solution look like without duplicating code? > > I was thinking about changing the StableEnumMap factory to directly take an EnumSet/BitSet indicating the indices without conversions to full objects and arrays. Sounds like a good idea. I will fix this. >> I see what you mean with distinct initialization logic. This is not the intended use though. >> >> The reason these methods exist is to avoid lambda capturing. Let's say we have a `Function<K, V>` we want to apply to a `Map<K, StableValue<V>>`.
Then, retrieving a `stable = StableValue<V>` and applying `stable.computeIfUnset(() -> function.apply(key))` would capture a new `Supplier`. Another alternative would be to write imperative code similar to what is already in these methods. >> >> What we could do is provide factories for memoized functions (the latter are described at the end of the draft JEP: https://openjdk.org/jeps/8312611) even though these are easy to write. >> >> I think what you are proposing is something like this? >> >> >> Map<K, StableValue<V>> map = StableValue.ofMap(keys, k -> createV(k)); >> >> >> or perhaps even: >> >> >> Map<K, V> map = StableValue.ofMap(keys, k -> createV(k)); > > Yes, consider the 3 capture scenarios: > | API | Capture frequency | Capture Impact | Code Convenience | Flexibility | > |-----|-------------------|----------------|------------------|-------------| > | `StableValue.ofMap(map, k -> ...)` | By accident | single capture is reused | OK | One generator for all keys | > | `StableValue.computeIfUnset(map, key, k -> ...)` | By accident | capture happens for all access sites | somewhat ugly | Different generator for different keys | > | `map.get(k).computeIfUnset(() -> ...)` | Always | capture happens for all access sites | OK | Different generator for different keys | > > Notice the `ofMap` factory is the most tolerant to faulty captures: even if it captures, the single capturing lambda is reused for all map stables, avoiding capture overheads at access sites. Given the Java compiler doesn't tell the user anything about captures during compilation, I think `ofMap` is a better factory to avoid accidentally writing capturing lambdas. I see what you mean. Initially, I thought it would be easy to create memoized functions, but it turned out that was not the case if one wants to retain easy debuggability etc.
So, I have added a couple of factory methods including this: /** * {@return a new memoized {@linkplain Function } backed by an internal * stable map with the provided {@code inputs} keys where the provided * {@code original} Function will only be invoked at most once per distinct input} * * @param original the original Function to convert to a memoized Function * @param inputs the potential input values to the Function * @param the type of input values * @param the return type of the function */ static Function ofFunction(Set inputs, Function original) { Objects.requireNonNull(inputs); Objects.requireNonNull(original); ... } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1600035236 PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1568671241 From pminborg at openjdk.org Tue May 14 14:14:25 2024 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 14 May 2024 14:14:25 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Tue, 23 Apr 2024 09:17:29 GMT, Per Minborg wrote: >> src/java.base/share/classes/java/util/ImmutableCollections.java line 183: >> >>> 181: K key, >>> 182: Function mapper) { >>> 183: if (map instanceof HasComputeIfUnset) { >> >> Can we use pattern matching instanceof here? >> >> if (map instanceof HasComputeIfUnset uc) { > > Good idea. Ahh. I thought you meant pattern matching in another place (which actually turned out to be a really good idea).
Here, however, we also need to get the type parameters correct: [image] ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1576154547 From liach at openjdk.org Tue May 14 14:14:25 2024 From: liach at openjdk.org (Chen Liang) Date: Tue, 14 May 2024 14:14:25 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Tue, 23 Apr 2024 12:18:53 GMT, Per Minborg wrote: >> Good idea. > > Ahh. I thought you meant pattern matching in another place (which actually turned out to be a really good idea). Here, however, we also need to get the type parameters correct: > > [image] Would you still need a cast if you declare `HasComputeIfUnset` with `extends Map<K, StableValue<V>>`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1581828542 From mcimadamore at openjdk.org Tue May 14 14:14:25 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 14 May 2024 14:14:25 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) In-Reply-To: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Tue, 16 Apr 2024 11:47:23 GMT, Per Minborg wrote: > # Stable Values & Collections (Internal) > > ## Summary > This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. > > ## Goals > * Provide an easy and intuitive API to describe value holders that can change at most once.
> * Decouple declaration from initialization without significant footprint or performance penalties. > * Reduce the amount of static initializer and/or field initialization code. > * Uphold integrity and consistency, even in a multi-threaded environment. > > For more details, see the draft JEP: https://openjdk.org/jeps/8312611 > > ## Performance > Performance compared to instance variables using an `AtomicReference` and one protected by double-checked locking under concurrent access by 8 threads: > > > Benchmark Mode Cnt Score Error Units > StableBenchmark.instanceAtomic avgt 10 1.576 ± 0.052 ns/op > StableBenchmark.instanceDCL avgt 10 1.608 ± 0.059 ns/op > StableBenchmark.instanceStable avgt 10 0.979 ± 0.023 ns/op <- StableValue (~40% faster than DCL) > > > Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (8 threads): > > > Benchmark Mode Cnt Score Error Units > StableBenchmark.staticAtomic avgt 10 1.335 ± 0.056 ns/op > StableBenchmark.staticCHI avgt 10 0.623 ± 0.086 ns/op > StableBenchmark.staticDCL avgt 10 1.418 ± 0.171 ns/op > StableBenchmark.staticList avgt 10 0.617 ± 0.024 ns/op > StableBenchmark.staticStable avgt 10 0.604 ± 0.022 ns/op <- StableValue ( > 2x faster than `AtomicInteger` and DCL) > > > Performance for stable lists in both instance and static contexts whereby the sum of random contents is calculated for stable lists (which are thread-safe) compared to `ArrayList` instances (which are not thread-safe) (under single thread access): > > > Benchmark Mode Cnt Score Error Units > StableListSumBenchmark.instanceArrayList avgt 10 0.356 ± 0.005 ns/op > StableListSumBenchmark.instanceList avgt 10 0.373 ± 0.017 ns/op <- Stable list > StableListSumBenchmark.staticArrayList avgt 10 0.352 ± 0.002 ns/op > StableListSumBenchmark.staticList avgt 10 0.356 ± 0.00...
src/java.base/share/classes/java/util/ImmutableCollections.java line 1505: > 1503: } > 1504: > 1505: static final class StableMap Question: can stable maps be implemented in terms of stable lists? After all, you need a stable backing array - and then you need to have a way to go from a key to an index in the stable array. The logic that does key->index conversion belongs to Map, but after that we should be able to just "delegate" to StableList? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1568940281 From pminborg at openjdk.org Tue May 14 14:14:25 2024 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 14 May 2024 14:14:25 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: <2AUE4xXjgoZVxHkOGorI8NtSE4jQOXsWZc4zxheDGp0=.559a1456-bee2-41e5-bcdc-9ee7056be848@github.com> On Wed, 17 Apr 2024 14:24:59 GMT, Maurizio Cimadamore wrote: >> # Stable Values & Collections (Internal) >> >> ## Summary >> This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. >> >> ## Goals >> * Provide an easy and intuitive API to describe value holders that can change at most once. >> * Decouple declaration from initialization without significant footprint or performance penalties. >> * Reduce the amount of static initializer and/or field initialization code. >> * Uphold integrity and consistency, even in a multi-threaded environment. 
>> >> For more details, see the draft JEP: https://openjdk.org/jeps/8312611 >> >> ## Performance >> Performance compared to instance variables using an `AtomicReference` and one protected by double-checked locking under concurrent access by 8 threads: >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.instanceAtomic avgt 10 1.576 ± 0.052 ns/op >> StableBenchmark.instanceDCL avgt 10 1.608 ± 0.059 ns/op >> StableBenchmark.instanceStable avgt 10 0.979 ± 0.023 ns/op <- StableValue (~40% faster than DCL) >> >> >> Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (8 threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.staticAtomic avgt 10 1.335 ± 0.056 ns/op >> StableBenchmark.staticCHI avgt 10 0.623 ± 0.086 ns/op >> StableBenchmark.staticDCL avgt 10 1.418 ± 0.171 ns/op >> StableBenchmark.staticList avgt 10 0.617 ± 0.024 ns/op >> StableBenchmark.staticStable avgt 10 0.604 ± 0.022 ns/op <- StableValue ( > 2x faster than `AtomicInteger` and DCL) >> >> >> Performance for stable lists in both instance and static contexts whereby the sum of random contents is calculated for stable lists (which are thread-safe) compared to `ArrayList` instances (which are not thread-safe) (under single thread access): >> >> >> Benchmark Mode Cnt Score Error Units >> StableListSumBenchmark.instanceArrayList avgt 10 0.356 ± 0.005 ns/op >> StableListSumBenchmark.instanceList avgt 10 0.373 ± 0.017 ns/op <- Stable list >> StableListSumBenchmark.staticArrayList avgt 10 0.352 ± ... > > src/java.base/share/classes/java/util/ImmutableCollections.java line 1505: > >> 1503: } >> 1504: >> 1505: static final class StableMap > > Question: can stable maps be implemented in terms of stable lists? After all, you need a stable backing array - and then you need to have a way to go from a key to an index in the stable array.
The logic that does key->index conversion belongs to Map, but after that we should be able to just "delegate" to StableList? This is partially done, but we could pull further on this thread and unify the implementations. Incidentally, it is also possible to unify the two implementation classes of `StableValue` so it becomes monomorphic. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1570837425 From liach at openjdk.org Tue May 14 14:14:25 2024 From: liach at openjdk.org (Chen Liang) Date: Tue, 14 May 2024 14:14:25 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: <6Ut30fvUqvy-ld6OsJCsWj8T4ImaRq_bPJrmxSbkD6U=.394965a6-d328-4af3-a07b-8088acc7e88b@github.com> On Mon, 22 Apr 2024 16:16:39 GMT, Dan Heidinga wrote: >> # Stable Values & Collections (Internal) >> >> ## Summary >> This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. >> >> ## Goals >> * Provide an easy and intuitive API to describe value holders that can change at most once. >> * Decouple declaration from initialization without significant footprint or performance penalties. >> * Reduce the amount of static initializer and/or field initialization code. >> * Uphold integrity and consistency, even in a multi-threaded environment.
>> >> For more details, see the draft JEP: https://openjdk.org/jeps/8312611 >> >> ## Performance >> Performance compared to instance variables using an `AtomicReference` and one protected by double-checked locking under concurrent access by 8 threads: >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.instanceAtomic avgt 10 1.576 ± 0.052 ns/op >> StableBenchmark.instanceDCL avgt 10 1.608 ± 0.059 ns/op >> StableBenchmark.instanceStable avgt 10 0.979 ± 0.023 ns/op <- StableValue (~40% faster than DCL) >> >> >> Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (8 threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.staticAtomic avgt 10 1.335 ± 0.056 ns/op >> StableBenchmark.staticCHI avgt 10 0.623 ± 0.086 ns/op >> StableBenchmark.staticDCL avgt 10 1.418 ± 0.171 ns/op >> StableBenchmark.staticList avgt 10 0.617 ± 0.024 ns/op >> StableBenchmark.staticStable avgt 10 0.604 ± 0.022 ns/op <- StableValue ( > 2x faster than `AtomicInteger` and DCL) >> >> >> Performance for stable lists in both instance and static contexts whereby the sum of random contents is calculated for stable lists (which are thread-safe) compared to `ArrayList` instances (which are not thread-safe) (under single thread access): >> >> >> Benchmark Mode Cnt Score Error Units >> StableListSumBenchmark.instanceArrayList avgt 10 0.356 ± 0.005 ns/op >> StableListSumBenchmark.instanceList avgt 10 0.373 ± 0.017 ns/op <- Stable list >> StableListSumBenchmark.staticArrayList avgt 10 0.352 ± ... > > src/java.base/share/classes/jdk/internal/lang/StableValue.java line 130: > >> 128: * } else { >> 129: * V newValue = supplier.get(); >> 130: * stable.setOrThrow(newValue); > > If ::computeIfUnset allows racy sets, then it isn't equivalent to this code as ::setOrThrow will throw on a race, correct?
Indeed, this if-else should be guarded by a synchronized block, except the lock is on the internal mutex which is not publicly exposed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1575101750 From pminborg at openjdk.org Tue May 14 14:14:25 2024 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 14 May 2024 14:14:25 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) In-Reply-To: <6Ut30fvUqvy-ld6OsJCsWj8T4ImaRq_bPJrmxSbkD6U=.394965a6-d328-4af3-a07b-8088acc7e88b@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> <6Ut30fvUqvy-ld6OsJCsWj8T4ImaRq_bPJrmxSbkD6U=.394965a6-d328-4af3-a07b-8088acc7e88b@github.com> Message-ID: On Mon, 22 Apr 2024 17:09:39 GMT, Chen Liang wrote: >> src/java.base/share/classes/jdk/internal/lang/StableValue.java line 130: >> >>> 128: * } else { >>> 129: * V newValue = supplier.get(); >>> 130: * stable.setOrThrow(newValue); >> >> If ::computeIfUnset allows racy sets, then it isn't equivalent to this code as ::setOrThrow will throw on a race, correct? > > Indeed, this if-else should be guarded by a synchronized block, except the lock is on the internal mutex which is not publicly exposed. `computeIfUnset()` is indeed guarded by a synchronized block, only it sits on the method declaration of `computeIfUnsetVolatile0`. I think we should have an internal mutex. This will also correspond to the stable collections which have internal mutexes for each index/key. 
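The race Dan raises (a racy `computeIfUnset` versus a throwing `setOrThrow`) and Per's preference for an internal mutex can be illustrated with a minimal sketch of a holder that guards the check-then-set sequence with a private lock object instead of a `synchronized` method, so the monitor is never publicly reachable. `MutexStableHolder` and its methods are hypothetical names; this is not the `StableValue` implementation in the PR.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical illustration only; not the JDK-internal StableValue implementation.
final class MutexStableHolder<V> {

    // Private monitor: callers cannot synchronize on it, unlike a
    // synchronized method, which locks on the publicly reachable `this`.
    private final Object mutex = new Object();

    // volatile so that a value published under the lock is visible to
    // readers taking the lock-free fast path.
    private volatile V value;

    /** Sets the value, or throws if a value has already been set. */
    void setOrThrow(V newValue) {
        Objects.requireNonNull(newValue);
        synchronized (mutex) {
            if (value != null) {
                throw new IllegalStateException("Value already set: " + value);
            }
            value = newValue;
        }
    }

    /** Returns the value, invoking the supplier at most once even under contention. */
    V computeIfUnset(Supplier<? extends V> supplier) {
        V v = value;                 // fast path: no locking once set
        if (v != null) {
            return v;
        }
        synchronized (mutex) {
            v = value;               // re-check under the lock
            if (v == null) {
                v = Objects.requireNonNull(supplier.get());
                value = v;           // same lock as setOrThrow, so no lost race
            }
            return v;
        }
    }
}
```

Because the check and the set run under the same lock as `setOrThrow`, a concurrent `setOrThrow` either wins (and `computeIfUnset` returns its value) or loses and throws. The sketch uses `null` as the "unset" marker, so it cannot store `null` values.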
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1575972396 From mcimadamore at openjdk.org Tue May 14 14:14:25 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 14 May 2024 14:14:25 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> <_mwOb-4riQczNG6UoDy6S4ti7I68dYvcLAY3dzLNZaQ=.71a67ede-0f64-452e-bfec-4db95b25338b@github.com> Message-ID: <5GDZ3F_c8E4afPEjL9CFdgt9ms5zndr5lTq17yGdFkY=.6cddeb24-3a10-493e-b8d7-9854c0e08f39@github.com> On Wed, 17 Apr 2024 11:12:37 GMT, Per Minborg wrote: >> Yes, consider the 3 capture scenarios: >> | API | Capture frequency | Capture Impact | Code Convenience | Flexibility | >> |-----|-------------------|----------------|------------------|-------------| >> | `StableValue.ofMap(map, k -> ...)` | By accident | single capture is reused | OK | One generator for all keys | >> | `StableValue.computeIfUnset(map, key, k -> ...)` | By accident | capture happens for all access sites | somewhat ugly | Different generator for different keys | >> | `map.get(k).computeIfUnset(() -> ...)` | Always | capture happens for all access sites | OK | Different generator for different keys | >> >> Notice the `ofMap` factory is the most tolerant to faulty captures: even if it captures, the single capturing lambda is reused for all map stables, avoiding capture overheads at access sites. Given that the Java compiler doesn't tell users anything about captures during compilation, I think `ofMap` is a better factory to avoid accidentally writing capturing lambdas. > > I see what you mean. Initially, I thought it would be easy to create memoized functions but it turned out that this was not the case if one wants to retain easy debuggability etc.
So, I have added a couple of factory methods, including this one: > > > /** > * {@return a new memoized {@linkplain Function } backed by an internal > * stable map with the provided {@code inputs} keys where the provided > * {@code original} Function will only be invoked at most once per distinct input} > * > * @param original the original Function to convert to a memoized Function > * @param inputs the potential input values to the Function > * @param the type of input values > * @param the return type of the function > */ > static Function ofFunction(Set inputs, > Function original) { > Objects.requireNonNull(inputs); > Objects.requireNonNull(original); > ... > } I agree that these methods appear to be confusing. We have: StableValue::of() StableValue::ofList(int) StableValue::ofMap(Set) These methods are clearly primitives, because they are used to create a wrapper around a stable value/array. (Actually, if you squint, the primitive is really the `ofMap` factory, since that one can be used to derive the other two as well, but that's mostly a sophism). Everything else falls in the "helper" bucket. That is, we could have: StableValue::ofList(IntFunction) -> List // similar to StableValue::ofList(int) StableValue::ofMap(Function) -> Map // similar to StableValue::ofMap(Set) Or, we could have: StableValue::ofSupplier(Supplier) -> Supplier // similar to StableValue::of() StableValue::ofIntFunction(IntFunction) -> IntFunction // similar to StableValue::ofList(int) StableValue::ofFunction(Function) -> Function // similar to StableValue::ofMap(Set) IMHO, having both sets feels slightly redundant. That is, if you have a Map, you also have a function from K to V, namely map::get. And, conversely, if a client wants a List of fixed size, which is populated lazily, using a memoized IntFunction is, effectively, the same thing.
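Maurizio's observation that a lazily populated fixed-size `List` and a memoized `IntFunction` are "effectively the same thing" can be made concrete with a small sketch in which the list is only a view over the memoized function, and `list::get` recovers the function again. `LazyViews`, `memoizedIntFunction` and `lazyList` are hypothetical names, not part of the PR's API. The per-index locks loosely mirror the per-index mutexes Per mentions for the stable collections, but the real implementation may differ.

```java
import java.util.AbstractList;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.IntFunction;

// Hypothetical helpers: a lazily populated fixed-size List and a memoized
// IntFunction are two views of the same index-based storage.
final class LazyViews {

    private LazyViews() {}

    /** Memoizes `original` for indices in [0, size); each index is computed at most once. */
    static <R> IntFunction<R> memoizedIntFunction(int size, IntFunction<? extends R> original) {
        Objects.requireNonNull(original);
        AtomicReferenceArray<R> slots = new AtomicReferenceArray<>(size);
        Object[] locks = new Object[size];        // one internal lock per index
        for (int i = 0; i < size; i++) {
            locks[i] = new Object();
        }
        return index -> {
            Objects.checkIndex(index, size);
            R r = slots.get(index);               // fast path once computed
            if (r != null) {
                return r;
            }
            synchronized (locks[index]) {
                r = slots.get(index);             // re-check under the lock
                if (r == null) {
                    r = Objects.requireNonNull(original.apply(index));
                    slots.set(index, r);
                }
                return r;
            }
        };
    }

    /** A fixed-size, unmodifiable List whose elements are computed on first access. */
    static <R> List<R> lazyList(int size, IntFunction<? extends R> original) {
        IntFunction<R> memo = memoizedIntFunction(size, original);
        return new AbstractList<R>() {
            @Override public R get(int index) { return memo.apply(index); }
            @Override public int size() { return size; }
        };
    }
}
```

Since `lazyList` is built entirely on `memoizedIntFunction`, offering both as public factories adds surface area without adding capability, which is the redundancy Maurizio points out. The sketch treats `null` as "not yet computed", so `original` must not return `null`.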
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1568840269 From liach at openjdk.org Tue May 14 14:14:25 2024 From: liach at openjdk.org (Chen Liang) Date: Tue, 14 May 2024 14:14:25 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) In-Reply-To: <5GDZ3F_c8E4afPEjL9CFdgt9ms5zndr5lTq17yGdFkY=.6cddeb24-3a10-493e-b8d7-9854c0e08f39@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> <_mwOb-4riQczNG6UoDy6S4ti7I68dYvcLAY3dzLNZaQ=.71a67ede-0f64-452e-bfec-4db95b25338b@github.com> <5GDZ3F_c8E4afPEjL9CFdgt9ms5zndr5lTq17yGdFkY=.6cddeb24-3a10-493e-b8d7-9854c0e08f39@github.com> Message-ID: On Wed, 17 Apr 2024 13:23:53 GMT, Maurizio Cimadamore wrote: >> I see what you mean. Initially, I thought it would be easy to create memoized functions but it turned out that this was not the case if one wants to retain easy debuggability etc. So, I have added a couple of factory methods, including this one: >> >> >> /** >> * {@return a new memoized {@linkplain Function } backed by an internal >> * stable map with the provided {@code inputs} keys where the provided >> * {@code original} Function will only be invoked at most once per distinct input} >> * >> * @param original the original Function to convert to a memoized Function >> * @param inputs the potential input values to the Function >> * @param the type of input values >> * @param the return type of the function >> */ >> static Function ofFunction(Set inputs, >> Function original) { >> Objects.requireNonNull(inputs); >> Objects.requireNonNull(original); >> ... >> } > > I agree that these methods appear to be confusing. We have: > > > StableValue::of() > StableValue::ofList(int) > StableValue::ofMap(Set) > > > These methods are clearly primitives, because they are used to create a wrapper around a stable value/array.
(Actually, if you squint, the primitive is really the `ofMap` factory, since that one can be used to derive the other two as well, but that's mostly a sophism). > > Everything else falls in the "helper" bucket. That is, we could have: > > > StableValue::ofList(IntFunction) -> List // similar to StableValue::ofList(int) > StableValue::ofMap(Function) -> Map // similar to StableValue::ofMap(Set) > > > Or, we could have: > > > StableValue::ofSupplier(Supplier) -> Supplier // similar to StableValue::of() > StableValue::ofIntFunction(IntFunction) -> IntFunction // similar to StableValue::ofList(int) > StableValue::ofFunction(Function) -> Function // similar to StableValue::ofMap(Set) > > > IMHO, having both sets feels slightly redundant. That is, if you have a Map, you also have a function from K to V, namely map::get. And, conversely, if a client wants a List of fixed size, which is populated lazily, using a memoized IntFunction is, effectively, the same thing. I prefer these: StableValue::ofSupplier(Supplier) -> StableValue StableValue::ofIntFunction(keys, IntFunction) -> IntFunction> StableValue::ofFunction(keys, Function) -> Function> These still expose StableValue so users can set the values if they need to. In addition, the List/Map functionalities are mostly useless, so a getter/function suffices for the most part. These APIs are less prone to accidental context capture than the individual use-site ones: with a use-site leak, every access involves an allocation, whereas the allocation at the construction site is shared.
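Chen's point about capture cost can be seen in a sketch of a memoized `Function` over a fixed key set: the (possibly capturing) lambda is passed once at the construction site, and the returned function is then shared by every access site, which only pays for a key lookup. The sketch also follows the layering suggested at the start of this section, where the map-like part only converts a key to an index and values live in index-based slots. `MemoizedFunctions.ofFunction` is a hypothetical stand-in with simplified generics and a single lock; it is not the signature or the implementation proposed in the PR.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.Function;

// Hypothetical sketch: a memoized Function over a fixed key set.
// Keys are translated to indices once; values live in index-based slots.
final class MemoizedFunctions {

    private MemoizedFunctions() {}

    static <K, V> Function<K, V> ofFunction(Set<? extends K> inputs,
                                            Function<? super K, ? extends V> original) {
        Objects.requireNonNull(original);
        List<K> keys = List.copyOf(inputs);              // index -> key
        Map<K, Integer> indexOf = new HashMap<>();       // key -> index
        for (int i = 0; i < keys.size(); i++) {
            indexOf.put(keys.get(i), i);
        }
        AtomicReferenceArray<V> slots = new AtomicReferenceArray<>(keys.size());
        // This lambda is allocated once, here at the construction site;
        // callers of apply(...) do not allocate a new one per access.
        return key -> {
            Integer index = indexOf.get(key);
            if (index == null) {
                throw new IllegalArgumentException("Not a declared input: " + key);
            }
            V v = slots.get(index);
            if (v != null) {
                return v;                                // already computed
            }
            synchronized (slots) {                       // one internal lock keeps the sketch short
                v = slots.get(index);
                if (v == null) {
                    v = Objects.requireNonNull(original.apply(keys.get(index)));
                    slots.set(index, v);
                }
                return v;
            }
        };
    }
}
```

With this shape, a call such as `len.apply("bb")` involves no lambda at the use site. By contrast, writing `holder.computeIfUnset(() -> compute(key))` at every access site may allocate a fresh capturing lambda per call, unless the JIT manages to eliminate it.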
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1568895552 From pminborg at openjdk.org Tue May 14 14:14:25 2024 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 14 May 2024 14:14:25 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Fri, 19 Apr 2024 09:32:56 GMT, ExE Boss wrote: >> # Stable Values & Collections (Internal) >> >> ## Summary >> This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. >> >> ## Goals >> * Provide an easy and intuitive API to describe value holders that can change at most once. >> * Decouple declaration from initialization without significant footprint or performance penalties. >> * Reduce the amount of static initializer and/or field initialization code. >> * Uphold integrity and consistency, even in a multi-threaded environment. >> >> For more details, see the draft JEP: https://openjdk.org/jeps/8312611 >> >> ## Performance >> Performance compared to instance variables using an `AtomicReference` and one protected by double-checked locking under concurrent access by 8 threads: >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.instanceAtomic avgt 10 1.576 ± 0.052 ns/op >> StableBenchmark.instanceDCL avgt 10 1.608 ± 0.059 ns/op >> StableBenchmark.instanceStable avgt 10 0.979 ± 0.023 ns/op <- StableValue (~40% faster than DCL) >> >> >> Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (8 threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.staticAtomic avgt 10 1.335 ±
0.056 ns/op >> StableBenchmark.staticCHI avgt 10 0.623 ± 0.086 ns/op >> StableBenchmark.staticDCL avgt 10 1.418 ± 0.171 ns/op >> StableBenchmark.staticList avgt 10 0.617 ± 0.024 ns/op >> StableBenchmark.staticStable avgt 10 0.604 ± 0.022 ns/op <- StableValue ( > 2x faster than `AtomicInteger` and DCL) >> >> >> Performance for stable lists in both instance and static contexts whereby the sum of random contents is calculated for stable lists (which are thread-safe) compared to `ArrayList` instances (which are not thread-safe) (under single thread access): >> >> >> Benchmark Mode Cnt Score Error Units >> StableListSumBenchmark.instanceArrayList avgt 10 0.356 ± 0.005 ns/op >> StableListSumBenchmark.instanceList avgt 10 0.373 ± 0.017 ns/op <- Stable list >> StableListSumBenchmark.staticArrayList avgt 10 0.352 ± ... > > src/java.base/share/classes/jdk/internal/lang/stable/StableValueElement.java line 116: > >> 114: public V computeIfUnset(Supplier supplier) { >> 115: // Todo: This creates a lambda >> 116: return computeIfUnsetShared(supplier, Supplier::get); > > Suggestion: > > return computeIfUnsetShared(supplier, supplierExtractor()); Yes. This is a work in progress. I will explore using a `BiFunction` instead.
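One way to drop the extra extractor argument (`Supplier::get` or `supplierExtractor()`) from `computeIfUnsetShared`, in the direction Per later takes ("converted to pattern matching"), is to pass the provider as a plain `Object` and dispatch on its type with a `switch` pattern. The sketch below assumes Java 21 or later; `SlotArray`, `computeIfUnset`, `mapIfUnset` and `computeIfUnsetShared` are hypothetical names, and the PR's `StableValueElement` may be structured quite differently.

```java
import java.util.Objects;
import java.util.function.IntFunction;
import java.util.function.Supplier;

// Hypothetical sketch (requires Java 21 for pattern matching in switch).
// Two typed entry points share one slow path; the provider is passed as a
// plain Object and dispatched with type patterns, so no adapter lambda or
// extractor function has to be passed alongside it.
final class SlotArray<V> {

    private final Object lock = new Object(); // internal, not publicly exposed
    private final Object[] slots;

    SlotArray(int size) {
        this.slots = new Object[size];
    }

    V computeIfUnset(int index, Supplier<? extends V> supplier) {
        return computeIfUnsetShared(index, Objects.requireNonNull(supplier));
    }

    V mapIfUnset(int index, IntFunction<? extends V> mapper) {
        return computeIfUnsetShared(index, Objects.requireNonNull(mapper));
    }

    @SuppressWarnings("unchecked") // only the typed entry points above reach here
    private V computeIfUnsetShared(int index, Object provider) {
        synchronized (lock) {
            Object v = slots[index];
            if (v == null) {
                v = switch (provider) {
                    case Supplier<?> s -> s.get();
                    case IntFunction<?> f -> f.apply(index);
                    default -> throw new IllegalArgumentException(
                            "Unsupported provider: " + provider.getClass().getName());
                };
                slots[index] = Objects.requireNonNull(v);
            }
            return (V) v;
        }
    }
}
```

The trade-off is that type safety for the provider moves from the compiler to a run-time `default` branch; keeping `computeIfUnsetShared` private limits which provider types can reach it. An object implementing both interfaces would be dispatched to the first matching case.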
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1574446126 From pminborg at openjdk.org Tue May 14 14:14:25 2024 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 14 May 2024 14:14:25 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Mon, 22 Apr 2024 09:34:39 GMT, Per Minborg wrote: >> src/java.base/share/classes/jdk/internal/lang/stable/StableValueElement.java line 116: >> >>> 114: public V computeIfUnset(Supplier supplier) { >>> 115: // Todo: This creates a lambda >>> 116: return computeIfUnsetShared(supplier, Supplier::get); >> >> Suggestion: >> >> return computeIfUnsetShared(supplier, supplierExtractor()); > > Yes. This is a work in progress. I will explore using a `BiFunction` instead. I've converted to pattern matching instead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1576155645 From alanb at openjdk.org Tue May 14 14:22:07 2024 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 14 May 2024 14:22:07 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) In-Reply-To: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Tue, 16 Apr 2024 11:47:23 GMT, Per Minborg wrote: > # Stable Values & Collections (Internal) > > ## Summary > This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. > > ## Goals > * Provide an easy and intuitive API to describe value holders that can change at most once. 
> * Decouple declaration from initialization without significant footprint or performance penalties. > * Reduce the amount of static initializer and/or field initialization code. > * Uphold integrity and consistency, even in a multi-threaded environment. > > For more details, see the draft JEP: https://openjdk.org/jeps/8312611 > > ## Performance > Performance compared to instance variables using an `AtomicReference` and one protected by double-checked locking under concurrent access by 8 threads: > > > Benchmark Mode Cnt Score Error Units > StableBenchmark.instanceAtomic avgt 10 1.576 ± 0.052 ns/op > StableBenchmark.instanceDCL avgt 10 1.608 ± 0.059 ns/op > StableBenchmark.instanceStable avgt 10 0.979 ± 0.023 ns/op <- StableValue (~40% faster than DCL) > > > Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (8 threads): > > > Benchmark Mode Cnt Score Error Units > StableBenchmark.staticAtomic avgt 10 1.335 ± 0.056 ns/op > StableBenchmark.staticCHI avgt 10 0.623 ± 0.086 ns/op > StableBenchmark.staticDCL avgt 10 1.418 ± 0.171 ns/op > StableBenchmark.staticList avgt 10 0.617 ± 0.024 ns/op > StableBenchmark.staticStable avgt 10 0.604 ± 0.022 ns/op <- StableValue ( > 2x faster than `AtomicInteger` and DCL) > > > Performance for stable lists in both instance and static contexts whereby the sum of random contents is calculated for stable lists (which are thread-safe) compared to `ArrayList` instances (which are not thread-safe) (under single thread access): > > > Benchmark Mode Cnt Score Error Units > StableListSumBenchmark.instanceArrayList avgt 10 0.356 ± 0.005 ns/op > StableListSumBenchmark.instanceList avgt 10 0.373 ± 0.017 ns/op <- Stable list > StableListSumBenchmark.staticArrayList avgt 10 0.352 ± 0.002 ns/op > StableListSumBenchmark.staticList avgt 10 0.356 ± 0.00... src/java.base/share/classes/java/lang/reflect/AccessibleObject.java line 195: > 193: *
  • final fields of a type that implements > 194: * {@linkplain jdk.internal.lang.stable.TrustedFieldType} > 195: * (e.g {@linkplain StableValue StableValue})
  • The API docs for a standard method can't reference a JDK internal annotation. It would be possible for the API docs to admit to an implementation-specific way to do this, but I think we should try to avoid this for now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1600132293 From pminborg at openjdk.org Tue May 14 14:51:20 2024 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 14 May 2024 14:51:20 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v2] In-Reply-To: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: > # Stable Values & Collections (Internal) > > ## Summary > This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. > > ## Goals > * Provide an easy and intuitive API to describe value holders that can change at most once. > * Decouple declaration from initialization without significant footprint or performance penalties. > * Reduce the amount of static initializer and/or field initialization code. > * Uphold integrity and consistency, even in a multi-threaded environment. > > For more details, see the draft JEP: https://openjdk.org/jeps/8312611 > > ## Performance > Performance compared to instance variables using an `AtomicReference` and one protected by double-checked locking under concurrent access by 8 threads: > > > Benchmark Mode Cnt Score Error Units > StableBenchmark.instanceAtomic avgt 10 1.576 ± 0.052 ns/op > StableBenchmark.instanceDCL avgt 10 1.608 ± 0.059 ns/op > StableBenchmark.instanceStable avgt 10 0.979 ±
0.023 ns/op <- StableValue (~40% faster than DCL) > > > Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (8 threads): > > > Benchmark Mode Cnt Score Error Units > StableBenchmark.staticAtomic avgt 10 1.335 ± 0.056 ns/op > StableBenchmark.staticCHI avgt 10 0.623 ± 0.086 ns/op > StableBenchmark.staticDCL avgt 10 1.418 ± 0.171 ns/op > StableBenchmark.staticList avgt 10 0.617 ± 0.024 ns/op > StableBenchmark.staticStable avgt 10 0.604 ± 0.022 ns/op <- StableValue ( > 2x faster than `AtomicInteger` and DCL) > > > Performance for stable lists in both instance and static contexts whereby the sum of random contents is calculated for stable lists (which are thread-safe) compared to `ArrayList` instances (which are not thread-safe) (under single thread access): > > > Benchmark Mode Cnt Score Error Units > StableListSumBenchmark.instanceArrayList avgt 10 0.356 ± 0.005 ns/op > StableListSumBenchmark.instanceList avgt 10 0.373 ± 0.017 ns/op <- Stable list > StableListSumBenchmark.staticArrayList avgt 10 0.352 ± 0.002 ns/op > StableListSumBenchmark.staticList avgt 10 0.356 ± 0.00...
Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Remove text in public class that references an internal class ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18794/files - new: https://git.openjdk.org/jdk/pull/18794/files/5d5dcced..d7c31585 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18794.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18794/head:pull/18794 PR: https://git.openjdk.org/jdk/pull/18794 From pminborg at openjdk.org Tue May 14 14:51:20 2024 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 14 May 2024 14:51:20 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v2] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Tue, 14 May 2024 14:19:44 GMT, Alan Bateman wrote: >> Per Minborg has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove text in public class that references an internal class > > src/java.base/share/classes/java/lang/reflect/AccessibleObject.java line 195: > >> 193: *
  • final fields of a type that implements >> 194: * {@linkplain jdk.internal.lang.stable.TrustedFieldType} >> 195: * (e.g {@linkplain StableValue StableValue})
  • > > The API docs for a standard method can't reference a JDK internal annotation. It would be possible for the API docs to admit to an implementation specific way to do this but I think we should try to avoid this for now. I will remove it. Once some of the classes are public, we could add back a message of this type. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1600179776 From kvn at openjdk.org Tue May 14 15:13:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 14 May 2024 15:13:02 GMT Subject: RFR: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers [v3] In-Reply-To: <3KQPqbFAyVDkPx28d8DN8Y1_zrJ6LwX6eOEOqxe8mvs=.4ec47e90-e516-4960-96c7-8f0cdbc8b29b@github.com> References: <3KQPqbFAyVDkPx28d8DN8Y1_zrJ6LwX6eOEOqxe8mvs=.4ec47e90-e516-4960-96c7-8f0cdbc8b29b@github.com> Message-ID: On Mon, 13 May 2024 22:09:23 GMT, Cesar Soares Lucas wrote: >> The `assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type` failure was caused by the fact that we didn't have a "zero value" for the type T_METADATA. The RAM patch uses that data when it creates a Phi node merging Klass loads and UseCompressedClassPointers is disabled. >> >> Tested with JTREG tier1-4 on Linux x86_64 & ARM64. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Addressing feedback: more tests. Reverting previous change. My testing passed. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19148#pullrequestreview-2055708032 From cslucas at openjdk.org Tue May 14 15:26:03 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 14 May 2024 15:26:03 GMT Subject: RFR: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers [v3] In-Reply-To: References: <3KQPqbFAyVDkPx28d8DN8Y1_zrJ6LwX6eOEOqxe8mvs=.4ec47e90-e516-4960-96c7-8f0cdbc8b29b@github.com> Message-ID: On Tue, 14 May 2024 03:52:00 GMT, Vladimir Kozlov wrote: >> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressing feedback: more tests. Reverting previous change. > > Thank you for explaining issue you have with klass loading. > I will run our testing with you current version. Thank you for reviewing and testing @vnkozlov ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19148#issuecomment-2110521639 From vklang at openjdk.org Tue May 14 16:06:09 2024 From: vklang at openjdk.org (Viktor Klang) Date: Tue, 14 May 2024 16:06:09 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v2] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: <1BnXIoSgu8PhvzHlCE5aaxAUtjBrayd935yQSnqZZbc=.18384687-9678-4b84-9cff-95c51f51d528@github.com> On Tue, 14 May 2024 14:51:20 GMT, Per Minborg wrote: >> # Stable Values & Collections (Internal) >> >> ## Summary >> This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. >> >> ## Goals >> * Provide an easy and intuitive API to describe value holders that can change at most once. 
>> * Decouple declaration from initialization without significant footprint or performance penalties. >> * Reduce the amount of static initializer and/or field initialization code. >> * Uphold integrity and consistency, even in a multi-threaded environment. >> >> For more details, see the draft JEP: https://openjdk.org/jeps/8312611 >> >> ## Performance >> Performance compared to instance variables using an `AtomicReference` and one protected by double-checked locking under concurrent access by 8 threads: >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.instanceAtomic avgt 10 1.576 ± 0.052 ns/op >> StableBenchmark.instanceDCL avgt 10 1.608 ± 0.059 ns/op >> StableBenchmark.instanceStable avgt 10 0.979 ± 0.023 ns/op <- StableValue (~40% faster than DCL) >> >> >> Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (8 threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.staticAtomic avgt 10 1.335 ± 0.056 ns/op >> StableBenchmark.staticCHI avgt 10 0.623 ± 0.086 ns/op >> StableBenchmark.staticDCL avgt 10 1.418 ± 0.171 ns/op >> StableBenchmark.staticList avgt 10 0.617 ± 0.024 ns/op >> StableBenchmark.staticStable avgt 10 0.604 ± 0.022 ns/op <- StableValue ( > 2x faster than `AtomicInteger` and DCL) >> >> >> Performance for stable lists in both instance and static contexts whereby the sum of random contents is calculated for stable lists (which are thread-safe) compared to `ArrayList` instances (which are not thread-safe) (under single thread access): >> >> >> Benchmark Mode Cnt Score Error Units >> StableListSumBenchmark.instanceArrayList avgt 10 0.356 ± 0.005 ns/op >> StableListSumBenchmark.instanceList avgt 10 0.373 ± 0.017 ns/op <- Stable list >> StableListSumBenchmark.staticArrayList avgt 10 0.352 ± ...
> > Per Minborg has updated the pull request incrementally with one additional commit since the last revision: > > Remove text in public class that references an internal class src/java.base/share/classes/jdk/internal/lang/StableValue.java line 171: > 169: /** > 170: * {@return a fresh stable value with an unset value where the returned stable's > 171: * value is computed in a separate background thread (created via the provided Suggestion: * {@return a fresh stable value with an unset value where the returned stable * value is computed in a separate background thread (created via the provided src/java.base/share/classes/jdk/internal/lang/StableValue.java line 175: > 173: *

    > 174: * If the supplier throws an (unchecked) exception, the exception is ignored, and no > 175: * value is set. Is it likely that users will want to be made aware of failures? If so, perhaps it would make sense to make sure that the Exception hits the UncaughtExceptionHandler? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1600296959 PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1600298211 From epeter at openjdk.org Tue May 14 16:15:02 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 May 2024 16:15:02 GMT Subject: RFR: 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store In-Reply-To: References: Message-ID: <6iq6RfQd4oxCBsAtVMgonlmckmiZU-PYgbQJccrnHXE=.253eb5c2-1009-4a8d-816e-aaa6a92b3ed2@github.com> On Mon, 13 May 2024 15:58:31 GMT, Richard Reingruber wrote: >> This PR adds a few tweaks to [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) which allow enabling it on big-endian platforms as well (e.g. AIX, S390). JDK-8318446 introduced a C2 optimization to replace consecutive stores to a primitive array with just one store.
>> >> By example (from `TestMergeStores.java`): >> >> >> static Object[] test2a(byte[] a, int offset, long v) { >> if (IS_BIG_ENDIAN) { >> a[offset + 0] = (byte)(v >> 56); >> a[offset + 1] = (byte)(v >> 48); >> a[offset + 2] = (byte)(v >> 40); >> a[offset + 3] = (byte)(v >> 32); >> a[offset + 4] = (byte)(v >> 24); >> a[offset + 5] = (byte)(v >> 16); >> a[offset + 6] = (byte)(v >> 8); >> a[offset + 7] = (byte)(v >> 0); >> } else { >> a[offset + 0] = (byte)(v >> 0); >> a[offset + 1] = (byte)(v >> 8); >> a[offset + 2] = (byte)(v >> 16); >> a[offset + 3] = (byte)(v >> 24); >> a[offset + 4] = (byte)(v >> 32); >> a[offset + 5] = (byte)(v >> 40); >> a[offset + 6] = (byte)(v >> 48); >> a[offset + 7] = (byte)(v >> 56); >> } >> return new Object[]{ a }; >> } >> >> >> Depending on the endianness, the 8 bytes of `v` are stored into the array in a different order. In both cases the order of the stores matches the byte layout of an 8-byte store, so the eight 1-byte stores can be replaced with just one 8-byte store (if there aren't too many range checks). >> >> Additionally I've fixed a few comments and a test bug. >> >> The optimization seems to be a little bit more effective on big-endian platforms. >> >> Again by example: >> >> >> static Object[] test800a(byte[] a, int offset, long v) { >> if (IS_BIG_ENDIAN) { >> a[offset + 0] = (byte)(v >> 40); // Removed from candidate list >> a[offset + 1] = (byte)(v >> 32); // Removed from candidate list >> a[offset + 2] = (byte)(v >> 24); // Merged >> a[offset + 3] = (byte)(v >> 16); // Merged >> a[offset + 4] = (byte)(v >> 8); // Merged >> a[offset + 5] = (byte)(v >> 0); // Merged >> } else { >> a[offset + 0] = (byte)(v >> 0); // Removed from candidate list >> a[offset + 1] = (byte)(v >> 8); // Removed from candidate list >> a[offset + 2] = (byte)(v >> 16); // Not merged >> a[offset + 3] = (byte)(v >> 24); // Not merged >> a[offset + 4] = (byte)(v >> 32); // Not merge... > > @offamitkumar you can put this through your testing if you like.
It should solve the issues with test/hotspot/jtreg/compiler/c2/TestMergeStores.java on s390 as well. @reinrich thanks for taking this up! Just did a quick scan of the tests. I think it could be good to have both the big-endian and the little-endian tests run on both big- and little-endian machines, but only expect the IR rules to pass if the test and platform are expected to optimize. This makes sure that the logic is correct and that the wrong cases are not optimized, which would produce wrong results. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19218#issuecomment-2110622758 From vklang at openjdk.org Tue May 14 16:18:09 2024 From: vklang at openjdk.org (Viktor Klang) Date: Tue, 14 May 2024 16:18:09 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v2] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Tue, 14 May 2024 14:51:20 GMT, Per Minborg wrote: >> # Stable Values & Collections (Internal) >> >> ## Summary >> This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. >> >> ## Goals >> * Provide an easy and intuitive API to describe value holders that can change at most once. >> * Decouple declaration from initialization without significant footprint or performance penalties. >> * Reduce the amount of static initializer and/or field initialization code. >> * Uphold integrity and consistency, even in a multi-threaded environment.
>> >> For more details, see the draft JEP: https://openjdk.org/jeps/8312611 >> >> ## Performance >> Performance compared to instance variables using an `AtomicReference` and one protected by double-checked locking under concurrent access by 8 threads: >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.instanceAtomic avgt 10 1.576 ± 0.052 ns/op >> StableBenchmark.instanceDCL avgt 10 1.608 ± 0.059 ns/op >> StableBenchmark.instanceStable avgt 10 0.979 ± 0.023 ns/op <- StableValue (~40% faster than DCL) >> >> >> Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (8 threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.staticAtomic avgt 10 1.335 ± 0.056 ns/op >> StableBenchmark.staticCHI avgt 10 0.623 ± 0.086 ns/op >> StableBenchmark.staticDCL avgt 10 1.418 ± 0.171 ns/op >> StableBenchmark.staticList avgt 10 0.617 ± 0.024 ns/op >> StableBenchmark.staticStable avgt 10 0.604 ± 0.022 ns/op <- StableValue ( > 2x faster than `AtomicInteger` and DCL) >> >> >> Performance for stable lists in both instance and static contexts whereby the sum of random contents is calculated for stable lists (which are thread-safe) compared to `ArrayList` instances (which are not thread-safe) (under single thread access): >> >> >> Benchmark Mode Cnt Score Error Units >> StableListSumBenchmark.instanceArrayList avgt 10 0.356 ± 0.005 ns/op >> StableListSumBenchmark.instanceList avgt 10 0.373 ± 0.017 ns/op <- Stable list >> StableListSumBenchmark.staticArrayList avgt 10 0.352 ± ...
> > Per Minborg has updated the pull request incrementally with one additional commit since the last revision: > > Remove text in public class that references an internal class src/java.base/share/classes/jdk/internal/lang/stable/StableArray3DImpl.java line 33: > 31: Objects.checkIndex(i1, dim1); > 32: Objects.checkIndex(i2, dim2); > 33: final int index = i0 * dim1 * dim2 + i1 * dim2 + i2; Might be worth doing some overflow checking here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1600313603 From epeter at openjdk.org Tue May 14 16:19:06 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 May 2024 16:19:06 GMT Subject: RFR: 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store In-Reply-To: References: Message-ID: <5Zx0TXxEyR05tg6kNhs_3tHKsTm9xFGYmawen_m4fb4=.991d9155-91ac-4534-b655-655adc5e2b51@github.com> On Mon, 13 May 2024 15:53:52 GMT, Richard Reingruber wrote: > This pr adds a few tweaks to [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) which allows enabling it also on big endian platforms (e.g. AIX, S390). JDK-8318446 introduced a C2 optimization to replace consecutive stores to a primitive array with just one store. 
>
> By example (from `TestMergeStores.java`):
>
>     static Object[] test2a(byte[] a, int offset, long v) {
>         if (IS_BIG_ENDIAN) {
>             a[offset + 0] = (byte)(v >> 56);
>             a[offset + 1] = (byte)(v >> 48);
>             a[offset + 2] = (byte)(v >> 40);
>             a[offset + 3] = (byte)(v >> 32);
>             a[offset + 4] = (byte)(v >> 24);
>             a[offset + 5] = (byte)(v >> 16);
>             a[offset + 6] = (byte)(v >> 8);
>             a[offset + 7] = (byte)(v >> 0);
>         } else {
>             a[offset + 0] = (byte)(v >> 0);
>             a[offset + 1] = (byte)(v >> 8);
>             a[offset + 2] = (byte)(v >> 16);
>             a[offset + 3] = (byte)(v >> 24);
>             a[offset + 4] = (byte)(v >> 32);
>             a[offset + 5] = (byte)(v >> 40);
>             a[offset + 6] = (byte)(v >> 48);
>             a[offset + 7] = (byte)(v >> 56);
>         }
>         return new Object[]{ a };
>     }
>
> Depending on the endianness, 8 bytes are stored into an array. The order of the stores is the same as the order of an 8-byte store, therefore 8 1-byte stores can be replaced with just one 8-byte store (if there aren't too many range checks).
>
> Additionally I've fixed a few comments and a test bug.
>
> The optimization seems to be a little bit more effective on big endian platforms.
>
> Again by example:
>
>     static Object[] test800a(byte[] a, int offset, long v) {
>         if (IS_BIG_ENDIAN) {
>             a[offset + 0] = (byte)(v >> 40); // Removed from candidate list
>             a[offset + 1] = (byte)(v >> 32); // Removed from candidate list
>             a[offset + 2] = (byte)(v >> 24); // Merged
>             a[offset + 3] = (byte)(v >> 16); // Merged
>             a[offset + 4] = (byte)(v >> 8);  // Merged
>             a[offset + 5] = (byte)(v >> 0);  // Merged
>         } else {
>             a[offset + 0] = (byte)(v >> 0);  // Removed from candidate list
>             a[offset + 1] = (byte)(v >> 8);  // Removed from candidate list
>             a[offset + 2] = (byte)(v >> 16); // Not merged
>             a[offset + 3] = (byte)(v >> 24); // Not merged
>             a[offset + 4] = (byte)(v >> 32); // Not merged
>             a[offset + 5] = (byte)(v >> 40); // Not merged
>         }
>         return new Object[]{ a };...
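The equivalence the description relies on can be checked outside C2. The sketch below, assuming Java 9+ `MethodHandles.byteArrayViewVarHandle`, compares eight 1-byte stores (most significant byte first on big-endian platforms, least significant byte first on little-endian ones) against a single native-order 8-byte store. The class and method names are hypothetical, and the loop is only for brevity: it shows the byte layout that makes the merge legal, not C2's pattern matching, which works on unrolled stores like `test2a`.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

public class MergeStoresSketch {
    // A static final flag, so a JIT can treat the branch below as a constant.
    static final boolean IS_BIG_ENDIAN = ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN;

    // Views a byte[] as longs in native byte order; the int coordinate is a byte offset.
    static final VarHandle LONG_NATIVE =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.nativeOrder());

    // Eight 1-byte stores in the order that matches the platform's 8-byte store.
    static void storeBytewise(byte[] a, int offset, long v) {
        for (int i = 0; i < 8; i++) {
            int shift = IS_BIG_ENDIAN ? 56 - 8 * i : 8 * i;
            a[offset + i] = (byte) (v >> shift);
        }
    }

    // True if the bytewise stores leave the array exactly as one 8-byte store does.
    static boolean sameAsNativeStore(long v, int offset) {
        byte[] bytewise = new byte[offset + 8];
        byte[] wide = new byte[offset + 8];
        storeBytewise(bytewise, offset, v);
        LONG_NATIVE.set(wide, offset, v); // one (possibly misaligned) 8-byte store
        return Arrays.equals(bytewise, wide);
    }

    public static void main(String[] args) {
        System.out.println(sameAsNativeStore(0x0102030405060708L, 3)); // true
    }
}
```

Making the flag `static final` lets a JIT treat the branch as a constant, which is the same property the test relies on for its `IS_BIG_ENDIAN` field.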
src/hotspot/share/opto/memnode.cpp line 3313: > 3311: merged_input_value = _store->in(MemNode::ValueIn); > 3312: bool is_true = is_con_RShift(first->in(MemNode::ValueIn), base_last, shift_last); > 3313: #endif // VM_LITTLE_ENDIAN You could just have local variables for "lo" / "hi", set them depending on big/little endian, and then the logic would be the same for both. test/hotspot/jtreg/compiler/c2/TestMergeStores.java line 57: > 55: private static final Random RANDOM = Utils.getRandomInstance(); > 56: > 57: private static final boolean IS_BIG_ENDIAN = UNSAFE.isBigEndian(); `static` is very important here, so that the `if` in the test constant-folds. Otherwise we don't know whether the IR rule passes because the correct branch was taken. Maybe add a comment for that. test/hotspot/jtreg/compiler/c2/TestMergeStores.java line 117: > 115: testGroups.get("test2").put("test2c", (_,_) -> { return test2c(aB.clone(), offset1, vL1); }); > 116: testGroups.get("test2").put("test2d", (_,_) -> { return test2d(aB.clone(), offset1, vL1); }); > 117: testGroups.get("test2").put("test2e", (_,_) -> { return test2e(aB.clone(), offset1, vL1); }); Nice catch ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19218#discussion_r1600314029 PR Review Comment: https://git.openjdk.org/jdk/pull/19218#discussion_r1600315534 PR Review Comment: https://git.openjdk.org/jdk/pull/19218#discussion_r1600315727 From roland at openjdk.org Tue May 14 16:23:13 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 14 May 2024 16:23:13 GMT Subject: RFR: 8332245: C2: missing record_for_ign() call in GraphKit::must_be_not_null() Message-ID: The `If` node that's created by `GraphKit::must_be_not_null()` is not enqueued for igvn when it's created. The test case shows it prevents 2 identical tests from commoning when the first igvn executes. This is a minor issue I noticed while working on something else.
------------- Commit messages: - test and fix Changes: https://git.openjdk.org/jdk/pull/19233/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19233&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332245 Stats: 75 lines in 2 files changed: 75 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19233.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19233/head:pull/19233 PR: https://git.openjdk.org/jdk/pull/19233 From roland at openjdk.org Tue May 14 16:28:17 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 14 May 2024 16:28:17 GMT Subject: RFR: 8332245: C2: missing record_for_ign() call in GraphKit::must_be_not_null() [v2] In-Reply-To: References: Message-ID: > The `If` node that's created by `GraphKit::must_be_not_null()` is not > enqueued for igvn when it's created. The test case shows it prevents 2 > identical tests from commoning when the first igvn executes. This is a > minor issue I noticed while working on something else. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: fixed test @summary ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19233/files - new: https://git.openjdk.org/jdk/pull/19233/files/23341ae6..48fb906c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19233&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19233&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19233.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19233/head:pull/19233 PR: https://git.openjdk.org/jdk/pull/19233 From thartmann at openjdk.org Tue May 14 20:25:03 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 14 May 2024 20:25:03 GMT Subject: RFR: 8332245: C2: missing record_for_ign() call in GraphKit::must_be_not_null() [v2] In-Reply-To: References: Message-ID: On Tue, 14 May 2024 16:28:17 GMT, Roland Westrelin wrote: >> The `If` node that's created by
`GraphKit::must_be_not_null()` is not >> enqueued for igvn when it's created. The test case shows it prevents 2 >> identical tests from commoning when the first igvn executes. This is a >> minor issue I noticed while working on something else. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > fixed test @summary Looks good to me otherwise. src/hotspot/share/opto/graphKit.cpp line 1471: > 1469: IfNode *iff = new IfNode(control(), opaq, PROB_MAX, COUNT_UNKNOWN); > 1470: _gvn.set_type(iff, iff->Value(&_gvn)); > 1471: if (!tst->is_Con()) record_for_igvn(iff); Suggestion:

    if (!tst->is_Con()) {
      record_for_igvn(iff);
    }

------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19233#pullrequestreview-2056364490 PR Review Comment: https://git.openjdk.org/jdk/pull/19233#discussion_r1600599026 From thartmann at openjdk.org Tue May 14 20:45:04 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 14 May 2024 20:45:04 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v4] In-Reply-To: References: Message-ID: On Fri, 3 May 2024 10:11:25 GMT, Roland Westrelin wrote: >> Range check `CastII` nodes are removed once loop opts are over. The >> test case for this change includes 3 cases where elimination of a >> range check `CastII` causes a crash in compiled code because either an >> out-of-bounds array load or a division by zero happens. >> >> In `test1`: >> >> - the range checks for the `array[otherArray.length]` loads constant >> fold: `otherArray.length` is a `CastII` of i at the `otherArray` >> allocation. `i` is less than 9. The `CastII` at the allocation >> narrows the type down further to `[0-9]`. >> >> - the `array[otherArray.length]` loads are control dependent on the >> unrelated: >> >> >> if (flag == 0) { >> >> >> test. There's an identical dominating test which replaces that one.
As >> a consequence, the `array[otherArray.length]` loads become control >> dependent on the dominating test. >> >> - The `CastII` nodes at the `otherArray` allocations are replaced by the >> dominating range check `CastII` nodes for: >> >> >> newArray[i] = 42; >> >> >> - After loop opts, the range check `CastII` nodes are removed and the >> 2 `array[otherArray.length]` loads common at the first: >> >> >> if (flag == 0) { >> >> >> test before the: >> >> >> float[] otherArray = new float[i]; >> >> >> and >> >> >> newArray[i] = 42; >> >> >> that guarantee `i` is positive. >> >> - `test1` is called with `i = -1`, the array load proceeds with an out >> of bounds index and the crash occurs. >> >> >> `test2` and `test3` are mostly identical except for the check that's >> eliminated (a null divisor check) and the instruction that causes a >> fault (an integer division). >> >> The fix I propose is to not eliminate range check `CastII` nodes after >> loop opts. When range check `CastII` nodes were introduced, performance >> was observed to regress. Removing them after loop opts was found to >> preserve both correctness and performance. Today, the performance >> regression still exists when `CastII` nodes are left in. So I propose >> we keep them until the end of optimizations (so the 2 array loads >> above don't lose a dependency and wrongly common) but remove them at >> the end of all optimizations. >> >> In the case of the array loads, they are dependent on a range check >> for another array through a range check `CastII` and we must not lose >> that dependency otherwise the array loads could float above the range >> check at gcm time. I propose we deal with that problem the way it's >> handled for `CastPP` nodes: add the dependency to the load (or >> division) nodes ... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > test fix Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18377#pullrequestreview-2056422264 From thartmann at openjdk.org Tue May 14 20:45:05 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 14 May 2024 20:45:05 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: <05mqp07IWOcReVgYabEiDRcLueAMXs8Y8sb1u5SqKiA=.ac76547b-7dcc-4c9b-8db6-d251e2cc9625@github.com> References: Message-ID: <05mqp07IWOcReVgYabEiDRcLueAMXs8Y8sb1u5SqKiA=.ac76547b-7dcc-4c9b-8db6-d251e2cc9625@github.com> On Mon, 6 May 2024 10:50:30 GMT, Roland Westrelin wrote: > What you're saying, I think, is that if we have, say, a CastII that's input to a DivI node, if the input to that cast is non-zero, then we don't need to add the CastII control as dependency to the DivI Yes, that was my point. > That doesn't seem straightforward because this is done once we have no igvn instance to propagate types anymore. So, while I agree this is conservative, it still seems like the most reasonable fix. Right, we can still go down that path if it ever becomes necessary. > That seems like a different problem that is out of the scope of this particular issue. Could you please file a follow-up bug for that? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18377#discussion_r1600617086 From thartmann at openjdk.org Tue May 14 20:45:04 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 14 May 2024 20:45:04 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v4] In-Reply-To: References: Message-ID: On Fri, 3 May 2024 10:13:17 GMT, Roland Westrelin wrote: > I did but was fairly conservative. In the case of PhaseIdealLoop::match_fill_loop, I don't think this change makes a difference: if we don't need the check for CastIINode::has_range_check there then it's true with or without that change. Right, maybe we can put that into the follow-up bug.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18377#issuecomment-2111104232 From dlong at openjdk.org Tue May 14 21:07:01 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 14 May 2024 21:07:01 GMT Subject: RFR: 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store In-Reply-To: References: Message-ID: On Mon, 13 May 2024 15:53:52 GMT, Richard Reingruber wrote: > This pr adds a few tweaks to [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) which allows enabling it also on big endian platforms (e.g. AIX, S390). JDK-8318446 introduced a C2 optimization to replace consecutive stores to a primitive array with just one store. > > By example (from `TestMergeStores.java`): > > > static Object[] test2a(byte[] a, int offset, long v) { > if (IS_BIG_ENDIAN) { > a[offset + 0] = (byte)(v >> 56); > a[offset + 1] = (byte)(v >> 48); > a[offset + 2] = (byte)(v >> 40); > a[offset + 3] = (byte)(v >> 32); > a[offset + 4] = (byte)(v >> 24); > a[offset + 5] = (byte)(v >> 16); > a[offset + 6] = (byte)(v >> 8); > a[offset + 7] = (byte)(v >> 0); > } else { > a[offset + 0] = (byte)(v >> 0); > a[offset + 1] = (byte)(v >> 8); > a[offset + 2] = (byte)(v >> 16); > a[offset + 3] = (byte)(v >> 24); > a[offset + 4] = (byte)(v >> 32); > a[offset + 5] = (byte)(v >> 40); > a[offset + 6] = (byte)(v >> 48); > a[offset + 7] = (byte)(v >> 56); > } > return new Object[]{ a }; > } > > > Depending on the endianness 8 bytes are stored into an array. The order of the stores is the same as the order of an 8-byte-store therefore 8 1-byte-stores can be replaced with just one 8-byte-store (if there aren't too many range checks). > > Additionally I've fixed a few comments and a test bug. > > The optimization seems to be a little bit more effective on big endian platforms.
> > Again by example: > > > static Object[] test800a(byte[] a, int offset, long v) { > if (IS_BIG_ENDIAN) { > a[offset + 0] = (byte)(v >> 40); // Removed from candidate list > a[offset + 1] = (byte)(v >> 32); // Removed from candidate list > a[offset + 2] = (byte)(v >> 24); // Merged > a[offset + 3] = (byte)(v >> 16); // Merged > a[offset + 4] = (byte)(v >> 8); // Merged > a[offset + 5] = (byte)(v >> 0); // Merged > } else { > a[offset + 0] = (byte)(v >> 0); // Removed from candidate list > a[offset + 1] = (byte)(v >> 8); // Removed from candidate list > a[offset + 2] = (byte)(v >> 16); // Not merged > a[offset + 3] = (byte)(v >> 24); // Not merged > a[offset + 4] = (byte)(v >> 32); // Not merged > a[offset + 5] = (byte)(v >> 40); // Not merged > } > return new Object[]{ a };... It's not obvious to me why something like

    a[offset + 2] = (byte)(v >> 16); // Not merged
    a[offset + 3] = (byte)(v >> 24); // Not merged
    a[offset + 4] = (byte)(v >> 32); // Not merged
    a[offset + 5] = (byte)(v >> 40); // Not merged

can't be merged. Is it because you only use `v` as a possible 32-bit value? Why not use something like the following pseudo-code?

    int bytes2word(byte b1, byte b2, byte b3, byte b4) {
        return (b1 & 0xff) << 24 | (b2 & 0xff) << 16 | (b3 & 0xff) << 8 | (b4 & 0xff);
    }

    // Substituting in the values from the example:
    int big_endian = bytes2word((byte)(v >> 40), (byte)(v >> 32), (byte)(v >> 24), (byte)(v >> 16));
    int little_endian = bytes2word((byte)(v >> 16), (byte)(v >> 24), (byte)(v >> 32), (byte)(v >> 40));

------------- PR Comment: https://git.openjdk.org/jdk/pull/19218#issuecomment-2111139166 From dlong at openjdk.org Tue May 14 21:12:01 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 14 May 2024 21:12:01 GMT Subject: RFR: 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store In-Reply-To: References: Message-ID: On Mon, 13 May 2024 15:53:52 GMT, Richard Reingruber wrote: > This pr adds a few tweaks to [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) which allows enabling it also on big endian platforms (e.g. AIX, S390). JDK-8318446 introduced a C2 optimization to replace consecutive stores to a primitive array with just one store. > > By example (from `TestMergeStores.java`): > > > static Object[] test2a(byte[] a, int offset, long v) { > if (IS_BIG_ENDIAN) { > a[offset + 0] = (byte)(v >> 56); > a[offset + 1] = (byte)(v >> 48); > a[offset + 2] = (byte)(v >> 40); > a[offset + 3] = (byte)(v >> 32); > a[offset + 4] = (byte)(v >> 24); > a[offset + 5] = (byte)(v >> 16); > a[offset + 6] = (byte)(v >> 8); > a[offset + 7] = (byte)(v >> 0); > } else { > a[offset + 0] = (byte)(v >> 0); > a[offset + 1] = (byte)(v >> 8); > a[offset + 2] = (byte)(v >> 16); > a[offset + 3] = (byte)(v >> 24); > a[offset + 4] = (byte)(v >> 32); > a[offset + 5] = (byte)(v >> 40); > a[offset + 6] = (byte)(v >> 48); > a[offset + 7] = (byte)(v >> 56); > } > return new Object[]{ a }; > } > > > Depending on the endianness 8 bytes are stored into an array.
The order of the stores is the same as the order of an 8-byte-store therefore 8 1-byte-stores can be replaced with just one 8-byte-store (if there aren't too many range checks). > > Additionally I've fixed a few comments and a test bug. > > The optimization seems to be a little bit more effective on big endian platforms. > > Again by example: > > > static Object[] test800a(byte[] a, int offset, long v) { > if (IS_BIG_ENDIAN) { > a[offset + 0] = (byte)(v >> 40); // Removed from candidate list > a[offset + 1] = (byte)(v >> 32); // Removed from candidate list > a[offset + 2] = (byte)(v >> 24); // Merged > a[offset + 3] = (byte)(v >> 16); // Merged > a[offset + 4] = (byte)(v >> 8); // Merged > a[offset + 5] = (byte)(v >> 0); // Merged > } else { > a[offset + 0] = (byte)(v >> 0); // Removed from candidate list > a[offset + 1] = (byte)(v >> 8); // Removed from candidate list > a[offset + 2] = (byte)(v >> 16); // Not merged > a[offset + 3] = (byte)(v >> 24); // Not merged > a[offset + 4] = (byte)(v >> 32); // Not merged > a[offset + 5] = (byte)(v >> 40); // Not merged > } > return new Object[]{ a };... In other words, it seems like it could work for arbitrary byte values if the merged value was computed from those individual values. They wouldn't need to be shifted values.

    a[offset + 0] = (byte)0x1;
    a[offset + 1] = (byte)0x2;
    a[offset + 2] = (byte)0x3;
    a[offset + 3] = (byte)0x4;

The example above would write either 0x01020304 or 0x04030201, depending on the endianness.
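Dean's suggestion can be sketched in plain Java: compute the merged int from the individual byte values, ordering them by the target byte order, and cross-check the result against `ByteBuffer.getInt`. This is a hypothetical illustration, not code from the PR: the class and `mergedValue` are made up for this sketch, and `bytes2word` follows the pseudo-code above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class BytesToWordSketch {
    // Packs four bytes into an int with b1 as the most significant byte.
    static int bytes2word(byte b1, byte b2, byte b3, byte b4) {
        return (b1 & 0xff) << 24 | (b2 & 0xff) << 16 | (b3 & 0xff) << 8 | (b4 & 0xff);
    }

    // The int that a single 4-byte store in `order` must write so that memory
    // holds b0, b1, b2, b3 at increasing addresses.
    static int mergedValue(byte b0, byte b1, byte b2, byte b3, ByteOrder order) {
        return order == ByteOrder.BIG_ENDIAN
            ? bytes2word(b0, b1, b2, b3)
            : bytes2word(b3, b2, b1, b0);
    }

    public static void main(String[] args) {
        byte[] a = {0x1, 0x2, 0x3, 0x4};
        for (ByteOrder order : new ByteOrder[] {ByteOrder.BIG_ENDIAN, ByteOrder.LITTLE_ENDIAN}) {
            int merged = mergedValue(a[0], a[1], a[2], a[3], order);
            // Cross-check: read the same four bytes back as one int in `order`.
            int readBack = ByteBuffer.wrap(a).order(order).getInt(0);
            System.out.printf("%s: 0x%08x (matches read-back: %b)%n", order, merged, merged == readBack);
        }
        // Prints:
        // BIG_ENDIAN: 0x01020304 (matches read-back: true)
        // LITTLE_ENDIAN: 0x04030201 (matches read-back: true)
    }
}
```

Applied to the byte sequences substituted above, `(v >> 40), (v >> 32), (v >> 24), (v >> 16)` under big-endian and `(v >> 16), (v >> 24), (v >> 32), (v >> 40)` under little-endian, `mergedValue` yields `(int) (v >> 16)` in both cases.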
------------- PR Comment: https://git.openjdk.org/jdk/pull/19218#issuecomment-2111147041 From dlong at openjdk.org Tue May 14 21:46:03 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 14 May 2024 21:46:03 GMT Subject: RFR: 8327661: C1: Make RBP allocatable on x64 when PreserveFramePointer is disabled [v3] In-Reply-To: References: Message-ID: On Wed, 13 Mar 2024 06:49:30 GMT, Denghui Dong wrote: >> Hi, >> >> Could I have a review of this change that makes RBP allocatable in c1 register allocation when PreserveFramePointer is not enabled. >> >> There seems to be no reason that RBP cannot be used. Although the performance of c1 jit code is not very critical, in my opinion, this change will not add overhead of compilation. So maybe it is acceptable. >> >> I am not very sure if I have changed all the places that should be. >> >> Testing: fastdebug tier1-4 on Linux x64 > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > delete jmh Yes, the risk / reward ratio on this seems borderline. Even on arm32 or x86_32, where there are fewer registers, it's not clear how much using FP in C1 would help. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18167#issuecomment-2111188367 From sviswanathan at openjdk.org Tue May 14 23:54:10 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 14 May 2024 23:54:10 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
The benchmark numbers:
>>
>>     Benchmark                              Score      Latest     Latest/Score
>>     StringIndexOf.advancedWithMediumSub    343.573    317.934    0.925375393x
>>     StringIndexOf.advancedWithShortSub1    1039.081   1053.96    1.014319384x
>>     StringIndexOf.advancedWithShortSub2    55.828     110.541    1.980027943x
>>     StringIndexOf.constantPattern          9.361      11.906     1.271872663x
>>     StringIndexOf.searchCharLongSuccess    4.216      4.218      1.000474383x
>>     StringIndexOf.searchCharMediumSuccess  3.133      3.216      1.02649218x
>>     StringIndexOf.searchCharShortSuccess   3.76       3.761      1.000265957x
>>     StringIndexOf.success                  9.186      9.713      1.057369911x
>>     StringIndexOf.successBig               14.341     46.343     3.231504079x
>>     StringIndexOfChar.latin1_AVX2_String   6220.918   12154.52   1.953814533x
>>     StringIndexOfChar.latin1_AVX2_char     5503.556   5540.044   1.006629895x
>>     StringIndexOfChar.latin1_SSE4_String   6978.854   6818.689   0.977049957x
>>     StringIndexOfChar.latin1_SSE4_char     5657.499   5474.624   0.967675646x
>>     StringIndexOfChar.latin1_Short_String  7132.541   6863.359   0.962260014x
>>     StringIndexOfChar.latin1_Short_char    16013.389  16162.437  1.009307711x
>>     StringIndexOfChar.latin1_mixed_String  7386.123   14771.622  1.999915517x
>>     StringIndexOfChar.latin1_mixed_char    9901.671   9782.245   0.987938803x
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
> Rearrange; add lambdas for clarity

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1589:

> 1587:     case 3:
> 1588:     case 4:
> 1589:       __ movl(needleVal, Address(needle, offsetOfFirstByteToCompare));

If the size of the needle is 7 and it is an LL case with NUMBER_OF_NEEDLE_BYTES_TO_COMPARE set as 3: bytesLeftToCompare = 4 (i.e. 7-3); offsetOfFirstByteToCompare = 2 (i.e. 3-1); the movl will be loading bytes 2, 3, 4, 5. So we seem to be missing loading the last byte of the needle. Is that correct?
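The arithmetic in the review comment can be restated as a small worked example. This is a hypothetical sketch that models only the index ranges, assuming (as the comment does) that the 4-byte `movl` starts at `offsetOfFirstByteToCompare = NUMBER_OF_NEEDLE_BYTES_TO_COMPARE - 1` and that each LL needle character is one byte; it does not reproduce the stub's control flow, and the helper names are made up.

```java
public class NeedleLoadCoverage {
    // Index of the last needle byte read by a 4-byte load (movl) starting at `offset`.
    static int lastByteLoaded(int offset) {
        return offset + 4 - 1;
    }

    // Number of needle bytes past the end of that load; > 0 means they are never loaded.
    static int bytesNotLoaded(int needleLen, int numberOfNeedleBytesToCompare) {
        int offsetOfFirstByteToCompare = numberOfNeedleBytesToCompare - 1; // as in the review: 3 - 1 = 2
        return (needleLen - 1) - lastByteLoaded(offsetOfFirstByteToCompare);
    }

    public static void main(String[] args) {
        // LL case, 7-byte needle, NUMBER_OF_NEEDLE_BYTES_TO_COMPARE = 3:
        // movl reads bytes 2..5, but the needle's last byte is at index 6.
        System.out.println(bytesNotLoaded(7, 3)); // 1
        // With a 6-byte needle the same load reaches the last byte.
        System.out.println(bytesNotLoaded(6, 3)); // 0
    }
}
```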
src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1735: > 1733: // generated with 32 - (n - k + 1) bits set that ensures matches past the end of the original > 1734: // haystack do not get considered during compares. > 1735: // Mask is generated below with (n-k+1) bits set and not 32- (n-k+1) bits set. It would also help to specify what n and k are. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1784: > 1782: __ subq(tmp, haystack_len); > 1783: } > 1784: __ leaq(haystack, Address(rsp, tmp, Address::times_1)); This whole block of code is repeated in two places. It could be made into a function and used in both places. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1838: > 1836: __ shrq(rax, 1); > 1837: } > 1838: We need to be consistent: either use tzcntl, shrl, testl or tzcntq, shrq, testq. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1600787103 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1600760538 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1600489229 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1600765277 From cslucas at openjdk.org Wed May 15 01:49:20 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 15 May 2024 01:49:20 GMT Subject: Integrated: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers In-Reply-To: References: Message-ID: On Wed, 8 May 2024 23:44:26 GMT, Cesar Soares Lucas wrote: > The `assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type` failure was caused by the fact that we didn't have a "zero value" for the type T_METADATA. The RAM patch uses that data when it creates a Phi node merging Klass loads and UseCompressedClassPointers is disabled. > > Tested with JTREG tier1-4 on Linux x86_64 & ARM64. This pull request has now been integrated.
Changeset: 4e77cf88 Author: Cesar Soares Lucas Committer: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/4e77cf881d031e5b0320915b3eabd7702e560291 Stats: 139 lines in 3 files changed: 117 ins; 1 del; 21 mod 8330795: C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/19148 From ddong at openjdk.org Wed May 15 02:04:14 2024 From: ddong at openjdk.org (Denghui Dong) Date: Wed, 15 May 2024 02:04:14 GMT Subject: Withdrawn: 8327661: C1: Make RBP allocatable on x64 when PreserveFramePointer is disabled In-Reply-To: References: Message-ID: On Fri, 8 Mar 2024 11:12:53 GMT, Denghui Dong wrote: > Hi, > > Could I have a review of this change that makes RBP allocatable in c1 register allocation when PreserveFramePointer is not enabled. > > There seems to be no reason that RBP cannot be used. Although the performance of c1 jit code is not very critical, in my opinion, this change will not add overhead of compilation. So maybe it is acceptable. > > I am not very sure if I have changed all the places that should be. > > Testing: fastdebug tier1-4 on Linux x64 This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/18167 From cslucas at openjdk.org Wed May 15 04:11:36 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 15 May 2024 04:11:36 GMT Subject: RFR: JDK-8330565 : C2: Multiple crashes with CTW after JDK-8316991 [v2] In-Reply-To: References: Message-ID: > The `# assert(false) failed: Bad graph detected in build_loop_late` failure was caused because a string concatenation optimization using [this method](https://github.com/openjdk/jdk/blob/819f3d6fc70ff6fe54ac5f9033c17c3dd4326aa5/src/hotspot/share/opto/graphKit.cpp#L4115) adds AddP and LoadN nodes to the IR graph as NotNull _and_ because RAM was not "nullifying" phis merging nullable pointers.
I was only able to reproduce this problem using a classfile/jar compiled with an "old" version of the JDK, because newer versions use InvokeDynamic to do string concatenation. > > Tested with JTREG tier1-4 on Linux x86_64 & ARM64. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Refactor split_castpp_load_through_phi ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19147/files - new: https://git.openjdk.org/jdk/pull/19147/files/26f0e4d5..94eb0e12 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19147&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19147&range=00-01 Stats: 38 lines in 1 file changed: 13 ins; 15 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/19147.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19147/head:pull/19147 PR: https://git.openjdk.org/jdk/pull/19147 From cslucas at openjdk.org Wed May 15 04:11:37 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 15 May 2024 04:11:37 GMT Subject: RFR: JDK-8330565 : C2: Multiple crashes with CTW after JDK-8316991 [v2] In-Reply-To: References: Message-ID: On Thu, 9 May 2024 01:26:00 GMT, Vladimir Kozlov wrote: >> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactor split_castpp_load_through_phi > > src/hotspot/share/opto/escape.cpp line 779: > >> 777: _igvn->set_type(data_phi, new_t); >> 778: data_phi->raise_bottom_type(new_t); >> 779: } > > Do you intentionally execute `_igvn->transform(` for `data_phi` before you set inputs and now type? > Usually we do transform after we fully construct node. I used to call `transform` right after the nodes were created. Thanks for clarifying the right way to use it. I refactored this method to call `transform` only after the nodes have all inputs set - doing that alone doesn't fix the problem, though.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19147#discussion_r1600914310 From chagedorn at openjdk.org Wed May 15 06:27:01 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 15 May 2024 06:27:01 GMT Subject: RFR: 8332245: C2: missing record_for_ign() call in GraphKit::must_be_not_null() [v2] In-Reply-To: References: Message-ID: On Tue, 14 May 2024 16:28:17 GMT, Roland Westrelin wrote: >> The `If` node that's created by `GraphKit::must_be_not_null()` is not >> enqueued for igvn when it's created. The test case shows it prevents 2 >> identical tests from commoning when the first igvn executes. This is a >> minor issue I noticed whule working on something else. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > fixed test @summary Looks good to me, too. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19233#pullrequestreview-2056992695 From pminborg at openjdk.org Wed May 15 06:49:07 2024 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 15 May 2024 06:49:07 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v2] In-Reply-To: <1BnXIoSgu8PhvzHlCE5aaxAUtjBrayd935yQSnqZZbc=.18384687-9678-4b84-9cff-95c51f51d528@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> <1BnXIoSgu8PhvzHlCE5aaxAUtjBrayd935yQSnqZZbc=.18384687-9678-4b84-9cff-95c51f51d528@github.com> Message-ID: On Tue, 14 May 2024 16:02:41 GMT, Viktor Klang wrote: >> Per Minborg has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove text in public class that references an internal class > > src/java.base/share/classes/jdk/internal/lang/StableValue.java line 171: > >> 169: /** >> 170: * {@return a fresh stable value with an unset value where the returned stable's >> 171: * value is computed in a separate background 
thread (created via the provided > > Suggestion: > > * {@return a fresh stable value with an unset value where the returned stable > * value is computed in a separate background thread (created via the provided It is a bit confusing with "stable value" and the "stable value's value". Maybe something like: * {@return a fresh stable value with an unset value where its * value is computed in a separate background thread (created via the provided ... ? > src/java.base/share/classes/jdk/internal/lang/StableValue.java line 175: > >> 173: *

    >> 174: * If the supplier throws an (unchecked) exception, the exception is ignored, and no >> 175: * value is set. > > Is it likely that users will want to be made aware of failures? If so, perhaps it would make sense to make sure that the Exception hits the UncaughtExceptionHandler? Good suggestion. An alternative would be to provide an exception listener getting invoked upon hitting an exception. > src/java.base/share/classes/jdk/internal/lang/stable/StableArray3DImpl.java line 33: > >> 31: Objects.checkIndex(i1, dim1); >> 32: Objects.checkIndex(i2, dim2); >> 33: final int index = i0 * dim1 * dim2 + i1 * dim2 + i2; > > Might be worth doing some overflow checking here? At construction, we assert the invariant; the product of `dim0, dim1, and dim2` fits in an `int`:

    private StableArray3DImpl(int dim0, int dim1, int dim2) {
        this(dim0, dim1, dim2, Math.multiplyExact(Math.multiplyExact(dim0, dim1), dim2));
    }

The three `checkIndex(iN, dimN)` calls ensure each iN is non-negative and less than dimN. This means, by induction, that the operation `final int index = i0 * dim1 * dim2 + i1 * dim2 + i2;` will never overflow. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1601035645 PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1601028336 PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1601025953 From roland at openjdk.org Wed May 15 06:53:30 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 15 May 2024 06:53:30 GMT Subject: RFR: 8332245: C2: missing record_for_ign() call in GraphKit::must_be_not_null() [v3] In-Reply-To: <5NTiCLF_-Vz5grh6xiQSpTqDWcMgJB3pHeNcgPzQcrw=.356ae51f-8469-4f1c-9a76-3ae7d28a1067@github.com> References: Message-ID: <5NTiCLF_-Vz5grh6xiQSpTqDWcMgJB3pHeNcgPzQcrw=.356ae51f-8469-4f1c-9a76-3ae7d28a1067@github.com> > The `If` node that's created by `GraphKit::must_be_not_null()` is not > enqueued for igvn when it's created. The test case shows it prevents 2 > identical tests from commoning when the first igvn executes.
This is a > minor issue I noticed while working on something else. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/graphKit.cpp Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19233/files - new: https://git.openjdk.org/jdk/pull/19233/files/48fb906c..afc87210 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19233&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19233&range=01-02 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19233.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19233/head:pull/19233 PR: https://git.openjdk.org/jdk/pull/19233 From roland at openjdk.org Wed May 15 07:19:03 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 15 May 2024 07:19:03 GMT Subject: Re: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v4] In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 13:11:38 GMT, Emanuel Peter wrote: >> @eme64 did you get a chance to look at the answers to your questions? > > @rwestrel It seems I only get notifications for new messages, not responses. Looking at the PR now... @eme64 do you need to review the updated change? Also can you answer the question you left unanswered about the test case: "can you please explain why a run without flags make sense?"
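The overflow argument made for `StableArray3DImpl` earlier in this digest (a construction-time `multiplyExact` check plus three `checkIndex` calls) can be illustrated with a small stand-alone class. This is a minimal sketch under assumptions: `FlatIndex3D` is a hypothetical name, the negative-dimension check is an addition for illustration, and the class only mirrors the invariant described in the review rather than the JDK implementation.

```java
import java.util.Objects;

// Hypothetical stand-in for the index arithmetic discussed for StableArray3DImpl;
// it is not the JDK class and only mirrors the invariant described in the review.
final class FlatIndex3D {
    final int dim0, dim1, dim2;
    final int size;

    FlatIndex3D(int dim0, int dim1, int dim2) {
        if (dim0 < 0 || dim1 < 0 || dim2 < 0) {
            throw new IllegalArgumentException("negative dimension");
        }
        // Established once at construction: dim0 * dim1 * dim2 fits in an int,
        // otherwise Math.multiplyExact throws ArithmeticException.
        this.size = Math.multiplyExact(Math.multiplyExact(dim0, dim1), dim2);
        this.dim0 = dim0;
        this.dim1 = dim1;
        this.dim2 = dim2;
    }

    int index(int i0, int i1, int i2) {
        // Each call guarantees 0 <= iN < dimN, or throws IndexOutOfBoundsException.
        Objects.checkIndex(i0, dim0);
        Objects.checkIndex(i1, dim1);
        Objects.checkIndex(i2, dim2);
        // Now every dimension is >= 1, no intermediate product exceeds size,
        // and the largest possible result is size - 1, so int arithmetic cannot overflow.
        return i0 * dim1 * dim2 + i1 * dim2 + i2;
    }
}
```

With dimensions 3 x 4 x 5, the last element `(2, 3, 4)` maps to index 59, which is `size - 1`; dimensions of 2^11 each make the constructor throw instead of letting a later index computation wrap around.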
------------- PR Comment: https://git.openjdk.org/jdk/pull/18377#issuecomment-2111758487 From roland at openjdk.org Wed May 15 07:19:04 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 15 May 2024 07:19:04 GMT Subject: Re: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: <05mqp07IWOcReVgYabEiDRcLueAMXs8Y8sb1u5SqKiA=.ac76547b-7dcc-4c9b-8db6-d251e2cc9625@github.com> References: <05mqp07IWOcReVgYabEiDRcLueAMXs8Y8sb1u5SqKiA=.ac76547b-7dcc-4c9b-8db6-d251e2cc9625@github.com> Message-ID: On Tue, 14 May 2024 20:40:18 GMT, Tobias Hartmann wrote: >> I realized that I didn't understand your comment when I replied. >> What you're saying, I think, is that if we have, say, a `CastII` that's input to a `DivI` node, if the input to that cast is non-zero, then we don't need to add the `CastII` control as dependency to the `DivI`. The problem, I think, is that the `CastII` could be input to say an `AddI` node which would then be input to the `DivI`. What we would then need to know is whether if we remove the `CastII`, the `AddI` is still non-zero or not. That doesn't seem straightforward because this is done once we have no igvn instance to propagate types anymore. So, while I agree this is conservative, it still seems like the most reasonable fix. > >> What you're saying, I think, is that if we have, say, a CastII that's input to a DivI node, if the input to that cast is non-zero, then we don't need to add the CastII control as dependency to the DivI > > Yes, that was my point. > >> That doesn't seem straightforward because this is done once we have no igvn instance to propagate types anymore. So, while I agree this is conservative, it still seems like the most reasonable fix. > > Right, we can still go down that path if it ever becomes necessary. > >> That seems like a different problem that is out of the scope of this particular issue. > > Could you please file a follow-up bug for that?
I filed https://bugs.openjdk.org/browse/JDK-8332268 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18377#discussion_r1601082387 From epeter at openjdk.org Wed May 15 07:22:03 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 May 2024 07:22:03 GMT Subject: Re: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: <23IIbVkvsR_any6G0nw91mlQt0YrDsTTDJZ0DHowOnU=.d16ad6cf-b739-4e60-bafc-fa4684067574@github.com> On Tue, 16 Apr 2024 14:14:02 GMT, Roland Westrelin wrote: >> I think it would be great to have one run with absolutely no flags. > > @eme64 can you please explain why a run without flags make sense? @rwestrel we internally have lots of different runs with different flags. Sometimes bugs only show under certain flag combinations. If you always have the flags on in the test already, then some combinations may not be effective. But if you think that some flags MUST always be on for the test to make any sense, then keep them, I guess. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18377#discussion_r1601093427 From epeter at openjdk.org Wed May 15 07:28:04 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 May 2024 07:28:04 GMT Subject: Re: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v4] In-Reply-To: References: Message-ID: On Fri, 3 May 2024 10:11:25 GMT, Roland Westrelin wrote: >> Range check `CastII` nodes are removed once loop opts are over. The >> test case for this change includes 3 cases where elimination of a >> range check `CastII` causes a crash in compiled code because either an >> out of bounds array load or a division by zero happens. >> >> In `test1`: >> >> - the range checks for the `array[otherArray.length]` loads constant >> fold: `otherArray.length` is a `CastII` of i at the `otherArray` >> allocation. `i` is less than 9.
The `CastII` at the allocation >> narrows the type down further to `[0-9]`. >> >> - the `array[otherArray.length]` loads are control >> dependent on the >> unrelated: >> >> >> if (flag == 0) { >> >> >> test. There's an identical dominating test which replaces that one. As >> a consequence, the `array[otherArray.length]` loads become control >> dependent on the dominating test. >> >> - The `CastII` nodes at the `otherArray` allocations are replaced by >> dominating range check `CastII` nodes for: >> >> >> newArray[i] = 42; >> >> >> - After loop opts, the range check `CastII` nodes are removed and the >> 2 `array[otherArray.length]` loads common at the first: >> >> >> if (flag == 0) { >> >> >> test before the: >> >> >> float[] otherArray = new float[i]; >> >> >> and >> >> >> newArray[i] = 42; >> >> >> that guarantee `i` is positive. >> >> - `test1` is called with `i = -1`, the array load proceeds with an out >> of bounds index and the crash occurs. >> >> >> `test2` and `test3` are mostly identical except for the check that's >> eliminated (a zero divisor check) and the instruction that causes a >> fault (an integer division). >> >> The fix I propose is to not eliminate range check `CastII` nodes after >> loop opts. When range check `CastII` nodes were introduced, performance >> was observed to regress. Removing them after loop opts was found to >> preserve both correctness and performance. Today, the performance >> regression still exists when `CastII` nodes are left in. So I propose >> we keep them until the end of optimizations (so the 2 array loads >> above don't lose a dependency and wrongly common) but remove them at >> the end of all optimizations. >> >> In the case of the array loads, they are dependent on a range check >> for another array through a range check `CastII` and we must not lose >> that dependency otherwise the array loads could float above the range >> check at gcm time.
I propose we deal with that problem the way it's >> handled for `CastPP` nodes: add the dependency to the load (or >> division) nodes ... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > test fix I suggest you refactor the DIV/MOD checks, but otherwise I'm ok with the updates. src/hotspot/share/opto/compile.cpp line 3920: > 3918: if (use->is_Mem() || use->Opcode() == Op_DivI || use->Opcode() == Op_DivL || > 3919: use->Opcode() == Op_ModI || use->Opcode() == Op_ModL || use->Opcode() == Op_UDivI || > 3920: use->Opcode() == Op_UDivL || use->Opcode() == Op_UModI || use->Opcode() == Op_UModL) { This kinda smells like it should be its own method. Seems we do not have a superclass for all Mod/Div nodes. Maybe we should have that? Or otherwise just a `Node::is_div_or_mod()` method? ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18377#pullrequestreview-2057136936 PR Review Comment: https://git.openjdk.org/jdk/pull/18377#discussion_r1601099429 From pminborg at openjdk.org Wed May 15 07:39:25 2024 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 15 May 2024 07:39:25 GMT Subject: Re: RFR: 8330465: Stable Values and Collections (Internal) [v3] In-Reply-To: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: > # Stable Values & Collections (Internal) > > ## Summary > This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. > > ## Goals > * Provide an easy and intuitive API to describe value holders that can change at most once.
> * Decouple declaration from initialization without significant footprint or performance penalties. > * Reduce the amount of static initializer and/or field initialization code. > * Uphold integrity and consistency, even in a multi-threaded environment. > > For more details, see the draft JEP: https://openjdk.org/jeps/8312611 > > ## Performance > Performance compared to instance variables using an `AtomicReference` and one protected by double-checked locking under concurrent access by 8 threads: > > > Benchmark Mode Cnt Score Error Units > StableBenchmark.instanceAtomic avgt 10 1.576 ± 0.052 ns/op > StableBenchmark.instanceDCL avgt 10 1.608 ± 0.059 ns/op > StableBenchmark.instanceStable avgt 10 0.979 ± 0.023 ns/op <- StableValue (~40% faster than DCL) > > > Performance compared to static variables protected by `AtomicReference`, the class-holder idiom, and double-checked locking (8 threads): > > > Benchmark Mode Cnt Score Error Units > StableBenchmark.staticAtomic avgt 10 1.335 ± 0.056 ns/op > StableBenchmark.staticCHI avgt 10 0.623 ± 0.086 ns/op > StableBenchmark.staticDCL avgt 10 1.418 ± 0.171 ns/op > StableBenchmark.staticList avgt 10 0.617 ± 0.024 ns/op > StableBenchmark.staticStable avgt 10 0.604 ± 0.022 ns/op <- StableValue (>2x faster than `AtomicReference` and DCL) > > > Performance for stable lists in both instance and static contexts whereby the sum of random contents is calculated for stable lists (which are thread-safe) compared to `ArrayList` instances (which are not thread-safe) (under single-threaded access): > > > Benchmark Mode Cnt Score Error Units > StableListSumBenchmark.instanceArrayList avgt 10 0.356 ± 0.005 ns/op > StableListSumBenchmark.instanceList avgt 10 0.373 ± 0.017 ns/op <- Stable list > StableListSumBenchmark.staticArrayList avgt 10 0.352 ± 0.002 ns/op > StableListSumBenchmark.staticList avgt 10 0.356 ± 0.00...
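The `ofBackground()` review earlier in this digest asks what should happen when the supplier throws: the Javadoc says the exception is ignored and no value is set, and the reviewer suggests letting the exception reach the `UncaughtExceptionHandler`. The class below is a hypothetical sketch of that idea using only standard JDK classes; `BackgroundValue`, `orNull()` and `awaitCompletion()` are invented names, not the `StableValue` API under review, and how the PR actually delegates is not shown here.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch, not the StableValue API under review: a value computed once
// in a background thread, with supplier failures forwarded to the thread's handler.
final class BackgroundValue<T> {
    private final AtomicReference<T> value = new AtomicReference<>();
    private final Thread worker;

    BackgroundValue(Supplier<? extends T> supplier, ThreadFactory factory) {
        worker = factory.newThread(() -> {
            try {
                // At most once: compareAndSet never overwrites an already-set value.
                value.compareAndSet(null, supplier.get());
            } catch (RuntimeException e) {
                // Leave the value unset, but do not drop the failure silently.
                Thread t = Thread.currentThread();
                t.getUncaughtExceptionHandler().uncaughtException(t, e);
            }
        });
        worker.start();
    }

    /** Returns the computed value, or null if it is not (or never will be) set. */
    T orNull() {
        return value.get();
    }

    /** Waits for the background computation to finish, successfully or not. */
    void awaitCompletion() {
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Taking a `ThreadFactory` lets the caller decide which handler receives failures; if none is set, `getUncaughtExceptionHandler()` falls back to the thread's `ThreadGroup`, which by default prints the stack trace.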
Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Add delegation to the thread's exception handler ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18794/files - new: https://git.openjdk.org/jdk/pull/18794/files/d7c31585..7db1101c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=01-02 Stats: 70 lines in 3 files changed: 62 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/18794.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18794/head:pull/18794 PR: https://git.openjdk.org/jdk/pull/18794 From rrich at openjdk.org Wed May 15 07:48:12 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 15 May 2024 07:48:12 GMT Subject: RFR: 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store In-Reply-To: References: Message-ID: On Mon, 13 May 2024 15:53:52 GMT, Richard Reingruber wrote: > This pr adds a few tweaks to [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) which allows enabling it also on big endian platforms (e.g. AIX, S390). JDK-8318446 introduced a C2 optimization to replace consecutive stores to a primitive array with just one store. 
> > By example (from `TestMergeStores.java`): > > > static Object[] test2a(byte[] a, int offset, long v) { > if (IS_BIG_ENDIAN) { > a[offset + 0] = (byte)(v >> 56); > a[offset + 1] = (byte)(v >> 48); > a[offset + 2] = (byte)(v >> 40); > a[offset + 3] = (byte)(v >> 32); > a[offset + 4] = (byte)(v >> 24); > a[offset + 5] = (byte)(v >> 16); > a[offset + 6] = (byte)(v >> 8); > a[offset + 7] = (byte)(v >> 0); > } else { > a[offset + 0] = (byte)(v >> 0); > a[offset + 1] = (byte)(v >> 8); > a[offset + 2] = (byte)(v >> 16); > a[offset + 3] = (byte)(v >> 24); > a[offset + 4] = (byte)(v >> 32); > a[offset + 5] = (byte)(v >> 40); > a[offset + 6] = (byte)(v >> 48); > a[offset + 7] = (byte)(v >> 56); > } > return new Object[]{ a }; > } > > > Depending on the endianness, 8 bytes are stored into an array. The order of the stores is the same as the order of an 8-byte-store, therefore 8 1-byte-stores can be replaced with just one 8-byte-store (if there aren't too many range checks). > > Additionally I've fixed a few comments and a test bug. > > The optimization seems to be a little bit more effective on big endian platforms. > > Again by example: > > > static Object[] test800a(byte[] a, int offset, long v) { > if (IS_BIG_ENDIAN) { > a[offset + 0] = (byte)(v >> 40); // Removed from candidate list > a[offset + 1] = (byte)(v >> 32); // Removed from candidate list > a[offset + 2] = (byte)(v >> 24); // Merged > a[offset + 3] = (byte)(v >> 16); // Merged > a[offset + 4] = (byte)(v >> 8); // Merged > a[offset + 5] = (byte)(v >> 0); // Merged > } else { > a[offset + 0] = (byte)(v >> 0); // Removed from candidate list > a[offset + 1] = (byte)(v >> 8); // Removed from candidate list > a[offset + 2] = (byte)(v >> 16); // Not merged > a[offset + 3] = (byte)(v >> 24); // Not merged > a[offset + 4] = (byte)(v >> 32); // Not merged > a[offset + 5] = (byte)(v >> 40); // Not merged > } > return new Object[]{ a };...
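The claim that the eight 1-byte stores in `test2a` write exactly the bytes of one 8-byte store in the platform's byte order, and that the four "Not merged" little-endian stores in `test800a` match a single 4-byte store of `(int)(v >> 16)` (the pseudo code suggested in review), can be checked in plain Java with `ByteBuffer`. This is a sketch only: `MergeStoresCheck` is a hypothetical helper, the loops stand in for the unrolled stores of the test, and it says nothing about what C2 actually emits. The last helper also covers Dean Long's point that constant bytes `0x1..0x4` read back as `0x01020304` or `0x04030201` depending on endianness.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: compares the byte-by-byte stores from test2a/test800a with the
// single wider store that the merge-stores optimization is meant to produce.
final class MergeStoresCheck {
    // test2a, big-endian branch: eight 1-byte stores, most significant byte first.
    static byte[] bytewiseBE(long v) {
        byte[] a = new byte[8];
        for (int i = 0; i < 8; i++) {
            a[i] = (byte) (v >> (56 - 8 * i));
        }
        return a;
    }

    // test2a, little-endian branch: eight 1-byte stores, least significant byte first.
    static byte[] bytewiseLE(long v) {
        byte[] a = new byte[8];
        for (int i = 0; i < 8; i++) {
            a[i] = (byte) (v >> (8 * i));
        }
        return a;
    }

    // One 8-byte store in the given byte order.
    static byte[] longStore(long v, ByteOrder order) {
        byte[] a = new byte[8];
        ByteBuffer.wrap(a).order(order).putLong(0, v);
        return a;
    }

    // test800a, little-endian branch: only the four "Not merged" stores at offsets 2..5.
    static byte[] notMergedLE(long v) {
        byte[] a = new byte[6];
        for (int i = 2; i < 6; i++) {
            a[i] = (byte) (v >> (8 * i));
        }
        return a;
    }

    // The 4-byte store suggested in review: *(int*)&a[offset + 2] = (int)(v >> 16).
    static byte[] intStoreLE(long v) {
        byte[] a = new byte[6];
        ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN).putInt(2, (int) (v >> 16));
        return a;
    }

    // Reads four bytes back as one int in the given byte order.
    static int readInt(byte[] a, ByteOrder order) {
        return ByteBuffer.wrap(a).order(order).getInt(0);
    }
}
```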
> It's not obvious to me why something like > > ```c++ > a[offset + 2] = (byte)(v >> 16); // Not merged > a[offset + 3] = (byte)(v >> 24); // Not merged > a[offset + 4] = (byte)(v >> 32); // Not merged > a[offset + 5] = (byte)(v >> 40); // Not merged > ``` > > can't be merged. The stores could be merged to the following pseudo code: ```c++ *(int*)&a[offset + 2] = (int)(v >> 16); // Merged ``` The current logic doesn't accept the right shift [here](https://github.com/openjdk/jdk/blob/c642f44bbe1e4cdbc23496a34ddaae30990ce7c0/src/hotspot/share/opto/memnode.cpp#L3302). I think at that location we can always accept `merged_input_value` asserting that it is a right shift of `base_last` since the `is_adjacent_input_pair` checks succeeded before. I haven't tried it though. I'll clarify the synopsis of this pr and the comment in `TestMergeStores.java`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19218#issuecomment-2111808731 From pminborg at openjdk.org Wed May 15 07:48:42 2024 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 15 May 2024 07:48:42 GMT Subject: Re: RFR: 8330465: Stable Values and Collections (Internal) [v4] In-Reply-To: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: > # Stable Values & Collections (Internal) > > ## Summary > This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. > > ## Goals > * Provide an easy and intuitive API to describe value holders that can change at most once. > * Decouple declaration from initialization without significant footprint or performance penalties.
> * Reduce the amount of static initializer and/or field initialization code. > * Uphold integrity and consistency, even in a multi-threaded environment. > > For more details, see the draft JEP: https://openjdk.org/jeps/8312611 > > ## Performance > Performance compared to instance variables using an `AtomicReference` and one protected by double-checked locking under concurrent access by 8 threads: > > > Benchmark Mode Cnt Score Error Units > StableBenchmark.instanceAtomic avgt 10 1.576 ± 0.052 ns/op > StableBenchmark.instanceDCL avgt 10 1.608 ± 0.059 ns/op > StableBenchmark.instanceStable avgt 10 0.979 ± 0.023 ns/op <- StableValue (~40% faster than DCL) > > > Performance compared to static variables protected by `AtomicReference`, the class-holder idiom, and double-checked locking (8 threads): > > > Benchmark Mode Cnt Score Error Units > StableBenchmark.staticAtomic avgt 10 1.335 ± 0.056 ns/op > StableBenchmark.staticCHI avgt 10 0.623 ± 0.086 ns/op > StableBenchmark.staticDCL avgt 10 1.418 ± 0.171 ns/op > StableBenchmark.staticList avgt 10 0.617 ± 0.024 ns/op > StableBenchmark.staticStable avgt 10 0.604 ± 0.022 ns/op <- StableValue (>2x faster than `AtomicReference` and DCL) > > > Performance for stable lists in both instance and static contexts whereby the sum of random contents is calculated for stable lists (which are thread-safe) compared to `ArrayList` instances (which are not thread-safe) (under single-threaded access): > > > Benchmark Mode Cnt Score Error Units > StableListSumBenchmark.instanceArrayList avgt 10 0.356 ± 0.005 ns/op > StableListSumBenchmark.instanceList avgt 10 0.373 ± 0.017 ns/op <- Stable list > StableListSumBenchmark.staticArrayList avgt 10 0.352 ± 0.002 ns/op > StableListSumBenchmark.staticList avgt 10 0.356 ± 0.00...
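The benchmarks above compare stable values against two conventional lazy-initialization baselines. The sketch below shows what those baselines typically look like, assuming the `instanceDCL` and `instanceAtomic` benchmarks use the usual forms (the benchmark source is not part of this thread); `DclHolder` and `AtomicHolder` are hypothetical names, and both assume the supplier never returns `null`.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Double-checked locking: every read on the fast path is a volatile read.
final class DclHolder<T> {
    private final Supplier<? extends T> supplier;
    private volatile T value;

    DclHolder(Supplier<? extends T> supplier) {
        this.supplier = supplier;
    }

    T get() {
        T v = value;                 // first check, without the lock
        if (v == null) {
            synchronized (this) {
                v = value;           // second check, under the lock
                if (v == null) {
                    v = supplier.get();
                    value = v;       // supplier runs at most once
                }
            }
        }
        return v;
    }
}

// AtomicReference: racing threads may each run the supplier, but only one result is kept.
final class AtomicHolder<T> {
    private final Supplier<? extends T> supplier;
    private final AtomicReference<T> value = new AtomicReference<>();

    AtomicHolder(Supplier<? extends T> supplier) {
        this.supplier = supplier;
    }

    T get() {
        T v = value.get();
        if (v == null) {
            T candidate = supplier.get();
            T witness = value.compareAndExchange(null, candidate);
            v = (witness == null) ? candidate : witness;  // null witness means we won the race
        }
        return v;
    }
}
```

Both keep a volatile read (or its `AtomicReference` equivalent) on every access, which is one plausible reason a value the JIT can treat as constant once set benchmarks faster, though the thread does not break the numbers down that way.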
Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Revise docs for ofBackground() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18794/files - new: https://git.openjdk.org/jdk/pull/18794/files/7db1101c..c92b16c4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=02-03 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18794.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18794/head:pull/18794 PR: https://git.openjdk.org/jdk/pull/18794 From galder at openjdk.org Wed May 15 07:51:23 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 15 May 2024 07:51:23 GMT Subject: Integrated: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays In-Reply-To: References: Message-ID: On Thu, 1 Feb 2024 05:53:23 GMT, Galder Zamarreño wrote: > Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures. > > The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64: > > > $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 3.476 ± 0.018 ns/op > ArrayClone.byteArraycopy 10 avgt 15 3.740 ± 0.017 ns/op > ArrayClone.byteArraycopy 100 avgt 15 7.124 ± 0.010 ns/op > ArrayClone.byteArraycopy 1000 avgt 15 39.301 ± 0.106 ns/op > ArrayClone.byteClone 0 avgt 15 3.478 ± 0.008 ns/op > ArrayClone.byteClone 10 avgt 15 3.562 ± 0.007 ns/op > ArrayClone.byteClone 100 avgt 15 5.888 ±
0.206 ns/op > ArrayClone.byteClone 1000 avgt 15 25.762 ± 0.203 ns/op > ArrayClone.intArraycopy 0 avgt 15 3.199 ± 0.016 ns/op > ArrayClone.intArraycopy 10 avgt 15 4.521 ± 0.008 ns/op > ArrayClone.intArraycopy 100 avgt 15 17.429 ± 0.039 ns/op > ArrayClone.intArraycopy 1000 avgt 15 178.432 ± 0.777 ns/op > ArrayClone.intClone 0 avgt 15 3.406 ± 0.016 ns/op > ArrayClone.intClone 10 avgt 15 4.272 ± 0.006 ns/op > ArrayClone.intClone 100 avgt 15 13.110 ± 0.122 ns/op > ArrayClone.intClone 1000 avgt 15 113.196 ± 13.400 ns/op > > > It also includes an optimization to avoid instantiating the array copy stub in scenarios like this. > > I ran hotspot compiler tests successfully, limiting them to C1 compilation, on darwin/aarch64, linux/x86_64 and linux/686. E.g. > > > $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" > ... > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:hotspot_compiler 1234 1234 0 0 > > > One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts? > > Thanks @rwestrel for his help shaping this up :) This pull request has now been integrated.
Changeset: 2f10a316 Author: Galder Zamarreño Committer: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/2f10a316ff0c5a4c124b94f6fabb38fb119d2c82 Stats: 292 lines in 16 files changed: 258 ins; 4 del; 30 mod 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays Reviewed-by: dlong, roland ------------- PR: https://git.openjdk.org/jdk/pull/17667 From rrich at openjdk.org Wed May 15 07:53:33 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 15 May 2024 07:53:33 GMT Subject: Re: RFR: 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store [v2] In-Reply-To: References: Message-ID: > This pr adds a few tweaks to [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) which allows enabling it also on big endian platforms (e.g. AIX, S390). JDK-8318446 introduced a C2 optimization to replace consecutive stores to a primitive array with just one store. > > By example (from `TestMergeStores.java`): > > > static Object[] test2a(byte[] a, int offset, long v) { > if (IS_BIG_ENDIAN) { > a[offset + 0] = (byte)(v >> 56); > a[offset + 1] = (byte)(v >> 48); > a[offset + 2] = (byte)(v >> 40); > a[offset + 3] = (byte)(v >> 32); > a[offset + 4] = (byte)(v >> 24); > a[offset + 5] = (byte)(v >> 16); > a[offset + 6] = (byte)(v >> 8); > a[offset + 7] = (byte)(v >> 0); > } else { > a[offset + 0] = (byte)(v >> 0); > a[offset + 1] = (byte)(v >> 8); > a[offset + 2] = (byte)(v >> 16); > a[offset + 3] = (byte)(v >> 24); > a[offset + 4] = (byte)(v >> 32); > a[offset + 5] = (byte)(v >> 40); > a[offset + 6] = (byte)(v >> 48); > a[offset + 7] = (byte)(v >> 56); > } > return new Object[]{ a }; > } > > > Depending on the endianness, 8 bytes are stored into an array. The order of the stores is the same as the order of an 8-byte-store, therefore 8 1-byte-stores can be replaced with just one 8-byte-store (if there aren't too many range checks).
> > Additionally I've fixed a few comments and a test bug. > > The optimization seems to be a little bit more effective on big endian platforms. > > Again by example: > > > static Object[] test800a(byte[] a, int offset, long v) { > if (IS_BIG_ENDIAN) { > a[offset + 0] = (byte)(v >> 40); // Removed from candidate list > a[offset + 1] = (byte)(v >> 32); // Removed from candidate list > a[offset + 2] = (byte)(v >> 24); // Merged > a[offset + 3] = (byte)(v >> 16); // Merged > a[offset + 4] = (byte)(v >> 8); // Merged > a[offset + 5] = (byte)(v >> 0); // Merged > } else { > a[offset + 0] = (byte)(v >> 0); // Removed from candidate list > a[offset + 1] = (byte)(v >> 8); // Removed from candidate list > a[offset + 2] = (byte)(v >> 16); // Not merged > a[offset + 3] = (byte)(v >> 24); // Not merged > a[offset + 4] = (byte)(v >> 32); // Not merged > a[offset + 5] = (byte)(v >> 40); // Not merged > } > return new Object[]{ a };... Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Improve comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19218/files - new: https://git.openjdk.org/jdk/pull/19218/files/9cbe9642..dc05bb0b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19218&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19218&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19218.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19218/head:pull/19218 PR: https://git.openjdk.org/jdk/pull/19218 From epeter at openjdk.org Wed May 15 07:53:33 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 May 2024 07:53:33 GMT Subject: RFR: 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store In-Reply-To: References: Message-ID: On Tue, 14 May 2024 21:09:35 GMT, Dean Long wrote: >> This pr adds a few tweaks to 
[JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) which allows enabling it also on big endian platforms (e.g. AIX, S390). JDK-8318446 introduced a C2 optimization to replace consecutive stores to a primitive array with just one store. >> >> By example (from `TestMergeStores.java`): >> >> >> static Object[] test2a(byte[] a, int offset, long v) { >> if (IS_BIG_ENDIAN) { >> a[offset + 0] = (byte)(v >> 56); >> a[offset + 1] = (byte)(v >> 48); >> a[offset + 2] = (byte)(v >> 40); >> a[offset + 3] = (byte)(v >> 32); >> a[offset + 4] = (byte)(v >> 24); >> a[offset + 5] = (byte)(v >> 16); >> a[offset + 6] = (byte)(v >> 8); >> a[offset + 7] = (byte)(v >> 0); >> } else { >> a[offset + 0] = (byte)(v >> 0); >> a[offset + 1] = (byte)(v >> 8); >> a[offset + 2] = (byte)(v >> 16); >> a[offset + 3] = (byte)(v >> 24); >> a[offset + 4] = (byte)(v >> 32); >> a[offset + 5] = (byte)(v >> 40); >> a[offset + 6] = (byte)(v >> 48); >> a[offset + 7] = (byte)(v >> 56); >> } >> return new Object[]{ a }; >> } >> >> >> Depending on the endianness, 8 bytes are stored into an array. The order of the stores is the same as the order of an 8-byte-store, therefore 8 1-byte-stores can be replaced with just one 8-byte-store (if there aren't too many range checks). >> >> Additionally I've fixed a few comments and a test bug. >> >> The optimization seems to be a little bit more effective on big endian platforms.
>> >> Again by example: >> >> >> static Object[] test800a(byte[] a, int offset, long v) { >> if (IS_BIG_ENDIAN) { >> a[offset + 0] = (byte)(v >> 40); // Removed from candidate list >> a[offset + 1] = (byte)(v >> 32); // Removed from candidate list >> a[offset + 2] = (byte)(v >> 24); // Merged >> a[offset + 3] = (byte)(v >> 16); // Merged >> a[offset + 4] = (byte)(v >> 8); // Merged >> a[offset + 5] = (byte)(v >> 0); // Merged >> } else { >> a[offset + 0] = (byte)(v >> 0); // Removed from candidate list >> a[offset + 1] = (byte)(v >> 8); // Removed from candidate list >> a[offset + 2] = (byte)(v >> 16); // Not merged >> a[offset + 3] = (byte)(v >> 24); // Not merged >> a[offset + 4] = (byte)(v >> 32); // Not merge... > > In other words, it seems like it could work for arbitrary byte values if the merged value was computed from those individual values. They wouldn't need to be shifted values. > > a[offset + 0] = (byte)0x1; > a[offset + 1] = (byte)0x2; > a[offset + 2] = (byte)0x3; > a[offset + 3] = (byte)0x4; > > The example above would either write 0x01020304 or 0x04030201 depending on the endianness. @dean-long @reinrich Yes, I guess that is a generalization that could be made. It would require a lot more tests to make sure all combinations are checked. I would suggest doing that in a separate RFE to keep things simple and reviewable. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19218#issuecomment-2111816593 From amitkumar at openjdk.org Wed May 15 07:55:15 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 15 May 2024 07:55:15 GMT Subject: Re: RFR: 8331934: [s390x] Add support for primitive array C1 clone intrinsic [v2] In-Reply-To: References: Message-ID: > Adds JDK-8302850 Port for s390x.
> > Testing: > > make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:hotspot_compiler 1166 1166 0 0 > ============================== > TEST SUCCESS > > * Tier1 tests with fastdebug build. > > Benchmarking: > > > Without Patch: > > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 10.838 ± 0.461 ns/op > ArrayClone.byteArraycopy 10 avgt 15 28.919 ± 1.695 ns/op > ArrayClone.byteArraycopy 100 avgt 15 48.815 ± 0.901 ns/op > ArrayClone.byteArraycopy 1000 avgt 15 256.357 ± 7.901 ns/op > ArrayClone.byteClone 0 avgt 15 90.398 ± 3.119 ns/op > ArrayClone.byteClone 10 avgt 15 103.774 ± 4.468 ns/op > ArrayClone.byteClone 100 avgt 15 126.628 ± 6.952 ns/op > ArrayClone.byteClone 1000 avgt 15 326.409 ± 31.635 ns/op > ArrayClone.intArraycopy 0 avgt 15 10.450 ± 0.509 ns/op > ArrayClone.intArraycopy 10 avgt 15 36.903 ± 0.753 ns/op > ArrayClone.intArraycopy 100 avgt 15 85.964 ± 1.806 ns/op > ArrayClone.intArraycopy 1000 avgt 15 841.512 ± 40.335 ns/op > ArrayClone.intClone 0 avgt 15 89.332 ± 3.695 ns/op > ArrayClone.intClone 10 avgt 15 110.639 ± 2.476 ns/op > ArrayClone.intClone 100 avgt 15 195.781 ± 8.622 ns/op > ArrayClone.intClone 1000 avgt 15 1058.479 ± 92.468 ns/op > Finished running test 'micro:java.lang.ArrayClone' > > > with patch: > > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 10.526 ± 0.289 ns/op > ArrayClone.byteArraycopy 10 avgt 15 27.110 ± 0.656 ns/op > Arra... Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: - s390x Port - Update src/hotspot/share/c1/c1_GraphBuilder.cpp Co-authored-by: Dean Long <17332032+dean-long at users.noreply.github.com> - Fix assert to only have a single !
- Assert type is not interface - Remove whitespace - Expanded testing in TestNullArrayClone * Added byte[] and long[] tests. * Verified that the cloned array has the same contents. * Increase number of iterations reach tier 3 threshold. - Update src/hotspot/share/c1/c1_GraphBuilder.cpp Co-authored-by: Boris <42576543+bulasevich at users.noreply.github.com> - Added test summary - Use vmIntrinsics instead of vmIntrinsicID - Fix formatting - ... and 15 more: https://git.openjdk.org/jdk/compare/eebcc218...d462e56b ------------- Changes: https://git.openjdk.org/jdk/pull/19220/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19220&range=01 Stats: 337 lines in 20 files changed: 281 ins; 6 del; 50 mod Patch: https://git.openjdk.org/jdk/pull/19220.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19220/head:pull/19220 PR: https://git.openjdk.org/jdk/pull/19220 From galder at openjdk.org Wed May 15 08:13:15 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 15 May 2024 08:13:15 GMT Subject: Re: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v17] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 17:40:54 GMT, Galder Zamarreño wrote: >> Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures. >> >> The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64: >> >> >> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 3.476 ± 0.018 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 3.740 ± 0.017 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 7.124 ±
0.010 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 39.301 ± 0.106 ns/op >> ArrayClone.byteClone 0 avgt 15 3.478 ± 0.008 ns/op >> ArrayClone.byteClone 10 avgt 15 3.562 ± 0.007 ns/op >> ArrayClone.byteClone 100 avgt 15 5.888 ± 0.206 ns/op >> ArrayClone.byteClone 1000 avgt 15 25.762 ± 0.203 ns/op >> ArrayClone.intArraycopy 0 avgt 15 3.199 ± 0.016 ns/op >> ArrayClone.intArraycopy 10 avgt 15 4.521 ± 0.008 ns/op >> ArrayClone.intArraycopy 100 avgt 15 17.429 ± 0.039 ns/op >> ArrayClone.intArraycopy 1000 avgt 15 178.432 ± 0.777 ns/op >> ArrayClone.intClone 0 avgt 15 3.406 ± 0.016 ns/op >> ArrayClone.intClone 10 avgt 15 4.272 ± 0.006 ns/op >> ArrayClone.intClone 100 avgt 15 13.110 ± 0.122 ns/op >> ArrayClone.intClone 1000 avgt 15 113.196 ± 13.400 ns/op >> >> >> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this. >> >> I ran the hotspot compiler tests successfully, limiting them to C1 compilation, on darwin/aarch64, linux/x86_64 and linux/686. E.g. >> >> >> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" >> ... >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:hotspot_compiler 1234 1234 0 0 >> >> >> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts? >> >>... > > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Add assert message Thanks all for the reviews!
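The PR description says the intrinsic skips zeroing the new array because arraycopy overwrites its contents straight away. Whatever C1 emits, the observable result of `clone()` on a primitive array must match a fresh allocation followed by `System.arraycopy`, and cloning a `null` reference must still throw `NullPointerException`. The self-contained sketch below checks both properties for `byte[]`, `int[]` and `long[]`, roughly in the spirit of the expanded `TestNullArrayClone`. It is not that test: the class name `ArrayCloneCheck` and its helpers are hypothetical. It only reaches C1-compiled code if run with flags such as `-XX:TieredStopAtLevel=1` and enough iterations to trigger compilation.

```java
import java.util.Arrays;

// Hypothetical sanity check, not the JDK's TestNullArrayClone: compares
// clone() of primitive arrays with allocation + System.arraycopy.
public class ArrayCloneCheck {

    // Same result as byte[]::clone, expressed with arraycopy.
    static byte[] copyViaArraycopy(byte[] src) {
        byte[] dst = new byte[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    static int[] copyViaArraycopy(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    static long[] copyViaArraycopy(long[] src) {
        long[] dst = new long[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    // True if clone() yields a distinct array with the same contents
    // as the arraycopy-based copy, for all three element types.
    static boolean check(int size) {
        byte[] b = new byte[size];
        int[] i = new int[size];
        long[] l = new long[size];
        for (int k = 0; k < size; k++) {
            b[k] = (byte) (k * 31);
            i[k] = k * 1_000_003;
            l[k] = (long) k << 40;
        }
        byte[] bc = b.clone();
        int[] ic = i.clone();
        long[] lc = l.clone();
        return bc != b && ic != i && lc != l
            && Arrays.equals(bc, copyViaArraycopy(b))
            && Arrays.equals(ic, copyViaArraycopy(i))
            && Arrays.equals(lc, copyViaArraycopy(l));
    }

    // Cloning through a null array reference must throw NullPointerException.
    static boolean nullCloneThrows() {
        int[] nullArray = null;
        try {
            int[] unused = nullArray.clone();
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Sizes mirror the (size) parameter of the ArrayClone microbenchmark;
        // the iteration count is arbitrary, chosen only so check() runs often.
        for (int iter = 0; iter < 5_000; iter++) {
            for (int size : new int[] {0, 10, 100, 1000}) {
                if (!check(size)) {
                    throw new AssertionError("mismatch at size " + size);
                }
            }
        }
        if (!nullCloneThrows()) {
            throw new AssertionError("clone of null did not throw");
        }
        System.out.println("ok");
    }
}
```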
------------- PR Comment: https://git.openjdk.org/jdk/pull/17667#issuecomment-2111852622 From gcao at openjdk.org Wed May 15 08:14:31 2024 From: gcao at openjdk.org (Gui Cao) Date: Wed, 15 May 2024 08:14:31 GMT Subject: RFR: 8331281: RISC-V: C2: Support vector-scalar and vector-immediate bitwise logic instructions [v2] In-Reply-To: References: Message-ID: > Hi, we want to support vector-scalar and vector-immediate bitwise logic instructions. They were implemented with reference to RVV v1.0 [1]. Please take a look and review. Thanks a lot. > We can use Int256VectorTests.java [2] to print the compilation log and to verify and observe the generated nodes. > > For example, we can use the following command to print the compilation log of a jtreg test case: > > > /home/zifeihan/jdk-tools/jtreg/bin/jtreg \ > -v:default \ > -concurrency:16 -timeout:50 \ > -javaoption:-XX:+UnlockExperimentalVMOptions \ > -javaoption:-XX:+UseRVV \ > -javaoption:-XX:+PrintOptoAssembly \ > -javaoption:-XX:LogFile=/home/zifeihan/jdk/Int256VectorTests_PrintOptoAssembly.log \ > -jdk:/home/zifeihan/jdk/build/linux-riscv64-server-fastdebug/jdk \ > /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/Int256VectorTests.java > > > > We can then inspect the specified compilation log `Int256VectorTests_PrintOptoAssembly.log`, which contains the vector-scalar and vector-immediate bitwise logic nodes implemented by this PR.
> > vand_immI Node > > > 0b4 vloadcon V3 # generate iota indices > 0bc vmla V2, V2, V3, V1 > 0c4 vand_immI V2, V2, #7 > 0cc addi R7, R30, #16 # ptr, #@addP_reg_imm > 0d0 storeV [R7], V2 # vector (rvv) > > > vor_regI Node > > > 180 vor_regI V1, V1, R30 > 188 add R31, R14, R31 # ptr, #@addP_reg_reg > 18a addi R31, R31, #16 # ptr, #@addP_reg_imm > 18c storeV [R31], V1 # vector (rvv) > 194 addiw R11, R11, #8 #@addI_reg_imm > 196 blt R11, R13, B17 #@cmpI_loop P=0.500000 C=30564.000000 > > > vxor_regI Node > > 198 vxor_regI V1, V1, R30 > 1a0 add R14, R16, R14 # ptr, #@addP_reg_reg > 1a2 addi R14, R14, #16 # ptr, #@addP_reg_imm > 1a4 storeV [R14], V1 # vector (rvv) > 1ac addiw R11, R11, #8 #@addI_reg_imm > 1ae blt R11, R13, B21 #@cmpI_loop P=0.500000 C=30564.000000 > > > vand_regI_masked Node > > 234 B31: # out( B40 B32 ) <- in( B30 ) Freq: 78.5481 > 234 loadV V2, [R15] # vector (rvv) > 23c vand_regI_masked V2, V2, R11 > 244 storeV [R9], V2 # vector (rvv) > 24c mv R10, #8 # int, #@loadConI > 24e ble R7, R10, B40 #@cmpI_branch P=0.000001 C=-1.000000 > > > vor_regI_masked Node > > 1ee B32: # out( B38 B33 ) <- in( B31 ) Freq: 75.8475 > 1ee loadV V1, [R11] # vector (rvv) > 1f6 vor_regI_masked V1, V1, R31 > 1fe addi R11, R13, #32 # ptr, #@addP_reg_imm > 202 bgeu R29, R10, B38 #@cmpU_branch P=0.000001 C=-1.000000 > > vxor_regI_masked Node > > 1ee B32: # out( B38 B33 ) <- in( B31 ) Freq: 75.8475 > 1ee loadV V1, [R11]... 
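For readers less familiar with these node names, the following describes the lane-wise semantics. A vector-scalar rule such as `vand_regI` combines every lane with one scalar held in a general-purpose register, `dst[i] = src[i] & s`. `vand_immI` does the same with a small immediate, so the `vand_immI V2, V2, #7` line in the log masks every lane with 7. The scalar Java model below is a sketch of those semantics only. We assume, without having checked the C2 matcher rules, that loops of this shape are the kind C2 can vectorize into an `AndV`/`OrV`/`XorV` whose second input is a replicated scalar. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical reference model of vector-scalar / vector-immediate
// bitwise ops: every lane is combined with the same scalar value.
public class BitwiseBroadcast {

    static int[] andScalar(int[] src, int s) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i] & s;   // like vand.vx / vand.vi applied per lane
        }
        return dst;
    }

    static int[] orScalar(int[] src, int s) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i] | s;   // like vor.vx applied per lane
        }
        return dst;
    }

    static int[] xorScalar(int[] src, int s) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i] ^ s;   // like vxor.vx applied per lane
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] v = new int[8];
        for (int i = 0; i < v.length; i++) {
            v[i] = i * 3;          // 0, 3, 6, ..., 21
        }
        // Masking with the immediate 7, as in "vand_immI V2, V2, #7".
        System.out.println(Arrays.toString(andScalar(v, 7))); // [0, 3, 6, 1, 4, 7, 2, 5]
    }
}
```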
Gui Cao has updated the pull request incrementally with one additional commit since the last revision: Use iRegIorL2I to replace iRegI in AndV/OrVXorV instruct ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18999/files - new: https://git.openjdk.org/jdk/pull/18999/files/a05b204e..69c196e7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18999&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18999&range=00-01 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/18999.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18999/head:pull/18999 PR: https://git.openjdk.org/jdk/pull/18999 From gcao at openjdk.org Wed May 15 08:17:04 2024 From: gcao at openjdk.org (Gui Cao) Date: Wed, 15 May 2024 08:17:04 GMT Subject: RFR: 8331281: RISC-V: C2: Support vector-scalar and vector-immediate bitwise logic instructions [v2] In-Reply-To: References: Message-ID: On Sat, 11 May 2024 07:29:46 GMT, Feilong Jiang wrote: >> Gui Cao has updated the pull request incrementally with one additional commit since the last revision: >> >> Use iRegIorL2I to replace iRegI in AndV/OrVXorV instruct > > src/hotspot/cpu/riscv/riscv_v.ad line 513: > >> 511: // vector-scalar and (unpredicated) >> 512: >> 513: instruct vand_regI(vReg dst_src, iRegI src) %{ > > Do we need `iRegIorL2I` for `RegI` related instructions? Thanks for your review. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18999#discussion_r1601166494 From chagedorn at openjdk.org Wed May 15 08:47:19 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 15 May 2024 08:47:19 GMT Subject: RFR: 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode [v3] In-Reply-To: References: Message-ID: > This patch replaces the `Opaque4Node` of the `If` for Initialized Assertion Predicates with a new `OpaqueInitializedAssertionPredicateNode`.
This helps to simplify pattern matching for predicate code and to distinguish it from the two other uses of `Opaque4` nodes: > 1. Template Assertion Predicate: The goal is to get rid of its `Opaque4Node` as well by using a dedicated `TemplateAssertionPredicateNode` for the `IfNode`. > 2. Non-null-checks with intrinsics and unsafe accesses: This will eventually be the only use left. Once we get there, we should rename the node accordingly to `OpaqueNonNullCheck` or something like that. > > I went through all the uses of `Opaque4` nodes and did the following: > - Could the `Opaque4` node be part of an Initialized Assertion Predicate? > - No: Added an assert that we are not dealing with an Initialized Assertion Predicate. > - Yes: > - Yes **and only** for Initialized Assertion Predicates? Added an assert that we are only expecting an `OpaqueInitializedAssertionPredicateNode` if appropriate. > - Yes but could also be something else: Added case for `OpaqueInitializedAssertionPredicateNode` next to the `Opaque4` case. > - Is this `Opaque4` node only used for Template Assertion Predicates? > - Yes: Added assert with call to `assertion_predicate_has_loop_opaque_node()` to check that we find its `OpaqueLoop*Nodes`. > - I've added test cases where I was not sure about whether an `Opaque4` node could be part of a Template, an Initialized Assertion Predicate or a non-null-check. This was a little tricky but I think it was still worth it to prevent future bugs (even though most of these special cases are quite rare). > > This is another patch split off from the full fix for Assertion Predicates. > > Thanks, > Christian Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains seven additional commits since the last revision: - Make OpaqueInitializedAssertionPredicateNode a macro node again - asdf - Merge branch 'master' into JDK-8330386 - Merge branch 'master' into JDK-8330386 - Add more comments and asserts - Add more tests - 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18951/files - new: https://git.openjdk.org/jdk/pull/18951/files/fe3feb8b..5b9ec6ef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18951&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18951&range=01-02 Stats: 21353 lines in 464 files changed: 11329 ins; 6791 del; 3233 mod Patch: https://git.openjdk.org/jdk/pull/18951.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18951/head:pull/18951 PR: https://git.openjdk.org/jdk/pull/18951 From chagedorn at openjdk.org Wed May 15 08:47:19 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 15 May 2024 08:47:19 GMT Subject: RFR: 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode [v2] In-Reply-To: References: <_8csQpQVHlNpwenIT4H7OFkMSOaU6Fz-ZmJ0Yi6ArLU=.0b84b78d-4637-49ab-b43f-4c457498b0ce@github.com> <7b3qt72dd5rV6nirPQILkqTMleDRMRYuXlKpqVVVpyo=.c2ed3889-cb43-4576-9d63-de133152b7fb@github.com> Message-ID: On Mon, 13 May 2024 13:37:24 GMT, Roland Westrelin wrote: >> That's correct. I originally had these nodes as macro nodes as well. But conceptually, we want to get these nodes to be removed and the Initialized Assertion Predicates folded once we know that we no longer split loops (i.e. in post loop IGVN). I think it's easier to register them for this post loop IGVN run since we don't really expand the nodes to anything - they are just removed during expansion.
> >> >> I'm not entirely sure though what the original reason was to go with a macro expansion removal instead of a post loop IGVN removal for `Opaque4` nodes. Do you remember? > >> But conceptually, we want to get these nodes to be removed and the Initialized Assertion Predicates folded once we know that we no longer split loops (i.e. in post loop IGVN). > > I don't think that's quite correct. Any round of igvn could cause the bounds of a counted loop to change in a way that conflicts with the types captured in the `CastII`/`ConvI2L` nodes. I think that's true even after loop optimizations are over. As a consequence, we want the Assertion Predicates to fold as late as possible. > > That's poorly tested currently because we emit the predicates in compiled code for debug builds so, in practice, we never really remove them. > > As part of this change, I wouldn't change that behavior. That seems risky. I see your point and agree with it. While Template Assertion Predicates should be removed after loop opts are over (no more splitting possible and thus no creation of new Initialized Assertion Predicates) we should indeed delay the removal of the Initialized Assertion Predicates to keep the graph sane. Ideally, they should be removed in the very last IGVN round. But there is currently no such dedicated "last IGVN round" which we could register a node for. Doing the removal in macro expansion is probably fine and as you've stated, it does not change the current behavior. I've pushed an update that makes it a macro node again.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18951#discussion_r1601214445 From duke at openjdk.org Wed May 15 08:59:14 2024 From: duke at openjdk.org (ExE Boss) Date: Wed, 15 May 2024 08:59:14 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v4] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Wed, 15 May 2024 07:48:42 GMT, Per Minborg wrote: >> # Stable Values & Collections (Internal) >> >> ## Summary >> This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. >> >> ## Goals >> * Provide an easy and intuitive API to describe value holders that can change at most once. >> * Decouple declaration from initialization without significant footprint or performance penalties. >> * Reduce the amount of static initializer and/or field initialization code. >> * Uphold integrity and consistency, even in a multi-threaded environment. >> >> For more details, see the draft JEP: https://openjdk.org/jeps/8312611 >> >> ## Performance >> Performance compared to instance variables using an `AtomicReference` and one protected by double-checked locking under concurrent access by 8 threads: >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.instanceAtomic avgt 10 1.576 ± 0.052 ns/op >> StableBenchmark.instanceDCL avgt 10 1.608 ± 0.059 ns/op >> StableBenchmark.instanceStable avgt 10 0.979 ± 0.023 ns/op <- StableValue (~40% faster than DCL) >> >> >> Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (8 threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.staticAtomic avgt 10 1.335 ±
0.056 ns/op >> StableBenchmark.staticCHI avgt 10 0.623 ± 0.086 ns/op >> StableBenchmark.staticDCL avgt 10 1.418 ± 0.171 ns/op >> StableBenchmark.staticList avgt 10 0.617 ± 0.024 ns/op >> StableBenchmark.staticStable avgt 10 0.604 ± 0.022 ns/op <- StableValue ( > 2x faster than `AtomicInteger` and DCL) >> >> >> Performance for stable lists in both instance and static contexts whereby the sum of random contents is calculated for stable lists (which are thread-safe) compared to `ArrayList` instances (which are not thread-safe) (under single thread access): >> >> >> Benchmark Mode Cnt Score Error Units >> StableListSumBenchmark.instanceArrayList avgt 10 0.356 ± 0.005 ns/op >> StableListSumBenchmark.instanceList avgt 10 0.373 ± 0.017 ns/op <- Stable list >> StableListSumBenchmark.staticArrayList avgt 10 0.352 ± ... > > Per Minborg has updated the pull request incrementally with one additional commit since the last revision: > > Revise docs for ofBackground() src/java.base/share/classes/jdk/internal/lang/StableValue.java line 1: > 1: /* Maybe also add `StableValue::ofLazy(Supplier)` which behaves more like the original **Computed Constants** JEP draft? src/java.base/share/classes/jdk/internal/lang/stable/TrustedFieldType.java line 14: > 12: * operations. > 13: */ > 14: public sealed interface TrustedFieldType Maybe export this interface to `jdk.unsupported`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1601229245 PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1601223866 From imyers at openjdk.org Wed May 15 09:19:30 2024 From: imyers at openjdk.org (Ian Myers) Date: Wed, 15 May 2024 09:19:30 GMT Subject: RFR: 8324756: Test vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize is too slow due to dependency verification [v3] In-Reply-To: References: Message-ID: > This change removes dependency verification by passing -XX:-VerifyDependencies in the test.
> > `vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java` takes 20min to run on linux-x86_64-server-fastdebug: > > time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java > CONF=linux-x86_64-server-fastdebug make test **1412.82s user 15.27s system 115% cpu 20:41.19 total** > > > Passing -XX:-VerifyDependencies flag speeds up the run time to 1min: > > time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java TEST_VM_OPTS="-XX:-VerifyDependencies" > CONF=linux-x86_64-server-fastdebug make test **287.27s user 16.19s system 496% cpu 1:01.10 total** > > > Adding -XX:-VerifyDependencies to the test file accomplishes the same run time of 1min: > > time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java > CONF=linux-x86_64-server-fastdebug make test **272.33s user 14.56s system 464% cpu 1:01.75 total** Ian Myers has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge branch 'openjdk:master' into fix-8324756 - [8324756] Remove dependency verification from vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19040/files - new: https://git.openjdk.org/jdk/pull/19040/files/99314e02..c2a6a66d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19040&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19040&range=01-02 Stats: 29262 lines in 763 files changed: 15869 ins; 8113 del; 5280 mod Patch: https://git.openjdk.org/jdk/pull/19040.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19040/head:pull/19040 PR: https://git.openjdk.org/jdk/pull/19040 From imyers at openjdk.org Wed May 15 09:22:17 2024 From: imyers at openjdk.org (Ian Myers) Date: Wed, 15 May 2024 09:22:17 GMT Subject: RFR: 8324756: Test vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize is too slow due to dependency verification [v4] In-Reply-To: References: Message-ID: > This change removes dependency verification by passing -XX:-VerifyDependencies in the test. 
> > `vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java` takes 20min to run on linux-x86_64-server-fastdebug: > > time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java > CONF=linux-x86_64-server-fastdebug make test **1412.82s user 15.27s system 115% cpu 20:41.19 total** > > > Passing -XX:-VerifyDependencies flag speeds up the run time to 1min: > > time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java TEST_VM_OPTS="-XX:-VerifyDependencies" > CONF=linux-x86_64-server-fastdebug make test **287.27s user 16.19s system 496% cpu 1:01.10 total** > > > Adding -XX:-VerifyDependencies to the test file accomplishes the same run time of 1min: > > time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java > CONF=linux-x86_64-server-fastdebug make test **272.33s user 14.56s system 464% cpu 1:01.75 total** Ian Myers has updated the pull request incrementally with one additional commit since the last revision: [8324756] Remove dependency verification from vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19040/files - new: https://git.openjdk.org/jdk/pull/19040/files/c2a6a66d..4a338e7f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19040&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19040&range=02-03 Stats: 15 lines in 1 file changed: 14 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19040.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19040/head:pull/19040 PR: https://git.openjdk.org/jdk/pull/19040 From amitkumar at openjdk.org Wed May 15 09:25:32 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 15 May 2024 09:25:32 GMT Subject: RFR: 8331934: [s390x] Add support for primitive array C1 clone intrinsic [v3] In-Reply-To: References: Message-ID: > Adds JDK-8302850 Port 
for s390x. > > Testing: > > make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:hotspot_compiler 1166 1166 0 0 > ============================== > TEST SUCCESS > > * Tier1 Test with Fast debug build. > > Benchmarking: > > > Without Patch: > > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 10.838 ± 0.461 ns/op > ArrayClone.byteArraycopy 10 avgt 15 28.919 ± 1.695 ns/op > ArrayClone.byteArraycopy 100 avgt 15 48.815 ± 0.901 ns/op > ArrayClone.byteArraycopy 1000 avgt 15 256.357 ± 7.901 ns/op > ArrayClone.byteClone 0 avgt 15 90.398 ± 3.119 ns/op > ArrayClone.byteClone 10 avgt 15 103.774 ± 4.468 ns/op > ArrayClone.byteClone 100 avgt 15 126.628 ± 6.952 ns/op > ArrayClone.byteClone 1000 avgt 15 326.409 ± 31.635 ns/op > ArrayClone.intArraycopy 0 avgt 15 10.450 ± 0.509 ns/op > ArrayClone.intArraycopy 10 avgt 15 36.903 ± 0.753 ns/op > ArrayClone.intArraycopy 100 avgt 15 85.964 ± 1.806 ns/op > ArrayClone.intArraycopy 1000 avgt 15 841.512 ± 40.335 ns/op > ArrayClone.intClone 0 avgt 15 89.332 ± 3.695 ns/op > ArrayClone.intClone 10 avgt 15 110.639 ± 2.476 ns/op > ArrayClone.intClone 100 avgt 15 195.781 ± 8.622 ns/op > ArrayClone.intClone 1000 avgt 15 1058.479 ± 92.468 ns/op > Finished running test 'micro:java.lang.ArrayClone' > > > with patch: > > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 10.526 ± 0.289 ns/op > ArrayClone.byteArraycopy 10 avgt 15 27.110 ± 0.656 ns/op > Arra... Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: - Merge master - s390x Port - Update src/hotspot/share/c1/c1_GraphBuilder.cpp Co-authored-by: Dean Long <17332032+dean-long at users.noreply.github.com> - Fix assert to only have a single !
- Assert type is not interface - Remove whitespace - Expanded testing in TestNullArrayClone * Added byte[] and long[] tests. * Verified that the cloned array has the same contents. * Increase number of iterations reach tier 3 threshold. - Update src/hotspot/share/c1/c1_GraphBuilder.cpp Co-authored-by: Boris <42576543+bulasevich at users.noreply.github.com> - Added test summary - Use vmIntrinsics instead of vmIntrinsicID - ... and 16 more: https://git.openjdk.org/jdk/compare/2f10a316...865de5ba ------------- Changes: https://git.openjdk.org/jdk/pull/19220/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19220&range=02 Stats: 46 lines in 6 files changed: 23 ins; 2 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/19220.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19220/head:pull/19220 PR: https://git.openjdk.org/jdk/pull/19220 From imyers at openjdk.org Wed May 15 09:26:27 2024 From: imyers at openjdk.org (Ian Myers) Date: Wed, 15 May 2024 09:26:27 GMT Subject: RFR: 8324756: Test vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize is too slow due to dependency verification [v5] In-Reply-To: References: Message-ID: <2ygMhqSjsiuHeguO3lMC4FOUCVI28tci2j0-8j3k7F4=.bd620849-4738-491b-853f-a9d2bdbe2067@github.com> > This change removes dependency verification by passing -XX:-VerifyDependencies in the test. 
> > `vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java` takes 20min to run on linux-x86_64-server-fastdebug: > > time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java > CONF=linux-x86_64-server-fastdebug make test **1412.82s user 15.27s system 115% cpu 20:41.19 total** > > > Passing -XX:-VerifyDependencies flag speeds up the run time to 1min: > > time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java TEST_VM_OPTS="-XX:-VerifyDependencies" > CONF=linux-x86_64-server-fastdebug make test **287.27s user 16.19s system 496% cpu 1:01.10 total** > > > Adding -XX:-VerifyDependencies to the test file accomplishes the same run time of 1min: > > time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java > CONF=linux-x86_64-server-fastdebug make test **272.33s user 14.56s system 464% cpu 1:01.75 total** Ian Myers has updated the pull request incrementally with one additional commit since the last revision: [8324756] Remove dependency verification from vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19040/files - new: https://git.openjdk.org/jdk/pull/19040/files/4a338e7f..aa3bb0b6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19040&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19040&range=03-04 Stats: 16 lines in 2 files changed: 1 ins; 15 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19040.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19040/head:pull/19040 PR: https://git.openjdk.org/jdk/pull/19040 From yzheng at openjdk.org Wed May 15 09:35:02 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 15 May 2024 09:35:02 GMT Subject: RFR: 8331429: [JVMCI] Cleanup JVMCIRuntime allocation routines [v3] In-Reply-To: References: 
<5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> Message-ID: <2YCRyx088igNF1RBBlwKJL004Sd9D2QkqXdS8tXbib0=.90411d52-9382-407b-be11-7bc1038e89e0@github.com> On Mon, 13 May 2024 11:34:18 GMT, Yudi Zheng wrote: >> This PR removes allocation routines that may throw exception from JVMCIRuntime. It also exports various symbols related to the hashed secondary supers table. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > remove trailing white space Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19176#issuecomment-2112029254 From yzheng at openjdk.org Wed May 15 09:38:05 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 15 May 2024 09:38:05 GMT Subject: Integrated: 8331429: [JVMCI] Cleanup JVMCIRuntime allocation routines In-Reply-To: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> References: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> Message-ID: On Fri, 10 May 2024 13:06:21 GMT, Yudi Zheng wrote: > This PR removes allocation routines that may throw exception from JVMCIRuntime. It also exports various symbols related to the hashed secondary supers table. This pull request has now been integrated. 
Changeset: 957eb611 Author: Yudi Zheng Committer: Doug Simon URL: https://git.openjdk.org/jdk/commit/957eb611ce2531a3fcc764813ad1e0776887fdda Stats: 116 lines in 4 files changed: 3 ins; 45 del; 68 mod 8331429: [JVMCI] Cleanup JVMCIRuntime allocation routines Reviewed-by: dlong, dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/19176 From eastigeevich at openjdk.org Wed May 15 09:59:12 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 15 May 2024 09:59:12 GMT Subject: Integrated: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 13:03:26 GMT, Evgeny Astigeevich wrote: > Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues. REDO work is tracked by [JDK-8331749](https://bugs.openjdk.org/browse/JDK-8331749). > > Found bugs: > - When refreshing `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)` which only considers the compiler directive which is on the top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected. > - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. > > There are other concerns: bugs and performance issues. > > Possible bugs: > - `has_matching_directives` might not be cleared. A nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark. > - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again. 
> - JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. > > Performance issues: > - Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will be traversing nmethods most of which don't need recompilation. > > The backout is not clean because of removal of `CompiledMethod`. > > Tested with release and fastdebug builds: tier1 and tier2 passed. This pull request has now been integrated. Changeset: 1a944478 Author: Evgeny Astigeevich URL: https://git.openjdk.org/jdk/commit/1a944478a26a766f5a573a1236b642d8e7b0685c Stats: 380 lines in 15 files changed: 3 ins; 347 del; 30 mod 8332111: [BACKOUT] A way to align already compiled methods with compiler directives Reviewed-by: shade, kvn ------------- PR: https://git.openjdk.org/jdk/pull/19215 From eastigeevich at openjdk.org Wed May 15 09:59:11 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 15 May 2024 09:59:11 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 22:43:44 GMT, Vladimir Kozlov wrote: >> What if instead of backing out we will use an experimental JVM flag: `XX:+CompilerDirectivesRefreshSupport`? > >> What if instead of backing out we will use an experimental JVM flag: `XX:+CompilerDirectivesRefreshSupport`? > > I don't think this is correct way to fix the bug. 
Thank you, @vnkozlov @dchuyko @shipilev ------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2112072984 From fjiang at openjdk.org Wed May 15 10:21:14 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 15 May 2024 10:21:14 GMT Subject: RFR: 8331281: RISC-V: C2: Support vector-scalar and vector-immediate bitwise logic instructions [v2] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 08:14:31 GMT, Gui Cao wrote: >> Hi, we want to support vector-scalar and vector-immediate bitwise logic instructions. They were implemented with reference to RVV v1.0 [1]. Please take a look and review. Thanks a lot. >> We can use Int256VectorTests.java [2] to print the compilation log and to verify and observe the generated nodes. >> >> For example, we can use the following command to print the compilation log of a jtreg test case: >> >> >> /home/zifeihan/jdk-tools/jtreg/bin/jtreg \ >> -v:default \ >> -concurrency:16 -timeout:50 \ >> -javaoption:-XX:+UnlockExperimentalVMOptions \ >> -javaoption:-XX:+UseRVV \ >> -javaoption:-XX:+PrintOptoAssembly \ >> -javaoption:-XX:LogFile=/home/zifeihan/jdk/Int256VectorTests_PrintOptoAssembly.log \ >> -jdk:/home/zifeihan/jdk/build/linux-riscv64-server-fastdebug/jdk \ >> /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/Int256VectorTests.java >> >> >> >> We can then inspect the specified compilation log `Int256VectorTests_PrintOptoAssembly.log`, which contains the vector-scalar and vector-immediate bitwise logic nodes implemented by this PR.
>> >> vand_immI Node >> >> >> 0b4 vloadcon V3 # generate iota indices >> 0bc vmla V2, V2, V3, V1 >> 0c4 vand_immI V2, V2, #7 >> 0cc addi R7, R30, #16 # ptr, #@addP_reg_imm >> 0d0 storeV [R7], V2 # vector (rvv) >> >> >> vor_regI Node >> >> >> 180 vor_regI V1, V1, R30 >> 188 add R31, R14, R31 # ptr, #@addP_reg_reg >> 18a addi R31, R31, #16 # ptr, #@addP_reg_imm >> 18c storeV [R31], V1 # vector (rvv) >> 194 addiw R11, R11, #8 #@addI_reg_imm >> 196 blt R11, R13, B17 #@cmpI_loop P=0.500000 C=30564.000000 >> >> >> vxor_regI Node >> >> 198 vxor_regI V1, V1, R30 >> 1a0 add R14, R16, R14 # ptr, #@addP_reg_reg >> 1a2 addi R14, R14, #16 # ptr, #@addP_reg_imm >> 1a4 storeV [R14], V1 # vector (rvv) >> 1ac addiw R11, R11, #8 #@addI_reg_imm >> 1ae blt R11, R13, B21 #@cmpI_loop P=0.500000 C=30564.000000 >> >> >> vand_regI_masked Node >> >> 234 B31: # out( B40 B32 ) <- in( B30 ) Freq: 78.5481 >> 234 loadV V2, [R15] # vector (rvv) >> 23c vand_regI_masked V2, V2, R11 >> 244 storeV [R9], V2 # vector (rvv) >> 24c mv R10, #8 # int, #@loadConI >> 24e ble R7, R10, B40 #@cmpI_branch P=0.000001 C=-1.000000 >> >> >> vor_regI_masked Node >> >> 1ee B32: # out( B38 B33 ) <- in( B31 ) Freq: 75.8475 >> 1ee loadV V1, [R11] # vector (rvv) >> 1f6 vor_regI_masked V1, V1, R31 >> 1fe addi R11, R13, #32 # ptr, #@addP_reg_imm >> 202 bgeu R29, R10, B38 #@cmpU_bra... > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Use iRegIorL2I to replace iRegI in AndV/OrVXorV instruct Looks good, thanks! ------------- Marked as reviewed by fjiang (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/18999#pullrequestreview-2057551278 From alanb at openjdk.org Wed May 15 10:47:13 2024 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 15 May 2024 10:47:13 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v4] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Wed, 15 May 2024 08:49:46 GMT, ExE Boss wrote: > Maybe export this interface to `jdk.unsupported`? I don't think we should do that. In general, we need jdk.unsupported to go away in the long term. Also, the integrity of the platform depends on java.base being very stingy and not exporting internal packages to other modules. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1601399452 From duke at openjdk.org Wed May 15 11:31:10 2024 From: duke at openjdk.org (ExE Boss) Date: Wed, 15 May 2024 11:31:10 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v4] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Wed, 15 May 2024 10:43:52 GMT, Alan Bateman wrote: >> src/java.base/share/classes/jdk/internal/lang/stable/TrustedFieldType.java line 14: >> >>> 12: * operations. >>> 13: */ >>> 14: public sealed interface TrustedFieldType >> >> Maybe export this interface to `jdk.unsupported`? > >> Maybe export this interface to `jdk.unsupported`? > > I don't think we should do that. In general, we need jdk.unsupported to go away in the long term. Also, the integrity of the platform depends on java.base being very stingy and not exporting internal packages to other modules. Given that `TrustedFieldType` is more generic than stable values, it could be moved to `jdk.internal.misc` or `jdk.internal.reflect`; then `jdk.unsupported` could use it without exporting new packages to `jdk.unsupported`.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1601455889 From roland at openjdk.org Wed May 15 12:04:11 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 15 May 2024 12:04:11 GMT Subject: RFR: 8332245: C2: missing record_for_ign() call in GraphKit::must_be_not_null() [v2] In-Reply-To: References: Message-ID: On Tue, 14 May 2024 20:22:22 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed test @summary > > Looks good to me otherwise. @TobiHartmann @chhagedorn thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/19233#issuecomment-2112340441 From roland at openjdk.org Wed May 15 12:04:12 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 15 May 2024 12:04:12 GMT Subject: Integrated: 8332245: C2: missing record_for_ign() call in GraphKit::must_be_not_null() In-Reply-To: References: Message-ID: On Tue, 14 May 2024 16:17:48 GMT, Roland Westrelin wrote: > The `If` node that's created by `GraphKit::must_be_not_null()` is not > enqueued for igvn when it's created. The test case shows that this prevents 2 > identical tests from commoning when the first igvn executes. This is a > minor issue I noticed while working on something else. This pull request has now been integrated.
Changeset: 8032d640 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/8032d640c0d34fe507392a1d4faa4ff2005c771d Stats: 77 lines in 2 files changed: 77 ins; 0 del; 0 mod 8332245: C2: missing record_for_ign() call in GraphKit::must_be_not_null() Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/19233 From roland at openjdk.org Wed May 15 12:07:22 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 15 May 2024 12:07:22 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v5] In-Reply-To: References: Message-ID: > Range check `CastII` nodes are removed once loop opts are over. The > test case for this change includes 3 cases where elimination of a > range check `CastII` causes a crash in compiled code because either an > out-of-bounds array load or a division by zero happens. > > In `test1`: > > - the range checks for the `array[otherArray.length]` loads constant > fold: `otherArray.length` is a `CastII` of `i` at the `otherArray` > allocation. `i` is less than 9. The `CastII` at the allocation > narrows the type down further to `[0-9]`. > > - the `array[otherArray.length]` loads are control dependent on the > unrelated: > > > if (flag == 0) { > > > test. There's an identical dominating test which replaces that one. As > a consequence, the `array[otherArray.length]` loads become control > dependent on the dominating test. > > - The `CastII` nodes at the `otherArray` allocations are replaced by > dominating range check `CastII` nodes for: > > > newArray[i] = 42; > > > - After loop opts, the range check `CastII` nodes are removed and the > 2 `array[otherArray.length]` loads common at the first: > > > if (flag == 0) { > > > test before the: > > > float[] otherArray = new float[i]; > > > and > > > newArray[i] = 42; > > > that guarantee `i` is positive.
> > - `test1` is called with `i = -1`, the array load proceeds with an out-of-bounds > index and the crash occurs. > > > `test2` and `test3` are mostly identical except for the check that's > eliminated (a zero-divisor check) and the instruction that causes a > fault (an integer division). > > The fix I propose is to not eliminate range check `CastII` nodes after > loop opts. When range check `CastII` nodes were introduced, performance > was observed to regress. Removing them after loop opts was found to > preserve both correctness and performance. Today, the performance > regression still exists when `CastII` nodes are left in. So I propose > we keep them until the end of optimizations (so the 2 array loads > above don't lose a dependency and wrongly common) but remove them at > the end of all optimizations. > > In the case of the array loads, they are dependent on a range check > for another array through a range check `CastII` and we must not lose > that dependency, otherwise the array loads could float above the range > check at gcm time. I propose we deal with that problem the way it's > handled for `CastPP` nodes: add the dependency to the load (or > division) nodes as a precedence edge when the cast is removed. > > @TobiHartmann ran performance testing for that patch (Thanks!) and reported > no regression. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains ten additional commits since the last revision: - Node::is_div_or_mod() - Merge branch 'master' into JDK-8324517 - test fix - review - Merge branch 'master' into JDK-8324517 - Merge branch 'master' into JDK-8324517 - review - Merge branch 'master' into JDK-8324517 - test and fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18377/files - new: https://git.openjdk.org/jdk/pull/18377/files/5cc658b6..67d2a05a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18377&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18377&range=03-04 Stats: 26538 lines in 631 files changed: 14194 ins; 7687 del; 4657 mod Patch: https://git.openjdk.org/jdk/pull/18377.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18377/head:pull/18377 PR: https://git.openjdk.org/jdk/pull/18377 From roland at openjdk.org Wed May 15 12:07:22 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 15 May 2024 12:07:22 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v5] In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 13:11:38 GMT, Emanuel Peter wrote: >> @eme64 did you get a chance to look at the answers to your questions? > > @rwestrel It seems I only get notifications for new messages, not responses. Looking at the PR now... What about the new commit with the Node::is_div_or_mod() method @eme64 ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18377#issuecomment-2112345169 From epeter at openjdk.org Wed May 15 12:12:08 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 May 2024 12:12:08 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v5] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 12:07:22 GMT, Roland Westrelin wrote: >> Range check `CastII` nodes are removed once loop opts are over. 
The >> test case for this change includes 3 cases where elimination of a >> range check `CastII` causes a crash in compiled code because either a >> out of bounds array load or a division by zero happen. >> >> In `test1`: >> >> - the range checks for the `array[otherArray.length]` loads constant >> fold: `otherArray.length` is a `CastII` of i at the `otherArray` >> allocation. `i` is less than 9. The `CastII` at the allocation >> narrows the type down further to `[0-9]`. >> >> - the `array[otherArray.length]` loads are control dependent on the >> unrelated: >> >> >> if (flag == 0) { >> >> >> test. There's an identical dominating test which replaces that one. As >> a consequence, the `array[otherArray.length]` loads become control >> dependent on the dominating test. >> >> - The `CastII` nodes at the `otherArray` allocations are replaced by a >> dominating range check `CastII` nodes for: >> >> >> newArray[i] = 42; >> >> >> - After loop opts, the range check `CastII` nodes are removed and the >> 2 `array[otherArray.length]` loads common at the first: >> >> >> if (flag == 0) { >> >> >> test before the: >> >> >> float[] otherArray = new float[i]; >> >> >> and >> >> >> newArray[i] = 42; >> >> >> that guarantee `i` is positive. >> >> - `test1` is called with `i = -1`, the array load proceeds with an out >> of bounds index and the crash occurs. >> >> >> `test2` and `test3` are mostly identical except for the check that's >> eliminated (a null divisor check) and the instruction that causes a >> fault (an integer division). >> >> The fix I propose is to not eliminate range check `CastII` nodes after >> loop opts. When range check`CastII` nodes were introduced, performance >> was observed to regress. Removing them after loop opts was found to >> preserve both correctness and performance. Today, the performance >> regression still exists when `CastII` nodes are left in. 
So I propose >> we keep them until the end of optimizations (so the 2 array loads >> above don't lose a dependency and wrongly common) but remove them at >> the end of all optimizations. >> >> In the case of the array loads, they are dependent on a range check >> for another array through a range check `CastII` and we must not lose >> that dependency, otherwise the array loads could float above the range >> check at gcm time. I propose we deal with that problem the way it's >> handled for `CastPP` nodes: add the dependency to the load (or >> division) nodes ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: > > - Node::is_div_or_mod() > - Merge branch 'master' into JDK-8324517 > - test fix > - review > - Merge branch 'master' into JDK-8324517 > - Merge branch 'master' into JDK-8324517 > - review > - Merge branch 'master' into JDK-8324517 > - test and fix @rwestrel thanks for the update, looks good! ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18377#pullrequestreview-2057814727 From forax at openjdk.org Wed May 15 12:29:18 2024 From: forax at openjdk.org (Rémi Forax) Date: Wed, 15 May 2024 12:29:18 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v4] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Wed, 15 May 2024 11:27:04 GMT, ExE Boss wrote: >>> Maybe export this interface to `jdk.unsupported`? >> >> I don't think we should do that. In general, we need jdk.unsupported to go away in the long term. Also, the integrity of the platform depends on java.base being very stingy and not exporting internal packages to other modules.
> > Given that `TrustedFieldType` is more generic than stable values, it could be moved to `jdk.internal.misc` or `jdk.internal.reflect`; then `jdk.unsupported` could use it without exporting new packages to `jdk.unsupported`. At some point in the future, 'jdk.unsupported' will be removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1601548492 From roland at openjdk.org Wed May 15 12:45:15 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 15 May 2024 12:45:15 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v4] In-Reply-To: References: Message-ID: <8DnNVDn7mCzfbzXu1Q0GaDsb3OxzAQF1hiSX5RqDAcI=.ed4ba7b5-e418-40e9-b61d-5bdb4513484d@github.com> On Tue, 14 May 2024 20:41:11 GMT, Tobias Hartmann wrote: > > I did but was fairly conservative. In the case of PhaseIdealLoop::match_fill_loop, I don't think this change makes a difference: if we don't need the check for CastIINode::has_range_check there, then it's true with or without that change. > > Right, maybe we can put that into the follow-up bug. Should there be another follow-up bug then? Or did I not understand what the follow-up bug was about? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18377#issuecomment-2112419786 From epeter at openjdk.org Wed May 15 13:19:12 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 May 2024 13:19:12 GMT Subject: RFR: 8329273: C2 SuperWord: Some basic MemorySegment IR tests [v2] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 17:15:21 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 25 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8329273-memory-segment-ir-tests >> - fix tabs >> - speed up test >> - small cosmetic fix >> - make things static >> - long loop tests >> - handle AlignVector >> - int cases >> - int-index case >> - disable mixed tests >> - ... and 15 more: https://git.openjdk.org/jdk/compare/87e303f0...6f760dfd > > Looks good. Thanks for the reviews @vnkozlov @chhagedorn ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18535#issuecomment-2112489598 From epeter at openjdk.org Wed May 15 13:19:13 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 May 2024 13:19:13 GMT Subject: Integrated: 8329273: C2 SuperWord: Some basic MemorySegment IR tests In-Reply-To: References: Message-ID: On Thu, 28 Mar 2024 16:34:38 GMT, Emanuel Peter wrote: > I could not find any IR vectorization tests for `MemorySegment` yet. > > I make sure to exercise different backing types: > - arrays > - buffers > - native memory > > I filed a follow-up RFE, to eventually make all cases where I have "FAILS" vectorize: > > [JDK-8331659](https://bugs.openjdk.org/browse/JDK-8331659): C2 SuperWord: investicate failed vectorization in compiler/loopopts/superword/TestMemorySegment.java This pull request has now been integrated. 
Changeset: c4867c62 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/c4867c62c44b48e48845608fe4b29b58749767ad Stats: 810 lines in 1 file changed: 810 ins; 0 del; 0 mod 8329273: C2 SuperWord: Some basic MemorySegment IR tests Reviewed-by: kvn, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/18535 From kvn at openjdk.org Wed May 15 13:25:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 15 May 2024 13:25:13 GMT Subject: RFR: JDK-8330565 : C2: Multiple crashes with CTW after JDK-8316991 [v2] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 04:11:36 GMT, Cesar Soares Lucas wrote: >> The `# assert(false) failed: Bad graph detected in build_loop_late` failure was caused because a string concatenation optimization using [this method](https://github.com/openjdk/jdk/blob/819f3d6fc70ff6fe54ac5f9033c17c3dd4326aa5/src/hotspot/share/opto/graphKit.cpp#L4115) adds AddP and LoadN nodes to the IR graph as NotNull _and_ because RAM was not "nullifying" phis merging nullable pointers. I was only able to reproduce this problem using a classfile/jar compiled with an "old" JDK version, because newer versions use InvokeDynamic to do string concatenation. >> >> Tested with JTREG tier1-4 on Linux x86_64 & ARM64. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Refactor split_castpp_load_through_phi Looks good. This has to be tested. ------------- PR Review: https://git.openjdk.org/jdk/pull/19147#pullrequestreview-2058008370 From epeter at openjdk.org Wed May 15 13:34:04 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 May 2024 13:34:04 GMT Subject: RFR: 8325155: C2 SuperWord: remove alignment boundaries [v2] In-Reply-To: References: Message-ID: > I have tried for a very long time to get rid of all the `alignment(n)` code that is all over the SuperWord code. With lots of previous work, I am now finally ready to remove it.
> > I was able to remove lots of VM code, about 300 lines. And the removed code is, I think, much more complicated than the new code. > > This is what I did in this PR: > - Removal of `_node_info`: used to have many fields, which I refactored out to the `VLoopAnalyzer` modules. `alignment` is the last component, which I now remove. > - Changed the implementation of `SuperWord::find_adjacent_refs`, now `SuperWord::find_adjacent_memop_pairs`, completely: > - It used to be an algorithm that would scan over all `memops` repeatedly, try to find some `mem_ref` and see which other memops were comparable, and then pack pairs for all of those, by comparing all-vs-all memops. This algorithm is at least quadratic, if not much worse. > - I now add all `memops` into a single array, sort them by groups (those that are comparable with each other and could be packed into vectors), and inside the groups by ascending offset. This allows me to split off the groups much more efficiently, and also the sorting by offset allows me to find adjacent pairs much more efficiently. In most cases this reduces the cost to `O(n log n)` for the sort, plus a linear scan for finding adjacent memops. > - I removed the "alignment boundaries" created in `SuperWord::memory_alignment` by `int off_rem = offset % vw;`. > - This used to have the effect that all offsets were computed modulo the vector width. Hence, pairs could not be packed across this boundary (e.g. we have nodes with offsets `31, 32`, which are adjacent in theory, but if we have a `vw = 32`, then the modulo-offsets are `31, 0`, and they are not detected as adjacent). > - These "alignment boundaries" used to be required for correctness about a year ago, before I fixed and relaxed much of the alignment code. > - The `alignment` used to have another important task: ensuring compatibility of the input-size of a use node with the output-size of the def-node. > - This was done by giving all nodes an `alignment`, even the non-memop nodes.
This `alignment` was then scaled up and down at type casts (e.g. int `0, 4, 8, 12` -> long `0, 8, 16, 24`). If the output-size of the def-node did not match the input-size of the use-node, then the `alignment` would not match up, and we would not pack. > - This is why we used to have checks like `alignment(s1) + data_size(s1) == alignment(s2)` ... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: - Merge branch 'master' into JDK-8325155-rm-alignment-boundaries - rm TODO - manual merge - revert a line, need to fix it different - improve comments - fix alignment - fix reductions - MaxI reduction over chars - Merge branch 'master' into JDK-8325155-rm-alignment-boundaries - Merge branch 'master' into JDK-8325155-rm-alignment-boundaries - ... and 15 more: https://git.openjdk.org/jdk/compare/c4867c62...82c9a77a ------------- Changes: https://git.openjdk.org/jdk/pull/18822/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18822&range=01 Stats: 1064 lines in 7 files changed: 597 ins; 369 del; 98 mod Patch: https://git.openjdk.org/jdk/pull/18822.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18822/head:pull/18822 PR: https://git.openjdk.org/jdk/pull/18822 From vlivanov at openjdk.org Wed May 15 13:57:35 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 15 May 2024 13:57:35 GMT Subject: RFR: 8331885: C2: meet between unloaded and speculative types is not symmetric Message-ID: `TypeInstPtr::xmeet_unloaded` computes the MEET of two InstPtrs when at least one is unloaded, but doesn't preserve the speculative part if one is present. This causes the corresponding assert to fail. The proposed fix unconditionally keeps the speculative part.
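The sort-then-scan pairing described in the 8325155 SuperWord message above (all memops in one array, sorted by group and then by ascending offset, followed by one linear scan) can be illustrated with a small Java model. This is a hypothetical sketch, not the HotSpot implementation: `AdjacentMemops`, `MemOp`, its `group` key (standing in for "memops that are comparable and could be packed together") and `findAdjacentPairs` are invented names. It also contrasts plain offset adjacency with the removed modulo-based "alignment boundary" check, using the `31, 32` / `vw = 32` example from the description.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of adjacent-memop pairing; the names are illustrative, not HotSpot's.
final class AdjacentMemops {
    static final class MemOp {
        final int group;  // memops with the same group are comparable (same base, type, ...)
        final int offset; // constant byte offset from the common base
        final int size;   // element size in bytes

        MemOp(int group, int offset, int size) {
            this.group = group;
            this.offset = offset;
            this.size = size;
        }
    }

    // Sort by (group, offset), then do a single linear scan: two neighbours form an
    // adjacent pair if they are in the same group and their offsets differ by exactly
    // the element size. Cost: O(n log n) for the sort plus O(n) for the scan.
    static List<MemOp[]> findAdjacentPairs(List<MemOp> memops) {
        List<MemOp> sorted = new ArrayList<>(memops);
        sorted.sort(Comparator.comparingInt((MemOp m) -> m.group)
                              .thenComparingInt(m -> m.offset));
        List<MemOp[]> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < sorted.size(); i++) {
            MemOp a = sorted.get(i);
            MemOp b = sorted.get(i + 1);
            if (a.group == b.group && b.offset - a.offset == a.size) {
                pairs.add(new MemOp[] { a, b });
            }
        }
        return pairs;
    }

    // The removed "alignment boundary" style check: offsets are first reduced modulo
    // the vector width, so a pair straddling a multiple of vw is not seen as adjacent.
    static boolean adjacentModulo(int off1, int off2, int size, int vw) {
        return (off2 % vw) - (off1 % vw) == size;
    }
}
```

With 1-byte elements, `adjacentModulo(31, 32, 1, 32)` is false (the offsets reduce to 31 and 0), while `findAdjacentPairs` on the same two offsets in one group reports a single pair.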
Testing: hs-tier1 - hs-tier4 ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/19249/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19249&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331885 Stats: 20 lines in 3 files changed: 12 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/19249.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19249/head:pull/19249 PR: https://git.openjdk.org/jdk/pull/19249 From rrich at openjdk.org Wed May 15 14:12:17 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 15 May 2024 14:12:17 GMT Subject: RFR: 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store [v3] In-Reply-To: References: Message-ID: > This PR adds a few tweaks to [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) that allow enabling it on big endian platforms as well (e.g. AIX, S390). JDK-8318446 introduced a C2 optimization to replace consecutive stores to a primitive array with just one store. > > By example (from `TestMergeStores.java`): > > > static Object[] test2a(byte[] a, int offset, long v) { > if (IS_BIG_ENDIAN) { > a[offset + 0] = (byte)(v >> 56); > a[offset + 1] = (byte)(v >> 48); > a[offset + 2] = (byte)(v >> 40); > a[offset + 3] = (byte)(v >> 32); > a[offset + 4] = (byte)(v >> 24); > a[offset + 5] = (byte)(v >> 16); > a[offset + 6] = (byte)(v >> 8); > a[offset + 7] = (byte)(v >> 0); > } else { > a[offset + 0] = (byte)(v >> 0); > a[offset + 1] = (byte)(v >> 8); > a[offset + 2] = (byte)(v >> 16); > a[offset + 3] = (byte)(v >> 24); > a[offset + 4] = (byte)(v >> 32); > a[offset + 5] = (byte)(v >> 40); > a[offset + 6] = (byte)(v >> 48); > a[offset + 7] = (byte)(v >> 56); > } > return new Object[]{ a }; > } > > > Depending on the endianness, 8 bytes are stored into an array.
The order of the stores is the same as the byte order of an 8-byte store, therefore the eight 1-byte stores can be replaced with just one 8-byte store (if there aren't too many range checks). > > Additionally, I've fixed a few comments and a test bug. > > The optimization seems to be a little bit more effective on big endian platforms. > > Again by example: > > > static Object[] test800a(byte[] a, int offset, long v) { > if (IS_BIG_ENDIAN) { > a[offset + 0] = (byte)(v >> 40); // Removed from candidate list > a[offset + 1] = (byte)(v >> 32); // Removed from candidate list > a[offset + 2] = (byte)(v >> 24); // Merged > a[offset + 3] = (byte)(v >> 16); // Merged > a[offset + 4] = (byte)(v >> 8); // Merged > a[offset + 5] = (byte)(v >> 0); // Merged > } else { > a[offset + 0] = (byte)(v >> 0); // Removed from candidate list > a[offset + 1] = (byte)(v >> 8); // Removed from candidate list > a[offset + 2] = (byte)(v >> 16); // Not merged > a[offset + 3] = (byte)(v >> 24); // Not merged > a[offset + 4] = (byte)(v >> 32); // Not merged > a[offset + 5] = (byte)(v >> 40); // Not merged > } > return new Object[]{ a };...
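Whether the byte-wise stores in `test2a` write the same bytes as a single wide store depends only on the byte order, and that can be checked directly in Java. The sketch below uses a hypothetical helper class `MergeStoresSketch` (not part of `TestMergeStores.java`). It writes the bytes exactly as the two branches of `test2a` do and compares them with one 8-byte store done through `java.nio.ByteBuffer` in big- and little-endian order. It also checks that, in the big-endian branch of `test800a`, offsets 2..5 receive the same bytes as one big-endian 4-byte store of `(int) v`, which is consistent with those four stores being marked "Merged".

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper class; not part of TestMergeStores.java.
final class MergeStoresSketch {
    // test2a, big-endian branch: most significant byte stored first.
    static byte[] bytewiseBE(long v) {
        byte[] a = new byte[8];
        for (int i = 0; i < 8; i++) {
            a[i] = (byte) (v >> (56 - 8 * i));
        }
        return a;
    }

    // test2a, little-endian branch: least significant byte stored first.
    static byte[] bytewiseLE(long v) {
        byte[] a = new byte[8];
        for (int i = 0; i < 8; i++) {
            a[i] = (byte) (v >> (8 * i));
        }
        return a;
    }

    // Bytes written by one 8-byte store in the given byte order.
    static byte[] longStore(long v, ByteOrder order) {
        byte[] a = new byte[8];
        ByteBuffer.wrap(a).order(order).putLong(0, v);
        return a;
    }

    // Bytes written by one 4-byte store in the given byte order.
    static byte[] intStore(int x, ByteOrder order) {
        byte[] a = new byte[4];
        ByteBuffer.wrap(a).order(order).putInt(0, x);
        return a;
    }

    // test800a, big-endian branch: six 1-byte stores at offsets 0..5.
    static byte[] test800aBE(long v) {
        return new byte[] {
            (byte) (v >> 40), (byte) (v >> 32),                     // removed from candidate list
            (byte) (v >> 24), (byte) (v >> 16), (byte) (v >> 8), (byte) v // merged
        };
    }
}
```

On any given machine only one of the two `test2a` patterns matches `ByteOrder.nativeOrder()`.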
Richard Reingruber has updated the pull request incrementally with two additional commits since the last revision: - test2BE: big endian version of test2 - Improve make_merged_input_value based on Emanuel's feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19218/files - new: https://git.openjdk.org/jdk/pull/19218/files/dc05bb0b..8844c837 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19218&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19218&range=01-02 Stats: 172 lines in 2 files changed: 98 ins; 34 del; 40 mod Patch: https://git.openjdk.org/jdk/pull/19218.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19218/head:pull/19218 PR: https://git.openjdk.org/jdk/pull/19218 From rrich at openjdk.org Wed May 15 14:12:17 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 15 May 2024 14:12:17 GMT Subject: RFR: 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store [v3] In-Reply-To: <5Zx0TXxEyR05tg6kNhs_3tHKsTm9xFGYmawen_m4fb4=.991d9155-91ac-4534-b655-655adc5e2b51@github.com> References: <5Zx0TXxEyR05tg6kNhs_3tHKsTm9xFGYmawen_m4fb4=.991d9155-91ac-4534-b655-655adc5e2b51@github.com> Message-ID: On Tue, 14 May 2024 16:15:29 GMT, Emanuel Peter wrote: >> Richard Reingruber has updated the pull request incrementally with two additional commits since the last revision: >> >> - test2BE: big endian version of test2 >> - Improve make_merged_input_value based on Emanuel's feedback > > src/hotspot/share/opto/memnode.cpp line 3313: > >> 3311: merged_input_value = _store->in(MemNode::ValueIn); >> 3312: bool is_true = is_con_RShift(first->in(MemNode::ValueIn), base_last, shift_last); >> 3313: #endif // VM_LITTLE_ENDIAN > > You could just have local variables for "lo" / "hi", set them depending on big/little endian, and then the logic would be the same for both. Yeah, that's better. I've done that. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19218#discussion_r1601726132 From rrich at openjdk.org Wed May 15 14:12:17 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 15 May 2024 14:12:17 GMT Subject: RFR: 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store [v2] In-Reply-To: References: Message-ID: <7albiealYdD-4xl9yk1jr2eeqgu0KMocptpdT78ICRg=.7a14f84f-1c2d-4062-9561-d3da8101ae55@github.com> On Wed, 15 May 2024 07:53:33 GMT, Richard Reingruber wrote: >> This pr adds a few tweaks to [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) which allows enabling it also on big endian platforms (e.g. AIX, S390). JDK-8318446 introduced a C2 optimization to replace consecutive stores to a primitive array with just one store. >> >> By example (from `TestMergeStores.java`): >> >> >> static Object[] test2a(byte[] a, int offset, long v) { >> if (IS_BIG_ENDIAN) { >> a[offset + 0] = (byte)(v >> 56); >> a[offset + 1] = (byte)(v >> 48); >> a[offset + 2] = (byte)(v >> 40); >> a[offset + 3] = (byte)(v >> 32); >> a[offset + 4] = (byte)(v >> 24); >> a[offset + 5] = (byte)(v >> 16); >> a[offset + 6] = (byte)(v >> 8); >> a[offset + 7] = (byte)(v >> 0); >> } else { >> a[offset + 0] = (byte)(v >> 0); >> a[offset + 1] = (byte)(v >> 8); >> a[offset + 2] = (byte)(v >> 16); >> a[offset + 3] = (byte)(v >> 24); >> a[offset + 4] = (byte)(v >> 32); >> a[offset + 5] = (byte)(v >> 40); >> a[offset + 6] = (byte)(v >> 48); >> a[offset + 7] = (byte)(v >> 56); >> } >> return new Object[]{ a }; >> } >> >> >> Depending on the endianess 8 bytes are stored into an array. The order of the stores is the same as the order of an 8-byte-store therefore 8 1-byte-stores can be replaced with just one 8-byte-store (if there aren't too many range checks). >> >> Additionally I've fixed a few comments and a test bug. >> >> The optimization seems to be a little bit more effective on big endian platforms. 
>> >> Again by example: >> >> >> static Object[] test800a(byte[] a, int offset, long v) { >> if (IS_BIG_ENDIAN) { >> a[offset + 0] = (byte)(v >> 40); // Removed from candidate list >> a[offset + 1] = (byte)(v >> 32); // Removed from candidate list >> a[offset + 2] = (byte)(v >> 24); // Merged >> a[offset + 3] = (byte)(v >> 16); // Merged >> a[offset + 4] = (byte)(v >> 8); // Merged >> a[offset + 5] = (byte)(v >> 0); // Merged >> } else { >> a[offset + 0] = (byte)(v >> 0); // Removed from candidate list >> a[offset + 1] = (byte)(v >> 8); // Removed from candidate list >> a[offset + 2] = (byte)(v >> 16); // Not merged >> a[offset + 3] = (byte)(v >> 24); // Not merged >> a[offset + 4] = (byte)(v >> 32); // Not merge... > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Improve comment Thanks for looking at the PR. > Just did a quick scan of the tests. I think it could be good to have both big/small endian tests run on both big/small endian machines, but only expect IR rules to pass if the test and platform are expected to optimize. This just makes sure that the logic is correct, and does not optimize the wrong cases, producing wrong results. I've done that for `test2` and introduced `test2BE`. Is that what you mean? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19218#issuecomment-2112655406 From fyang at openjdk.org Wed May 15 14:28:09 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 15 May 2024 14:28:09 GMT Subject: RFR: 8331281: RISC-V: C2: Support vector-scalar and vector-immediate bitwise logic instructions [v2] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 08:14:31 GMT, Gui Cao wrote: >> Hi, we want to support vector-scalar and vector-immediate bitwise logic instructions. They were implemented by referring to RVV v1.0 [1]. Please take a look and review. Thanks a lot.
>> We can use the Int256VectorTests.java[2] to print the compilation log, verify and observe the generation of nodes. >> >> For example, we can use the following command to print the compilation log of a jtreg test case: >> >> >> /home/zifeihan/jdk-tools/jtreg/bin/jtreg \ >> -v:default \ >> -concurrency:16 -timeout:50 \ >> -javaoption:-XX:+UnlockExperimentalVMOptions \ >> -javaoption:-XX:+UseRVV \ >> -javaoption:-XX:+PrintOptoAssembly \ >> -javaoption:-XX:LogFile=/home/zifeihan/jdk/Int256VectorTests_PrintOptoAssembly.log \ >> -jdk:/home/zifeihan/jdk/build/linux-riscv64-server-fastdebug/jdk \ >> /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/Int256VectorTests.java >> >> >> >> we can observe the specified compilation log `Int256VectorTests_PrintOptoAssembly.log`, which contains the vector-scalar and vector-immediate bitwise logic node for the PR implementation. >> >> vand_immI Node >> >> >> 0b4 vloadcon V3 # generate iota indices >> 0bc vmla V2, V2, V3, V1 >> 0c4 vand_immI V2, V2, #7 >> 0cc addi R7, R30, #16 # ptr, #@addP_reg_imm >> 0d0 storeV [R7], V2 # vector (rvv) >> >> >> vor_regI Node >> >> >> 180 vor_regI V1, V1, R30 >> 188 add R31, R14, R31 # ptr, #@addP_reg_reg >> 18a addi R31, R31, #16 # ptr, #@addP_reg_imm >> 18c storeV [R31], V1 # vector (rvv) >> 194 addiw R11, R11, #8 #@addI_reg_imm >> 196 blt R11, R13, B17 #@cmpI_loop P=0.500000 C=30564.000000 >> >> >> vxor_regI Node >> >> 198 vxor_regI V1, V1, R30 >> 1a0 add R14, R16, R14 # ptr, #@addP_reg_reg >> 1a2 addi R14, R14, #16 # ptr, #@addP_reg_imm >> 1a4 storeV [R14], V1 # vector (rvv) >> 1ac addiw R11, R11, #8 #@addI_reg_imm >> 1ae blt R11, R13, B21 #@cmpI_loop P=0.500000 C=30564.000000 >> >> >> vand_regI_masked Node >> >> 234 B31: # out( B40 B32 ) <- in( B30 ) Freq: 78.5481 >> 234 loadV V2, [R15] # vector (rvv) >> 23c vand_regI_masked V2, V2, R11 >> 244 storeV [R9], V2 # vector (rvv) >> 24c mv R10, #8 # int, #@loadConI >> 24e ble R7, R10, B40 #@cmpI_branch P=0.000001 C=-1.000000 >> >> >> vor_regI_masked 
Node >> >> 1ee B32: # out( B38 B33 ) <- in( B31 ) Freq: 75.8475 >> 1ee loadV V1, [R11] # vector (rvv) >> 1f6 vor_regI_masked V1, V1, R31 >> 1fe addi R11, R13, #32 # ptr, #@addP_reg_imm >> 202 bgeu R29, R10, B38 #@cmpU_bra... > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Use iRegIorL2I to replace iRegI in AndV/OrVXorV instruct LGTM. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18999#pullrequestreview-2058193942 From gcao at openjdk.org Wed May 15 14:46:15 2024 From: gcao at openjdk.org (Gui Cao) Date: Wed, 15 May 2024 14:46:15 GMT Subject: RFR: 8331281: RISC-V: C2: Support vector-scalar and vector-immediate bitwise logic instructions [v3] In-Reply-To: References: Message-ID: <0q6pGSepZDIXlh533WsFLAK4T2CkjwyQkzLdzbN8nYg=.2c0c8457-7f40-49fc-860f-bc90834ddca4@github.com> > Hi, We want to support vector-scalar and vector-immediate bitwise logic instructions, It was implemented by referring to RVV v1.0 [1]. please take a look and have some reviews. Thanks a lot. > We can use the Int256VectorTests.java[2] to print the compilation log, verify and observe the generation of nodes. > > For example, we can use the following command to print the compilation log of a jtreg test case: > > > /home/zifeihan/jdk-tools/jtreg/bin/jtreg \ > -v:default \ > -concurrency:16 -timeout:50 \ > -javaoption:-XX:+UnlockExperimentalVMOptions \ > -javaoption:-XX:+UseRVV \ > -javaoption:-XX:+PrintOptoAssembly \ > -javaoption:-XX:LogFile=/home/zifeihan/jdk/Int256VectorTests_PrintOptoAssembly.log \ > -jdk:/home/zifeihan/jdk/build/linux-riscv64-server-fastdebug/jdk \ > /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/Int256VectorTests.java > > > > we can observe the specified compilation log `Int256VectorTests_PrintOptoAssembly.log`, which contains the vector-scalar and vector-immediate bitwise logic node for the PR implementation. 
> > vand_immI Node > > > 0b4 vloadcon V3 # generate iota indices > 0bc vmla V2, V2, V3, V1 > 0c4 vand_immI V2, V2, #7 > 0cc addi R7, R30, #16 # ptr, #@addP_reg_imm > 0d0 storeV [R7], V2 # vector (rvv) > > > vor_regI Node > > > 180 vor_regI V1, V1, R30 > 188 add R31, R14, R31 # ptr, #@addP_reg_reg > 18a addi R31, R31, #16 # ptr, #@addP_reg_imm > 18c storeV [R31], V1 # vector (rvv) > 194 addiw R11, R11, #8 #@addI_reg_imm > 196 blt R11, R13, B17 #@cmpI_loop P=0.500000 C=30564.000000 > > > vxor_regI Node > > 198 vxor_regI V1, V1, R30 > 1a0 add R14, R16, R14 # ptr, #@addP_reg_reg > 1a2 addi R14, R14, #16 # ptr, #@addP_reg_imm > 1a4 storeV [R14], V1 # vector (rvv) > 1ac addiw R11, R11, #8 #@addI_reg_imm > 1ae blt R11, R13, B21 #@cmpI_loop P=0.500000 C=30564.000000 > > > vand_regI_masked Node > > 234 B31: # out( B40 B32 ) <- in( B30 ) Freq: 78.5481 > 234 loadV V2, [R15] # vector (rvv) > 23c vand_regI_masked V2, V2, R11 > 244 storeV [R9], V2 # vector (rvv) > 24c mv R10, #8 # int, #@loadConI > 24e ble R7, R10, B40 #@cmpI_branch P=0.000001 C=-1.000000 > > > vor_regI_masked Node > > 1ee B32: # out( B38 B33 ) <- in( B31 ) Freq: 75.8475 > 1ee loadV V1, [R11] # vector (rvv) > 1f6 vor_regI_masked V1, V1, R31 > 1fe addi R11, R13, #32 # ptr, #@addP_reg_imm > 202 bgeu R29, R10, B38 #@cmpU_branch P=0.000001 C=-1.000000 > > vxor_regI_masked Node > > 1ee B32: # out( B38 B33 ) <- in( B31 ) Freq: 75.8475 > 1ee loadV V1, [R11]... Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
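The new `vand`/`vor`/`vxor` rules above come in vector-scalar, vector-immediate and masked flavours. The following is a plain-Java model of their per-lane semantics, not the C2 matcher rules or the code in this PR. It assumes two things: that the RVV 1.0 `.vi` forms take a 5-bit signed immediate (-16..15), which is our reading of the spec, and that masked-off lanes keep the first operand's value, following the Vector API's masked `lanewise` convention. `RvvBitwiseModel` and its methods are hypothetical names.

```java
import java.util.function.IntBinaryOperator;

final class RvvBitwiseModel {
    // Assumption: .vi encodings carry a sign-extended 5-bit immediate.
    static boolean fitsSimm5(int imm) {
        return imm >= -16 && imm <= 15;
    }

    // Vector-scalar / vector-immediate: the same scalar is combined with every lane.
    static int[] lanewise(int[] v, int scalar, IntBinaryOperator op) {
        int[] r = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            r[i] = op.applyAsInt(v[i], scalar);
        }
        return r;
    }

    // Masked form: lanes whose mask bit is unset keep the first operand's value.
    static int[] lanewiseMasked(int[] v, int scalar, boolean[] mask, IntBinaryOperator op) {
        int[] r = v.clone();
        for (int i = 0; i < v.length; i++) {
            if (mask[i]) {
                r[i] = op.applyAsInt(v[i], scalar);
            }
        }
        return r;
    }
}
```

For example, `lanewise(new int[]{8, 9, 15, 23}, 7, (a, b) -> a & b)` yields `{0, 1, 7, 7}`, which is what `vand_immI V2, V2, #7` computes lane by lane. An immediate outside the simm5 range would instead have to be placed in a register and use the `.vx` form.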
The pull request contains six additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into JDK-8331281 - Use iRegIorL2I to replace iRegI in AndV/OrVXorV instruct - Polishing Code comment - Add vand/vor/vxor predicated Node - Polishing Code Comment - 8331281: RISC-V: C2: Support vector-scalar and vector-immediate bitwise logic instructions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18999/files - new: https://git.openjdk.org/jdk/pull/18999/files/69c196e7..d83b0b68 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18999&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18999&range=01-02 Stats: 33416 lines in 862 files changed: 18419 ins; 9030 del; 5967 mod Patch: https://git.openjdk.org/jdk/pull/18999.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18999/head:pull/18999 PR: https://git.openjdk.org/jdk/pull/18999 From duke at openjdk.org Wed May 15 15:22:13 2024 From: duke at openjdk.org (duke) Date: Wed, 15 May 2024 15:22:13 GMT Subject: Withdrawn: 8321308: AArch64: Fix matching predication for cbz/cbnz In-Reply-To: References: Message-ID: <5iMqbK6kF3WWvA5slpVsCNnMZkow8E4kYoCfSf6kXMU=.de942644-a793-4b93-bdb6-43ef856a8a0d@github.com> On Wed, 6 Dec 2023 01:54:59 GMT, Fei Gao wrote: > For array length check like: > > if (a.length > 0) { > [Block 1] > } else { > [Block 2] > } > > > Since `a.length` is unsigned, it's semantically equivalent to: > > if (a.length != 0) { > [Block 1] > } else { > [Block 2] > } > > > On aarch64 port, we can do the conversion like above, during c2 compiler instruction matching, for certain unsigned integral comparisons. > > For example, > > cmpw w11, #0 # unsigned > bls label # unsigned > [Block 1] > > label: > [Block 2] > > > can be converted to: > > cbz w11, label > [Block 1] > > label: > [Block 2] > > > Currently, we have some matching rules to do the conversion [[1]](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64.ad#L16179). 
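The cbz/cbnz rewrite in the withdrawn PR 16989 depends on a simple identity for unsigned comparisons against zero: unsigned `x > 0` holds exactly when `x != 0`, and unsigned `x <= 0` (the `bls` condition) holds exactly when `x == 0`. A small sketch that states the identity in Java (`UnsignedZeroCheck` is a hypothetical name; this says nothing about how `aarch64.ad` encodes its predicates):

```java
final class UnsignedZeroCheck {
    // "cmpw x, #0; b.hi": unsigned greater-than against zero.
    static boolean unsignedGreaterThanZero(int x) {
        return Integer.compareUnsigned(x, 0) > 0;
    }

    // "cbnz x": branch if not zero.
    static boolean notZero(int x) {
        return x != 0;
    }

    // "cmpw x, #0; b.ls": unsigned less-or-equal against zero, i.e. the cbz condition.
    static boolean unsignedLessOrEqualZero(int x) {
        return Integer.compareUnsigned(x, 0) <= 0;
    }
}
```

No unsigned value is below zero, so `<= 0` collapses to `== 0`. This is why only the `gt`/`le` (and trivially `ne`/`eq`) unsigned masks may select the cbz/cbnz forms.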
But the predicate here [[2]](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64.ad#L6140) matches wrong `BoolTest` masks, so these rules fail to convert. I guess it's a typo introduced in [JDK-8160006](https://bugs.openjdk.org/browse/JDK-8160006). The patch fixes it. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/16989 From pminborg at openjdk.org Wed May 15 15:27:34 2024 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 15 May 2024 15:27:34 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v5] In-Reply-To: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: > # Stable Values & Collections (Internal) > > ## Summary > This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. > > ## Goals > * Provide an easy and intuitive API to describe value holders that can change at most once. > * Decouple declaration from initialization without significant footprint or performance penalties. > * Reduce the amount of static initializer and/or field initialization code. > * Uphold integrity and consistency, even in a multi-threaded environment. > > For more details, see the draft JEP: https://openjdk.org/jeps/8312611 > > ## Performance > Performance compared to instance variables using an `AtomicReference` and one protected by double-checked locking under concurrent access by 8 threads: > > > Benchmark Mode Cnt Score Error Units > StableBenchmark.instanceAtomic avgt 10 1.576 ± 0.052 ns/op > StableBenchmark.instanceDCL avgt 10 1.608 ±
0.059 ns/op > StableBenchmark.instanceStable avgt 10 0.979 ± 0.023 ns/op <- StableValue (~40% faster than DCL) > > > Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (8 threads): > > > Benchmark Mode Cnt Score Error Units > StableBenchmark.staticAtomic avgt 10 1.335 ± 0.056 ns/op > StableBenchmark.staticCHI avgt 10 0.623 ± 0.086 ns/op > StableBenchmark.staticDCL avgt 10 1.418 ± 0.171 ns/op > StableBenchmark.staticList avgt 10 0.617 ± 0.024 ns/op > StableBenchmark.staticStable avgt 10 0.604 ± 0.022 ns/op <- StableValue ( > 2x faster than `AtomicInteger` and DCL) > > > Performance for stable lists in both instance and static contexts whereby the sum of random contents is calculated for stable lists (which are thread-safe) compared to `ArrayList` instances (which are not thread-safe) (under single thread access): > > > Benchmark Mode Cnt Score Error Units > StableListSumBenchmark.instanceArrayList avgt 10 0.356 ± 0.005 ns/op > StableListSumBenchmark.instanceList avgt 10 0.373 ± 0.017 ns/op <- Stable list > StableListSumBenchmark.staticArrayList avgt 10 0.352 ± 0.002 ns/op > StableListSumBenchmark.staticList avgt 10 0.356 ± 0.00...
Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Switch to monomorphic StableValue and use lazy arrays ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18794/files - new: https://git.openjdk.org/jdk/pull/18794/files/c92b16c4..2b840e06 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=03-04 Stats: 471 lines in 9 files changed: 126 ins; 306 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/18794.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18794/head:pull/18794 PR: https://git.openjdk.org/jdk/pull/18794 From pminborg at openjdk.org Wed May 15 15:27:34 2024 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 15 May 2024 15:27:34 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v4] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Wed, 15 May 2024 08:53:32 GMT, ExE Boss wrote: >> Per Minborg has updated the pull request incrementally with one additional commit since the last revision: >> >> Revise docs for ofBackground() > > src/java.base/share/classes/jdk/internal/lang/StableValue.java line 1: > >> 1: /* > > Maybe also add `StableValue::ofLazy(Supplier)` which behaves more like the original **Computed Constants** JEP draft? There is a method `StableValue::asSupplier` that is similar to the former ComputedConstant behavior.
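`StableValue::asSupplier` is only named here, not shown. As a rough point of comparison, here is a plain-Java memoizing supplier built on double-checked locking, the baseline the benchmarks compare against. It is a hypothetical sketch (`Memoized` is not a JDK class, and this is not how `StableValue` is implemented). Unlike a `@Stable` field that the JIT may constant-fold, every call here still performs a volatile read.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical helper for comparison only; not the JDK-internal StableValue API.
final class Memoized<T> implements Supplier<T> {
    // Sentinel so that a supplier legitimately returning null is also memoized.
    private static final Object UNSET = new Object();

    private final Supplier<? extends T> original;
    private volatile Object value = UNSET;

    private Memoized(Supplier<? extends T> original) {
        this.original = Objects.requireNonNull(original);
    }

    static <T> Supplier<T> of(Supplier<? extends T> original) {
        return new Memoized<>(original);
    }

    @SuppressWarnings("unchecked")
    @Override
    public T get() {
        Object v = value;                 // first check, no lock
        if (v == UNSET) {
            synchronized (this) {
                v = value;                // second check, under the lock
                if (v == UNSET) {
                    v = original.get();   // if this throws, value stays UNSET and a later call retries
                    value = v;
                }
            }
        }
        return (T) v;
    }
}
```

The retry-on-exception behaviour here is a design choice of the sketch. The review comments further down discuss recording the failure instead.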
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1601840547 From pminborg at openjdk.org Wed May 15 15:27:35 2024 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 15 May 2024 15:27:35 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v4] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Wed, 15 May 2024 12:26:34 GMT, Rémi Forax wrote: >> Given that `TrustedFieldType` is more generic than stable values, it could be moved to `jdk.internal.misc` or `jdk.internal.reflect`, then `jdk.unsupported` could use it without exporting new packages to `jdk.unsupported`. > > At some point in the future, 'jdk.unsupported' will be removed Maybe there is a better home for this?
>> * Uphold integrity and consistency, even in a multi-threaded environment. >> >> For more details, see the draft JEP: https://openjdk.org/jeps/8312611 >> >> ## Performance >> Performance compared to instance variables using two `AtomicReference` and two protected by double-checked locking under concurrent access by all threads: >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.atomic thrpt 10 259.478 ? 36.809 ops/us >> StableBenchmark.dcl thrpt 10 225.710 ? 26.638 ops/us >> StableBenchmark.stable thrpt 10 4382.478 ? 1151.472 ops/us <- StableValue significantly faster >> >> >> Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (all threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableStaticBenchmark.atomic thrpt 10 6487.835 ? 385.639 ops/us >> StableStaticBenchmark.dcl thrpt 10 6605.239 ? 210.610 ops/us >> StableStaticBenchmark.stable thrpt 10 14338.239 ? 1426.874 ops/us >> StableStaticBenchmark.staticCHI thrpt 10 13780.341 ? 1839.651 ops/us >> >> >> Performance for stable lists (thread safe) in both instance and static contexts whereby we access a single value compared to `ArrayList` instances (which are not thread-safe) (all threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableListElementBenchmark.instanceArrayList thrpt 10 5812.992 ? 1169.730 ops/us >> StableListElementBenchmark.instanceList thrpt 10 4818.643 ? 704.893 ops/us >> StableListElementBenchmark... > > Per Minborg has updated the pull request incrementally with one additional commit since the last revision: > > Switch to monomorphic StableValue and use lazy arrays I have reworked the stable collections so that we create StableValues on demand and store them in a lazily populated backing array. This improved performance significantly as well as gave us improved startup times (only one array needs to be created upfront). Also, StableValue is now monomorphic. 
On the flip side is the fact that slightly more memory is needed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18794#issuecomment-2112886060 From liach at openjdk.org Wed May 15 15:44:21 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 15 May 2024 15:44:21 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v5] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Tue, 14 May 2024 11:12:39 GMT, Per Minborg wrote: >> src/hotspot/share/ci/ciField.cpp line 262: >> >>> 260: const char* stable_array3d_klass_name = "jdk/internal/lang/StableArray3D"; >>> 261: >>> 262: static bool trust_final_non_static_fields_of_type(Symbol* signature) { >> >> Is there a better way of doing this? > > How do we check if the type implements `TrustedFieldType` in C? Is it possible for us to just look at strict fields from valhalla, so we can reliably constant-fold those strict final fields? https://cr.openjdk.org/~jrose/jls/constructive-classes.html ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1601876058 From liach at openjdk.org Wed May 15 15:52:14 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 15 May 2024 15:52:14 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v4] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: <8QkAH25MWgY2qvCQgPPID71ye9gL3cX5MqhHs1Fapy0=.293fae67-c17f-42c7-af84-79408c1ed3e1@github.com> On Wed, 15 May 2024 15:20:58 GMT, Per Minborg wrote: >> At some point in the future, 'jdk.unsupported' will be removed > > Maybe there is a better home for this? I don't think we should publish this API; this will soon be phased out by strict final fields (written only before super constructor calls) introduced by Valhalla, as strict final fields are never modifiable and can be safely trusted. 
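The rework Per Minborg describes creates holders on demand and stores them in a lazily populated backing array. Later in this thread, the `StableUtil` code publishes each holder with a compare-and-exchange and returns the witness. The same idea can be sketched with public APIs through `AtomicReferenceArray.compareAndExchange` (Java 9+). `LazySlots` is a hypothetical name, and the real code uses `Unsafe` on a plain array instead. Racing threads may each create a holder, but only one is installed, and every caller gets that one.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.IntFunction;

// Hypothetical sketch: per-index holders created on demand in a single backing array.
final class LazySlots<H> {
    private final AtomicReferenceArray<H> slots;   // the only allocation made up front
    private final IntFunction<? extends H> factory;

    LazySlots(int size, IntFunction<? extends H> factory) {
        this.slots = new AtomicReferenceArray<>(size);
        this.factory = factory;
    }

    H slot(int index) {
        H existing = slots.get(index);
        if (existing != null) {
            return existing;                        // fast path: already installed
        }
        H created = factory.apply(index);
        // compareAndExchange returns the witness value: null if our store won,
        // otherwise the holder that another thread installed first.
        H witness = slots.compareAndExchange(index, null, created);
        return witness == null ? created : witness;
    }
}
```

The trade-off matches the one noted above: startup cost drops to one array allocation, while each populated slot costs an extra holder object.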
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1601887921 From kxu at openjdk.org Wed May 15 16:01:54 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 15 May 2024 16:01:54 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v5] In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. > > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: fix tests on larger strides ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18489/files - new: https://git.openjdk.org/jdk/pull/18489/files/85820dee..be0f596d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=03-04 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18489.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489 PR: https://git.openjdk.org/jdk/pull/18489 From liach at openjdk.org Wed May 15 16:36:07 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 15 May 2024 16:36:07 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v5] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: 
<3TJiHIc0qhHxrztP3GxRfIBAhgsVHkD5KBQ2uKUfB8g=.fbb9881f-6752-4353-bc90-cfe6f99fc5eb@github.com> On Mon, 6 May 2024 19:31:43 GMT, Per Minborg wrote: >> src/java.base/share/classes/jdk/internal/lang/StableArray.java line 25: >> >>> 23: * @since 23 >>> 24: */ >>> 25: public sealed interface StableArray >> >> Do we have a use case for StableArray beyond those of StableList? > > I am trying to model multi-dimensional arrays that also provide flattening. Let's see if it becomes useful. I think this StableArray can be used as an explicit field type to block reflection modifications and enforce constant-folding; the List interface cannot. Though in long term I still believe strict final fields are better. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1601931752 From liach at openjdk.org Wed May 15 16:36:10 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 15 May 2024 16:36:10 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v5] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Wed, 15 May 2024 15:27:34 GMT, Per Minborg wrote: >> # Stable Values & Collections (Internal) >> >> ## Summary >> This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. >> >> ## Goals >> * Provide an easy and intuitive API to describe value holders that can change at most once. >> * Decouple declaration from initialization without significant footprint or performance penalties. >> * Reduce the amount of static initializer and/or field initialization code. >> * Uphold integrity and consistency, even in a multi-threaded environment. 
>> >> For more details, see the draft JEP: https://openjdk.org/jeps/8312611 >> >> ## Performance >> Performance compared to instance variables using two `AtomicReference` and two protected by double-checked locking under concurrent access by all threads: >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.atomic thrpt 10 259.478 ± 36.809 ops/us >> StableBenchmark.dcl thrpt 10 225.710 ± 26.638 ops/us >> StableBenchmark.stable thrpt 10 4382.478 ± 1151.472 ops/us <- StableValue significantly faster >> >> >> Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (all threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableStaticBenchmark.atomic thrpt 10 6487.835 ± 385.639 ops/us >> StableStaticBenchmark.dcl thrpt 10 6605.239 ± 210.610 ops/us >> StableStaticBenchmark.stable thrpt 10 14338.239 ± 1426.874 ops/us >> StableStaticBenchmark.staticCHI thrpt 10 13780.341 ± 1839.651 ops/us >> >> >> Performance for stable lists (thread safe) in both instance and static contexts whereby we access a single value compared to `ArrayList` instances (which are not thread-safe) (all threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableListElementBenchmark.instanceArrayList thrpt 10 5812.992 ± 1169.730 ops/us >> StableListElementBenchmark.instanceList thrpt 10 4818.643 ± 704.893 ops/us >> StableListElementBenchmark... > > Per Minborg has updated the pull request incrementally with one additional commit since the last revision: > > Switch to monomorphic StableValue and use lazy arrays src/java.base/share/classes/jdk/internal/lang/StableValue.java line 384: > 382: * @param the memoized type > 383: */ > 384: static Supplier ofSupplier(Supplier original) { `ofSupplier` sounds like this method returns a `StableValue` from a `Supplier`. I recommend another name, such as `stableSupplier`, `wrapSupplier`, or `memoize`, to better associate with the method's types.
src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 75: > 73: */ > 74: @Stable > 75: private int state; Can we change this to be a byte, so state and supplying fields can be packed together in 4 bytes in some layouts? src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 104: > 102: // Optimistically try plain semantics first > 103: final V v = value; > 104: if (v != null) { If `value == null && state == NULL`, can the path still be constant folded? I doubt it because `value` in this case may not be promoted to constant. src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 139: > 137: case NON_NULL: { return valueVolatile(); } > 138: case ERROR: { throw StableUtil.error(this); } > 139: case DUMMY: { throw shouldNotReachHere(); } Redundant branch? src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 236: > 234: } catch (Throwable t) { > 235: putState(ERROR); > 236: putMutex(t.getClass()); Should we cache the exception instance so we can rethrow it in future ERROR state `orThrow` calls? src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 240: > 238: } > 239: } finally { > 240: supplying = false; Resetting a stable field is a bad idea. I recommend renaming this to `supplierCalled` or `supplied` so we never transition this false -> true src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 256: > 254: > 255: @ForceInline > 256: private V computeIfUnsetShared(Object provider, K key) { Can we let suppliers share this path too, with a null key? I see this path supports suppliers but supplier code path doesn't call this path. 
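Several of the review suggestions above can be combined in one small holder: a `byte` state instead of an `int`, caching the `Throwable` instance rather than its class so that later calls rethrow the same failure, and never moving a state field back to its default value. The sketch below is hypothetical (`OnceHolder` is not `StableValueImpl`). For brevity it uses `synchronized` instead of a lock-free fast path over `@Stable` fields. It checks `state` before looking at `value`, so a stored `null` is still distinguishable from "unset".

```java
import java.util.NoSuchElementException;
import java.util.function.Supplier;

// Hypothetical sketch of the review suggestions; not the JDK-internal StableValueImpl.
final class OnceHolder<V> {
    private static final byte UNSET = 0, NON_NULL = 1, NULL = 2, ERROR = 3;

    private byte state;          // moves away from UNSET at most once, never back
    private V value;
    private Throwable failure;   // cached instance, rethrown on every later call

    synchronized V computeIfUnset(Supplier<? extends V> supplier) {
        if (state != UNSET) {
            return orThrow();    // never invoke the supplier a second time
        }
        try {
            V v = supplier.get();
            value = v;
            state = (v == null) ? NULL : NON_NULL;
            return v;
        } catch (RuntimeException | Error t) {
            failure = t;
            state = ERROR;
            throw t;
        }
    }

    synchronized V orThrow() {
        switch (state) {
            case NON_NULL: return value;
            case NULL: return null;
            case ERROR: throw rethrow(failure);
            default: throw new NoSuchElementException("No value set");
        }
    }

    // Only RuntimeExceptions and Errors are ever cached, so this cast is safe.
    private static RuntimeException rethrow(Throwable t) {
        if (t instanceof RuntimeException) {
            throw (RuntimeException) t;
        }
        throw (Error) t;
    }
}
```

Keeping the cached instance preserves the original stack trace and message, which rethrowing only the class cannot do. The cost is that the holder keeps the exception reachable.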
src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 403: > 401: stable.computeIfUnset(supplier); > 402: } catch (Throwable throwable) { > 403: final Thread.UncaughtExceptionHandler uncaughtExceptionHandler = Does this exception handling differ from the default one for threads? If not, I think we can safely remove this catch block, as all exceptions are just propagated and computeIfUnset doesn't declare any checked exception. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1601935261 PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1601937748 PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1601916538 PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1601918911 PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1601940341 PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1601941518 PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1601942956 PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1601927751 From thartmann at openjdk.org Wed May 15 17:14:03 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 15 May 2024 17:14:03 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v4] In-Reply-To: <8DnNVDn7mCzfbzXu1Q0GaDsb3OxzAQF1hiSX5RqDAcI=.ed4ba7b5-e418-40e9-b61d-5bdb4513484d@github.com> References: <8DnNVDn7mCzfbzXu1Q0GaDsb3OxzAQF1hiSX5RqDAcI=.ed4ba7b5-e418-40e9-b61d-5bdb4513484d@github.com> Message-ID: On Wed, 15 May 2024 12:41:59 GMT, Roland Westrelin wrote: > Should there be another follow up bug then? Or did I not understand what the follow up bug was about? Right, feel free to file a new one but I think to just keep track of it, we can as well add it to JDK-8332268 for now. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18377#issuecomment-2113055110 From thartmann at openjdk.org Wed May 15 17:14:04 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 15 May 2024 17:14:04 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: <05mqp07IWOcReVgYabEiDRcLueAMXs8Y8sb1u5SqKiA=.ac76547b-7dcc-4c9b-8db6-d251e2cc9625@github.com> Message-ID: <7WICvrz__1PvkB26tS88S69PNEN3fKQRHOO1ZOFU2gI=.077e9e96-5528-4be3-8842-b31f94cb4ab5@github.com> On Wed, 15 May 2024 07:14:50 GMT, Roland Westrelin wrote: >>> What you're saying, I think, is that if we have, say, a CastII that's input to a DivI node, if the input to that cast is non zero, then we don't need to add the CastII control as dependency to the DivI >> >> Yes, that was my point. >> >>> That doesn't seem straightforward because this is done once we have no igvn instance to propagate types anymore. So, while I agree this is conservative, it still seems like the most reasonable fix. >> >> Right, we can still go down that path if it ever becomes necessary. >> >>> That seems like a different problem that out of the scope of this particular issue. >> >> Could you please file a follow-up bug for that? > > I filed https://bugs.openjdk.org/browse/JDK-8332268 Thanks! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18377#discussion_r1601988746 From thartmann at openjdk.org Wed May 15 17:33:04 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 15 May 2024 17:33:04 GMT Subject: RFR: JDK-8330565 : C2: Multiple crashes with CTW after JDK-8316991 [v2] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 04:11:36 GMT, Cesar Soares Lucas wrote: >> The `# assert(false) failed: Bad graph detected in build_loop_late` failure was caused because a string concatenation optimization using [this method](https://github.com/openjdk/jdk/blob/819f3d6fc70ff6fe54ac5f9033c17c3dd4326aa5/src/hotspot/share/opto/graphKit.cpp#L4115) adds AddP and LoadN nodes to IR graph as NotNull _and_ because RAM was not "nullyfing" phis merging nullable pointers. I was only able to reproduce this problem using a classfile/jar compiled using an "old" version of JDK.. because newer version use InvokeDynamic to do string concatenation. >> >> Tested with JTREG tier1-4 on Linux x86_64 & ARM64. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Refactor split_castpp_load_through_phi Looks good to me too. I submitted testing. test/hotspot/jtreg/compiler/c2/TestReduceAllocationAndNullableLoads.java line 33: > 31: * @run main/othervm -XX:CompileCommand=compileonly,*TestReduceAllocationAndNullableLoads*::* > 32: * -XX:CompileCommand=dontinline,*TestReduceAllocationAndNullableLoads*::* > 33: * -XX:-TieredCompilation -Xbatch -Xcomp -server Suggestion: * -XX:-TieredCompilation -Xcomp -server `-Xcomp` implies `-Xbatch` ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19147#pullrequestreview-2058619005 PR Review Comment: https://git.openjdk.org/jdk/pull/19147#discussion_r1602009075 From cslucas at openjdk.org Wed May 15 18:15:16 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 15 May 2024 18:15:16 GMT Subject: RFR: JDK-8330565 : C2: Multiple crashes with CTW after JDK-8316991 [v3] In-Reply-To: References: Message-ID: > The `# assert(false) failed: Bad graph detected in build_loop_late` failure was caused because a string concatenation optimization using [this method](https://github.com/openjdk/jdk/blob/819f3d6fc70ff6fe54ac5f9033c17c3dd4326aa5/src/hotspot/share/opto/graphKit.cpp#L4115) adds AddP and LoadN nodes to IR graph as NotNull _and_ because RAM was not "nullyfing" phis merging nullable pointers. I was only able to reproduce this problem using a classfile/jar compiled using an "old" version of JDK.. because newer version use InvokeDynamic to do string concatenation. > > Tested with JTREG tier1-4 on Linux x86_64 & ARM64. 
Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/c2/TestReduceAllocationAndNullableLoads.java -Xcomp implies -Xbatch Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19147/files - new: https://git.openjdk.org/jdk/pull/19147/files/94eb0e12..facc93b9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19147&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19147&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19147.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19147/head:pull/19147 PR: https://git.openjdk.org/jdk/pull/19147 From duke at openjdk.org Wed May 15 18:44:09 2024 From: duke at openjdk.org (ExE Boss) Date: Wed, 15 May 2024 18:44:09 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v5] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Wed, 15 May 2024 15:27:34 GMT, Per Minborg wrote: >> # Stable Values & Collections (Internal) >> >> ## Summary >> This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. >> >> ## Goals >> * Provide an easy and intuitive API to describe value holders that can change at most once. >> * Decouple declaration from initialization without significant footprint or performance penalties. >> * Reduce the amount of static initializer and/or field initialization code. >> * Uphold integrity and consistency, even in a multi-threaded environment. 
>> >> For more details, see the draft JEP: https://openjdk.org/jeps/8312611 >> >> ## Performance >> Performance compared to instance variables using two `AtomicReference` and two protected by double-checked locking under concurrent access by all threads: >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.atomic thrpt 10 259.478 ± 36.809 ops/us >> StableBenchmark.dcl thrpt 10 225.710 ± 26.638 ops/us >> StableBenchmark.stable thrpt 10 4382.478 ± 1151.472 ops/us <- StableValue significantly faster >> >> >> Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (all threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableStaticBenchmark.atomic thrpt 10 6487.835 ± 385.639 ops/us >> StableStaticBenchmark.dcl thrpt 10 6605.239 ± 210.610 ops/us >> StableStaticBenchmark.stable thrpt 10 14338.239 ± 1426.874 ops/us >> StableStaticBenchmark.staticCHI thrpt 10 13780.341 ± 1839.651 ops/us >> >> >> Performance for stable lists (thread safe) in both instance and static contexts whereby we access a single value compared to `ArrayList` instances (which are not thread-safe) (all threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableListElementBenchmark.instanceArrayList thrpt 10 5812.992 ± 1169.730 ops/us >> StableListElementBenchmark.instanceList thrpt 10 4818.643 ± 704.893 ops/us >> StableListElementBenchmark... > > Per Minborg has updated the pull request incrementally with one additional commit since the last revision: > > Switch to monomorphic StableValue and use lazy arrays src/java.base/share/classes/jdk/internal/lang/stable/StableUtil.java line 152: > 150: StableValueImpl witness = (StableValueImpl) > 151: Holder.UNSAFE.compareAndExchangeReference(elements, offset, null, stable); > 152: return witness == null ? stable: witness; Suggestion: return witness == null ?
stable : witness; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1602094801 From duke at openjdk.org Wed May 15 18:55:09 2024 From: duke at openjdk.org (ExE Boss) Date: Wed, 15 May 2024 18:55:09 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v5] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Wed, 15 May 2024 16:10:06 GMT, Chen Liang wrote: >> Per Minborg has updated the pull request incrementally with one additional commit since the last revision: >> >> Switch to monomorphic StableValue and use lazy arrays > > src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 104: > >> 102: // Optimistically try plain semantics first >> 103: final V v = value; >> 104: if (v != null) { > > If `value == null && state == NULL`, can the path still be constant folded? I doubt it because `value` in this case may not be promoted to constant. Maybe the `state == NULL` check should be moved before `v != null`, as the **JIT** doesn't constant-fold `null` [`@Stable`] values: https://github.com/openjdk/jdk/blob/8a4315f833f3700075d65fae6bc566011c837c07/src/java.base/share/classes/jdk/internal/vm/annotation/Stable.java#L41-L44 https://github.com/openjdk/jdk/blob/8a4315f833f3700075d65fae6bc566011c837c07/src/java.base/share/classes/jdk/internal/vm/annotation/Stable.java#L64-L71 [`@Stable`]: https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/jdk/internal/vm/annotation/Stable.java > src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 240: > >> 238: } >> 239: } finally { >> 240: supplying = false; > > Resetting a stable field is a bad idea.
I recommend renaming this to `supplierCalled` or `supplied` so we never transition this false -> true Yes, according to the `@Stable` annotation's JavaDoc, this is UB: https://github.com/openjdk/jdk/blob/8a4315f833f3700075d65fae6bc566011c837c07/src/java.base/share/classes/jdk/internal/vm/annotation/Stable.java#L74-L80 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1602101301 PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1602106099 From liach at openjdk.org Wed May 15 19:10:13 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 15 May 2024 19:10:13 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v5] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Wed, 15 May 2024 18:49:49 GMT, ExE Boss wrote: >> src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 240: >> >>> 238: } >>> 239: } finally { >>> 240: supplying = false; >> >> Resetting a stable field is a bad idea. I recommend renaming this to `supplierCalled` or `supplied` so we never transition this false -> true > > Yes, according to the `@Stable` annotation's JavaDoc, this is UB: > https://github.com/openjdk/jdk/blob/8a4315f833f3700075d65fae6bc566011c837c07/src/java.base/share/classes/jdk/internal/vm/annotation/Stable.java#L74-L80 FYI, what usually happens is that if a stable field or a similarly constant-folded field is promoted to a constant, the constant promotion can happen to any of the previous valid values written.
MethodHandle optimistically sets a trusted final field this way: https://github.com/openjdk/jdk/blob/8a4315f833f3700075d65fae6bc566011c837c07/src/java.base/share/classes/java/lang/invoke/MethodHandle.java#L1868-L1870 Also, a similar example in user code targeting older Java releases (before JDK 16's strong encapsulation), where enum constants could be added by reflection: https://github.com/MinecraftForge/MinecraftForge/issues/3885#issuecomment-355602542 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1602125255 From duke at openjdk.org Wed May 15 20:26:17 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Wed, 15 May 2024 20:26:17 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Rearrange; add lambdas for clarity First pass at StringIndexOfHuge.java and IndexOf.java test/jdk/java/lang/StringBuffer/IndexOf.java line 40: > 38: private static boolean failure = false; > 39: public static void main(String[] args) throws Exception { > 40: String testName = "IndexOf"; Indentation. test/jdk/java/lang/StringBuffer/IndexOf.java line 47: > 45: char[] haystack_16 = new char[128]; > 46: > 47: for (int i = 0; i < 128; i++) { you can use `char` instead of `int` as iterator test/jdk/java/lang/StringBuffer/IndexOf.java line 54: > 52: // for (int i = 1; i < 128; i++) { > 53: // haystack_16[i] = (char) (i); > 54: // } Dead code. test/jdk/java/lang/StringBuffer/IndexOf.java line 64: >
62: Charset hs_charset = StandardCharsets.UTF_16; > 63: Charset needleCharset = StandardCharsets.ISO_8859_1; > 64: // Charset needleCharset = StandardCharsets.UTF_16; Move from main() into a function that takes `needleCharset` as a parameter, then call that function twice. test/jdk/java/lang/StringBuffer/IndexOf.java line 81: > 79: sourceBuffer = new StringBuffer(sourceString); > 80: targetString = generateTestString(10, 11); > 81: } while (sourceString.indexOf(targetString) != -1); It would be better to keep the original test unmodified and add new tests as needed. test/jdk/java/lang/StringBuffer/IndexOf.java line 83: > 81: shs = "$&),,18+-!'8)+"; > 82: endNeedle = "8)-"; > 83: l_offset = 9; Dead code. test/jdk/java/lang/StringBuffer/IndexOf.java line 89: > 87: StringBuffer bshs = new StringBuffer(shs); > 88: > 89: // printStringBytes(shs.getBytes(hs_charset)); Dead code (and the next two comments). test/jdk/java/lang/StringBuffer/IndexOf.java line 90: > 88: > 89: // printStringBytes(shs.getBytes(hs_charset)); > 90: for (int i = 0; i < 200000; i++) { This won't be a deterministic way to reach the intrinsic. I would suggest copying the idea from test/jdk/com/sun/crypto/provider/Cipher/ChaCha20/unittest/Poly1305UnitTestDriver.java, i.e. have two `@run main` invocations at the top of this file, one with default parameters and one with `-Xcomp -XX:-TieredCompilation`. You don't need a 'driver' program; that was there to handle something else. /* * @test * @modules java.base/com.sun.crypto.provider * @run main java.base/com.sun.crypto.provider.Poly1305KAT * @summary Unit test for com.sun.crypto.provider.Poly1305. */ /* * @test * @modules java.base/com.sun.crypto.provider * @summary Unit test for IntrinsicCandidate in com.sun.crypto.provider.Poly1305.
* @run main/othervm -Xcomp -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:+ForceUnreachable java.base/com.sun.crypto.provider.Poly1305KAT */ test/jdk/java/lang/StringBuffer/IndexOf.java line 126: > 124: int aNewLength = getRandomIndex(min, max); > 125: for (int y = 0; y < aNewLength; y++) { > 126: int achar = generator.nextInt(30) + 30; This will only ever generate LL cases, i.e. chars in [30, 60). It could be parametrized to also produce UTF-16 strings if, instead of 30, the offset were above the Latin-1 range (> 0xFF). test/jdk/java/lang/StringBuffer/IndexOf.java line 199: > 197: System.out.println("Source="+sourceString.substring(hsBegin, hsBegin + haystackLen)); > 198: System.out.println("Target="+targetString.substring(nBegin, nBegin + needleLen)); > 199: System.out.println("haystackLen="+haystackLen+" neeldeLen="+needleLen+" hsBegin="+hsBegin+" nBegin="+nBegin+ This looks like 'development scaffolding' (i.e. printf debugging) that was meant to be removed. test/jdk/java/lang/StringBuffer/IndexOf.java line 237: > 235: + sourceBuffer.toString() + " len Buffer = " + sourceBuffer.toString().length()); > 236: System.err.println(" naive = " + naiveFind(sourceBuffer.toString(), targetString, 0) + ", IndexOf = " > 237: + sourceBuffer.indexOf(targetString)); More tracing left behind here and in the rest of this function (the original just recorded the failure and moved on). test/jdk/java/lang/StringBuffer/IndexOf.java line 284: > 282: > 283: // Note: it is possible although highly improbable that failCount will > 284: // be > 0 even if everthing is working ok This sounds like either a bug or a test-case bug? Same as line 301, `extremely remote possibility of > 1 match`? test/jdk/java/lang/StringBuffer/IndexOf.java line 295: > 293: sourceString = generateTestString(99, 100); > 294: sourceBuffer = new StringBuffer(sourceString); > 295: targetString = generateTestString(10, 11); Generate a random int in {0, 1, 2} for LL, UU, UL and pass that as a parameter to generateTestString() to test the other paths.
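One possible shape for that parametrization, as a sketch only: the class `IndexOfInputs` and its methods are hypothetical and not part of the test. It keeps the original [30, 60) alphabet and, when UTF-16 is requested, replaces one random position with a character above 0xFF. With compact strings enabled, that single character is enough for the resulting `String` to use the UTF-16 representation. A `kind` of 0, 1 or 2 then selects LL, UU or UL (haystack encoding first, needle second).

```java
import java.util.Random;

// Hypothetical helpers (not in the test) for generating LL, UU and UL inputs.
final class IndexOfInputs {
    // Returns a string with length in [minLen, maxLen) drawn from the original
    // Latin-1 alphabet [30, 60). If utf16 is true, one random position is replaced
    // by a char above 0xFF, which forces the UTF-16 representation when compact
    // strings are enabled.
    static String generate(Random rnd, int minLen, int maxLen, boolean utf16) {
        int len = minLen + rnd.nextInt(maxLen - minLen);
        char[] chars = new char[len];
        for (int i = 0; i < len; i++) {
            chars[i] = (char) (30 + rnd.nextInt(30));
        }
        if (utf16 && len > 0) {
            chars[rnd.nextInt(len)] = (char) (0x100 + rnd.nextInt(30));
        }
        return new String(chars);
    }

    // kind: 0 = LL, 1 = UU, 2 = UL (haystack encoding first, needle second).
    static String[] haystackAndNeedle(Random rnd, int kind) {
        boolean haystackU = kind != 0;
        boolean needleU = kind == 1;
        return new String[] {
            generate(rnd, 99, 100, haystackU),
            generate(rnd, 10, 11, needleU)
        };
    }
}
```

A caller would pick `int kind = rnd.nextInt(3);` per iteration. The UL case keeps a mostly Latin-1 haystack so that Latin-1 needles can still occur in it.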
Same for other tests in this file using this pattern. This test is specific to haystacklen=100, needlelen=10. What about other haystack/needle sizes, to exercise all the paths in the intrinsic assembler (i.e. haystack lengths both >= 32 and <= 32, needlelen = {1, 2, 3, 4, 5, ..., 32, ...})? Is that covered elsewhere already? test/jdk/java/lang/StringBuffer/IndexOf.java line 360: > 358: System.err.println(" sAnswer = " + sAnswer + ", sbAnswer = " + sbAnswer); > 359: System.err.println(" testString = '" + testString + "'"); > 360: System.err.println(" testBuffer = '" + testBuffer + "'"); Tracing left here and further down. test/micro/org/openjdk/bench/java/lang/StringIndexOfHuge.java line 2: > 1: /* > 2: * Copyright (c) 2014, 2024, Oracle and/or its affiliates. All rights reserved. New file, so the copyright year should be just 2024. test/micro/org/openjdk/bench/java/lang/StringIndexOfHuge.java line 81: > 79: lateMatchString16 = dataStringHuge16.substring(dataStringHuge16.length() - 31); > 80: > 81: searchString = "oscar"; I would have liked to see a few more small needles (i.e. to test/verify the individual switch-statement cases). test/micro/org/openjdk/bench/java/lang/StringIndexOfHuge.java line 94: > 92: > 93: > 94: /** IndexOf Micros */ I would really have preferred `@Param({"LL", "UU", "UL"})`; it would be easier to spot any copy/paste errors. test/micro/org/openjdk/bench/java/lang/StringIndexOfHuge.java line 132: > 130: @Benchmark > 131: public int searchHugeLargeSubstring() { > 132: return dataStringHuge.indexOf("B".repeat(30) + "X" + "A".repeat(30), 74); The .repeat() calls and string concatenation shouldn't be part of the benchmark (here and in several other @Benchmark methods in this file), since they will skew the measurement.
(Since JDK 9, javac compiles string concatenation into an invokedynamic call bootstrapped by StringConcatFactory; for older targets it becomes StringBuilder().append().append()....append().toString(). Either way, that work ends up inside the measured method.) test/micro/org/openjdk/bench/java/lang/StringIndexOfHuge.java line 242: > 240: @Benchmark > 241: public int search16HugeLargeSubstring16() { > 242: return dataStringHuge16.indexOf("B".repeat(30) + "X" + "A".repeat(30), 74); `search16HugeLargeSubstring16` implies UU, but the needle `"B".repeat(30) + "X" + "A".repeat(30)` contains only Latin-1 characters, so this is actually UL. ------------- PR Review: https://git.openjdk.org/jdk/pull/16753#pullrequestreview-2058681000 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602136400 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602140456 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602137044 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602158011 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602160330 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602144091 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602147967 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602153043 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602181943 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602162587 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602167728 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602184697 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602198158 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602171418 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602200123 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602133525 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602130679 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602047091 PR Review
Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602115797 From duke at openjdk.org Wed May 15 20:26:17 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Wed, 15 May 2024 20:26:17 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 19:21:37 GMT, Volodymyr Paprotski wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Rearrange; add lambdas for clarity > > test/jdk/java/lang/StringBuffer/IndexOf.java line 47: > >> 45: char[] haystack_16 = new char[128]; >> 46: >> 47: for (int i = 0; i < 128; i++) { > > you can use `char` instead of `int` as iterator Combine into a single loop: haystack[i] = (char) i; haystack_16[i] = (char) (i + 256); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602141543 From sviswanathan at openjdk.org Wed May 15 21:13:13 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 15 May 2024 21:13:13 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Rearrange; add lambdas for clarity src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1476: > 1474: _masm); > 1475: > 1476: __ movq(r11, -1); There doesn't seem to be a use of r11 below in this function. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1493: > 1491: // Assume r10 is n - k > 1492: __ leaq(last, Address(haystack, r10, Address::times_1, isU ? -30 : -31)); > 1493: __ jmpb(temp); Need to pass r10 as parameter. Also temp label could be given a better name. 
src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1502: > 1500: > 1501: __ cmpq(hsPtrRet, last); > 1502: __ cmovq(Assembler::aboveEqual, hsPtrRet, last); cmovq is expensive; a better sequence would be: __ cmpq(hsPtrRet, last); __ jb_b(temp); __ movq(hsPtrRet, last); src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1510: > 1508: compare_big_haystack_to_needle(sizeKnown, size, NUMBER_OF_NEEDLE_BYTES_TO_COMPARE, loop_top, hsPtrRet, hsLength, > 1509: needleLen, isU, DO_EARLY_BAILOUT, eq_mask, temp2, r10, _masm); > 1510: At this point hsLength is not the remaining length from hsPtrRet; would that cause a problem? If not, all the special paths in compare_big_haystack_to_needle need not be generated on this call. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602016421 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1601943761 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602251994 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602010926 From duke at openjdk.org Thu May 16 00:00:18 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Thu, 16 May 2024 00:00:18 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v20] In-Reply-To: References: Message-ID: <2FUcSV0iR3Z5z0I16gaklmxxJuGtWDA0pNXFqaLZOAg=.df8a6755-fc38-4a97-93bb-d255a0887bfb@github.com> > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs.
For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: add asserts requiring UseAPX and UseAVX > 2 for egpr use with some instructions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/826fa2bb..156bbfc5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=18-19 Stats: 67 lines in 2 files changed: 56 ins; 7 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From rrich at openjdk.org Thu May 16 01:39:26 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 16 May 2024 01:39:26 GMT Subject: RFR: 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store [v4] In-Reply-To: References: Message-ID: > This pr adds a few tweaks to [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) which allows enabling it also on big endian platforms (e.g. AIX, S390). JDK-8318446 introduced a C2 optimization to replace consecutive stores to a primitive array with just one store. 
> > By example (from `TestMergeStores.java`): > > > static Object[] test2a(byte[] a, int offset, long v) { > if (IS_BIG_ENDIAN) { > a[offset + 0] = (byte)(v >> 56); > a[offset + 1] = (byte)(v >> 48); > a[offset + 2] = (byte)(v >> 40); > a[offset + 3] = (byte)(v >> 32); > a[offset + 4] = (byte)(v >> 24); > a[offset + 5] = (byte)(v >> 16); > a[offset + 6] = (byte)(v >> 8); > a[offset + 7] = (byte)(v >> 0); > } else { > a[offset + 0] = (byte)(v >> 0); > a[offset + 1] = (byte)(v >> 8); > a[offset + 2] = (byte)(v >> 16); > a[offset + 3] = (byte)(v >> 24); > a[offset + 4] = (byte)(v >> 32); > a[offset + 5] = (byte)(v >> 40); > a[offset + 6] = (byte)(v >> 48); > a[offset + 7] = (byte)(v >> 56); > } > return new Object[]{ a }; > } > > > Depending on the endianness, 8 bytes are stored into an array. The order of the stores is the same as the order of an 8-byte store, therefore the eight 1-byte stores can be replaced with just one 8-byte store (if there aren't too many range checks). > > Additionally, I've fixed a few comments and a test bug. > > The optimization seems to be a little bit more effective on big endian platforms. > > Again by example: > > > static Object[] test800a(byte[] a, int offset, long v) { > if (IS_BIG_ENDIAN) { > a[offset + 0] = (byte)(v >> 40); // Removed from candidate list > a[offset + 1] = (byte)(v >> 32); // Removed from candidate list > a[offset + 2] = (byte)(v >> 24); // Merged > a[offset + 3] = (byte)(v >> 16); // Merged > a[offset + 4] = (byte)(v >> 8); // Merged > a[offset + 5] = (byte)(v >> 0); // Merged > } else { > a[offset + 0] = (byte)(v >> 0); // Removed from candidate list > a[offset + 1] = (byte)(v >> 8); // Removed from candidate list > a[offset + 2] = (byte)(v >> 16); // Not merged > a[offset + 3] = (byte)(v >> 24); // Not merged > a[offset + 4] = (byte)(v >> 32); // Not merged > a[offset + 5] = (byte)(v >> 40); // Not merged > } > return new Object[]{ a };...
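Why the store order matters can be checked in plain Java with `java.nio.ByteBuffer`. Writing the bytes of `v` from most to least significant at increasing offsets (the `IS_BIG_ENDIAN` branch of `test2a`) produces the same array as one big-endian 8-byte store, and the other branch matches one little-endian store. The class and method names below are illustrative only and do not come from the patch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: compares eight 1-byte stores with one 8-byte store.
final class MergeStoresDemo {
    // Byte order of the IS_BIG_ENDIAN branch of test2a: most significant byte first.
    static byte[] bytewiseBigEndian(long v) {
        byte[] a = new byte[8];
        for (int i = 0; i < 8; i++) {
            a[i] = (byte) (v >> (56 - 8 * i));
        }
        return a;
    }

    // Byte order of the other branch: least significant byte first.
    static byte[] bytewiseLittleEndian(long v) {
        byte[] a = new byte[8];
        for (int i = 0; i < 8; i++) {
            a[i] = (byte) (v >> (8 * i));
        }
        return a;
    }

    // A single 8-byte store with an explicit byte order.
    static byte[] singleStore(long v, ByteOrder order) {
        return ByteBuffer.allocate(8).order(order).putLong(0, v).array();
    }
}
```

Only one of the two layouts matches `ByteOrder.nativeOrder()` on a given machine. This is consistent with each branch being mergeable into one native store only on platforms of the matching endianness.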
Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Eliminate IS_BIG_ENDIAN and always execute both variants ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19218/files - new: https://git.openjdk.org/jdk/pull/19218/files/8844c837..3169a310 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19218&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19218&range=02-03 Stats: 802 lines in 1 file changed: 398 ins; 171 del; 233 mod Patch: https://git.openjdk.org/jdk/pull/19218.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19218/head:pull/19218 PR: https://git.openjdk.org/jdk/pull/19218 From rrich at openjdk.org Thu May 16 01:47:10 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 16 May 2024 01:47:10 GMT Subject: RFR: 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store [v4] In-Reply-To: <5Zx0TXxEyR05tg6kNhs_3tHKsTm9xFGYmawen_m4fb4=.991d9155-91ac-4534-b655-655adc5e2b51@github.com> References: <5Zx0TXxEyR05tg6kNhs_3tHKsTm9xFGYmawen_m4fb4=.991d9155-91ac-4534-b655-655adc5e2b51@github.com> Message-ID: On Tue, 14 May 2024 16:16:39 GMT, Emanuel Peter wrote: >> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: >> >> Eliminate IS_BIG_ENDIAN and always execute both variants > > test/hotspot/jtreg/compiler/c2/TestMergeStores.java line 57: > >> 55: private static final Random RANDOM = Utils.getRandomInstance(); >> 56: >> 57: private static final boolean IS_BIG_ENDIAN = UNSAFE.isBigEndian(); > > `static` is very important here, so that the `if` constant-folds in the test. Otherwise we don't know whether the IR rule passes because of the correct branch. Maybe add a comment for that. Sure.
I assumed that is clear to people looking at JIT compiler tests :) I removed `IS_BIG_ENDIAN` again since it wasn't needed anymore with the last commit (3169a3104b7323c4ff6f2714449a7c28025d0bba). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19218#discussion_r1602448193 From rrich at openjdk.org Thu May 16 05:33:02 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 16 May 2024 05:33:02 GMT Subject: RFR: 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store [v4] In-Reply-To: References: Message-ID: <7W-vxm7KC8qwd-GJAPh4TCtDhOzw7X5-gXanLudP27Y=.807f809f-92ce-498f-94c4-49b0405bbb6f@github.com> On Thu, 16 May 2024 01:39:26 GMT, Richard Reingruber wrote: >> This pr adds a few tweaks to [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) which allows enabling it also on big endian platforms (e.g. AIX, S390). JDK-8318446 introduced a C2 optimization to replace consecutive stores to a primitive array with just one store. >> >> By example (from `TestMergeStores.java`): >> >> >> static Object[] test2a(byte[] a, int offset, long v) { >> if (IS_BIG_ENDIAN) { >> a[offset + 0] = (byte)(v >> 56); >> a[offset + 1] = (byte)(v >> 48); >> a[offset + 2] = (byte)(v >> 40); >> a[offset + 3] = (byte)(v >> 32); >> a[offset + 4] = (byte)(v >> 24); >> a[offset + 5] = (byte)(v >> 16); >> a[offset + 6] = (byte)(v >> 8); >> a[offset + 7] = (byte)(v >> 0); >> } else { >> a[offset + 0] = (byte)(v >> 0); >> a[offset + 1] = (byte)(v >> 8); >> a[offset + 2] = (byte)(v >> 16); >> a[offset + 3] = (byte)(v >> 24); >> a[offset + 4] = (byte)(v >> 32); >> a[offset + 5] = (byte)(v >> 40); >> a[offset + 6] = (byte)(v >> 48); >> a[offset + 7] = (byte)(v >> 56); >> } >> return new Object[]{ a }; >> } >> >> >> Depending on the endianness, 8 bytes are stored into an array.
>> The order of the stores is the same as the order of an 8-byte store, therefore the eight 1-byte stores can be replaced with just one 8-byte store (if there aren't too many range checks). >> >> Additionally, I've fixed a few comments and a test bug. >> >> The optimization seems to be a little bit more effective on big endian platforms. >> >> Again by example: >> >> >> static Object[] test800a(byte[] a, int offset, long v) { >> if (IS_BIG_ENDIAN) { >> a[offset + 0] = (byte)(v >> 40); // Removed from candidate list >> a[offset + 1] = (byte)(v >> 32); // Removed from candidate list >> a[offset + 2] = (byte)(v >> 24); // Merged >> a[offset + 3] = (byte)(v >> 16); // Merged >> a[offset + 4] = (byte)(v >> 8); // Merged >> a[offset + 5] = (byte)(v >> 0); // Merged >> } else { >> a[offset + 0] = (byte)(v >> 0); // Removed from candidate list >> a[offset + 1] = (byte)(v >> 8); // Removed from candidate list >> a[offset + 2] = (byte)(v >> 16); // Not merged >> a[offset + 3] = (byte)(v >> 24); // Not merged >> a[offset + 4] = (byte)(v >> 32); // Not merge... > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Eliminate IS_BIG_ENDIAN and always execute both variants Test error is unrelated to the changes.
Upload of test results failed: `Error: Failed to CreateArtifact: Failed to make request after 5 attempts: Request timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact` ------------- PR Comment: https://git.openjdk.org/jdk/pull/19218#issuecomment-2114060049 From pminborg at openjdk.org Thu May 16 06:58:06 2024 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 16 May 2024 06:58:06 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v5] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: <_G7nE_OAMl9WSkAz21UDYHAlCRVeHN2ZmM0FR7Bmxtw=.ea94e982-3e9a-4c0e-8523-11372474d497@github.com> On Wed, 15 May 2024 18:45:16 GMT, ExE Boss wrote: >> src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 104: >> >>> 102: // Optimistically try plain semantics first >>> 103: final V v = value; >>> 104: if (v != null) { >> >> If `value == null && state == NULL`, can the path still be constant folded? I doubt it because `value` in this case may not be promoted to constant. > > Maybe the `state == NULL` check should be moved before `v != null`, as the **JIT** doesn't constant-fold `null` [`@Stable`] values: > https://github.com/openjdk/jdk/blob/8a4315f833f3700075d65fae6bc566011c837c07/src/java.base/share/classes/jdk/internal/vm/annotation/Stable.java#L41-L44 https://github.com/openjdk/jdk/blob/8a4315f833f3700075d65fae6bc566011c837c07/src/java.base/share/classes/jdk/internal/vm/annotation/Stable.java#L64-L71 > > [`@Stable`]: https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/jdk/internal/vm/annotation/Stable.java It seems reasonable to assume `null` values are not constant-folded. For straight-out-of-the-box usage, there is no apparent significant difference, as indicated by a new benchmark I just added: Benchmark Mode Cnt Score Error Units StableStaticBenchmark.atomic thrpt 10 5729.683 ±
502.023 ops/us StableStaticBenchmark.dcl thrpt 10 6069.222 ± 951.784 ops/us StableStaticBenchmark.dclHolder thrpt 10 5502.102 ± 1630.627 ops/us StableStaticBenchmark.stable thrpt 10 12737.158 ± 1746.456 ops/us <- Non-null benchmark StableStaticBenchmark.stableHolder thrpt 10 12053.978 ± 1421.527 ops/us StableStaticBenchmark.stableList thrpt 10 12443.870 ± 2084.607 ops/us StableStaticBenchmark.stableNull thrpt 10 13164.232 ± 591.284 ops/us <- Added null benchmark StableStaticBenchmark.stableRecordHolder thrpt 10 13638.893 ± 1250.895 ops/us StableStaticBenchmark.staticCHI thrpt 10 13639.220 ± 1190.922 ops/us If the `null` value participates in a much larger constant-folding tree, there might be a significant difference. I was afraid that changing the order would have detrimental effects on instance performance, but that does not seem to be the case: Checking value first: Benchmark Mode Cnt Score Error Units StableBenchmark.atomic thrpt 10 246.460 ± 75.417 ops/us StableBenchmark.dcl thrpt 10 243.481 ± 35.021 ops/us StableBenchmark.stable thrpt 10 4977.693 ± 675.926 ops/us <- Non-null StableBenchmark.stableHoldingList thrpt 10 3614.460 ± 275.140 ops/us StableBenchmark.stableList thrpt 10 3328.155 ± 898.202 ops/us StableBenchmark.stableListStored thrpt 10 3842.174 ± 535.902 ops/us StableBenchmark.stableNull thrpt 10 6217.737 ± 840.376 ops/us <- null StableBenchmark.supplier thrpt 10 9369.934 ± 1449.182 ops/us Checking null first: Benchmark Mode Cnt Score Error Units StableBenchmark.atomic thrpt 10 275.952 ± 39.480 ops/us StableBenchmark.dcl thrpt 10 252.697 ± 18.645 ops/us StableBenchmark.stable thrpt 10 5211.552 ± 315.307 ops/us <- Non-null StableBenchmark.stableHoldingList thrpt 10 3764.202 ± 224.325 ops/us StableBenchmark.stableList thrpt 10 3689.870 ± 419.858 ops/us StableBenchmark.stableListStored thrpt 10 3676.182 ± 938.485 ops/us StableBenchmark.stableNull thrpt 10 6046.935 ± 1512.391 ops/us <- null StableBenchmark.supplier thrpt 10 9202.202 ±
1479.950 ops/us So, swapping the order seems to be the right move. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1602719002 From pminborg at openjdk.org Thu May 16 07:14:17 2024 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 16 May 2024 07:14:17 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v5] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: <5zoKgO17_hUh9-UveP-yo82Sh2Jrk2Z_3K8rsarDk10=.03e40aaa-cac7-4ef6-b9ca-131fd338a0a8@github.com> On Wed, 15 May 2024 16:11:56 GMT, Chen Liang wrote: >> Per Minborg has updated the pull request incrementally with one additional commit since the last revision: >> >> Switch to monomorphic StableValue and use lazy arrays > > src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 139: > >> 137: case NON_NULL: { return valueVolatile(); } >> 138: case ERROR: { throw StableUtil.error(this); } >> 139: case DUMMY: { throw shouldNotReachHere(); } > > Redundant branch? The idea here is to have the most likely value in the middle... Not sure if that motivates the added complexity though. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1602739126 From roland at openjdk.org Thu May 16 07:16:06 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 16 May 2024 07:16:06 GMT Subject: RFR: 8331885: C2: meet between unloaded and speculative types is not symmetric In-Reply-To: References: Message-ID: On Wed, 15 May 2024 13:30:46 GMT, Vladimir Ivanov wrote: > `TypeInstPtr::xmeet_unloaded` computes the MEET of two InstPtrs when at least one is unloaded, but doesn't preserve the speculative part if one is present. This causes the corresponding assert to fail. > > The proposed fix unconditionally keeps the speculative part. > > Testing: hs-tier1 - hs-tier4 Looks good to me. ------------- Marked as reviewed by roland (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19249#pullrequestreview-2059790530 From roland at openjdk.org Thu May 16 07:16:10 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 16 May 2024 07:16:10 GMT Subject: RFR: 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode [v3] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 08:47:19 GMT, Christian Hagedorn wrote: >> This patch replaces the `Opaque4Node` of the `If` for Initialized Assertion Predicates with a new `OpaqueInitializedAssertionPredicateNode`. This helps to simplify pattern matching for predicate code and to distinguish it from the two other uses of `Opaque4` nodes: >> 1. Template Assertion Predicate: The goal is to get rid of its `Opaque4Node` as well by using a dedicated `TemplateAssertionPredicateNode` for the `IfNode`. >> 2. Non-null-checks with intrinsics and unsafe accesses: This will eventually be the only use left. Once we get there, we should rename the node accordingly to `OpaqueNonNullCheck` or something like that. >> >> I went through all the uses of `Opaque4` nodes and did the following: >> - Could the `Opaque4` node be part of an Initialized Assertion Predicate? >> - No: Added an assert that we are not dealing with an Initialized Assertion Predicate. >> - Yes: >> - Yes **and only** for Initialized Assertion Predicates? Added an assert that we are only expecting an `OpaqueInitializedAssertionPredicateNode` if appropriate. >> - Yes but could also be something else: Added case for `OpaqueInitializedAssertionPredicateNode` next to the `Opaque4` case. >> - Is this `Opaque4` node only used for Template Assertion Predicates? >> - Yes: Added assert with call to `assertion_predicate_has_loop_opaque_node()` to check that we find its `OpaqueLoop*Nodes`. >> - I've added test cases where I was not sure about whether an `Opaque4` node could be part of a Template, an Initialized Assertion Predicate or a non-null-check.
This was a little tricky but I think it was still worth it to prevent future bugs (even though most of these special cases are quite rare). >> >> This is another patch split off from the full fix for Assertion Predicates. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Make OpaqueInitializedAssertionPredicateNode a macro node again > - asdf > - Merge branch 'master' into JDK-8330386 > - Merge branch 'master' into JDK-8330386 > - Add more comments and asserts > - Add more tests > - 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18951#pullrequestreview-2059789034 From pminborg at openjdk.org Thu May 16 07:18:09 2024 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 16 May 2024 07:18:09 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v5] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Wed, 15 May 2024 16:19:05 GMT, Chen Liang wrote: >> Per Minborg has updated the pull request incrementally with one additional commit since the last revision: >> >> Switch to monomorphic StableValue and use lazy arrays > > src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 403: > >> 401: stable.computeIfUnset(supplier); >> 402: } catch (Throwable throwable) { >> 403: final Thread.UncaughtExceptionHandler uncaughtExceptionHandler = > > Does this exception handling differ from the default one for threads?
If not, I think we can safely remove this catch block, as all exceptions are just propagated and computeIfUnset doesn't declare any checked exception. Nice catch. This will reduce complexity:

    @Override
    public void run() {
        stable.computeIfUnset(supplier);
        // Exceptions are implicitly captured by the thread's
        // uncaught exception handler.
    }

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1602743924 From chagedorn at openjdk.org Thu May 16 07:19:09 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 16 May 2024 07:19:09 GMT Subject: RFR: 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode [v3] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 08:47:19 GMT, Christian Hagedorn wrote: >> This patch replaces the `Opaque4Node` of the `If` for Initialized Assertion Predicates with a new `OpaqueInitializedAssertionPredicateNode`. This helps to simplify pattern matching for predicate code and to distinguish it from the two other uses of `Opaque4` nodes: >> 1. Template Assertion Predicate: The goal is to get rid of its `Opaque4Node` as well by using a dedicated `TemplateAssertionPredicateNode` for the `IfNode`. >> 2. Non-null-checks with intrinsics and unsafe accesses: This will eventually be the only use left. Once we get there, we should rename the node accordingly to `OpaqueNonNullCheck` or something like that. >> >> I went through all the uses of `Opaque4` nodes and did the following: >> - Could the `Opaque4` node be part of an Initialized Assertion Predicate? >> - No: Added an assert that we are not dealing with an Initialized Assertion Predicate. >> - Yes: >> - Yes **and only** for Initialized Assertion Predicates? Added an assert that we are only expecting an `OpaqueInitializedAssertionPredicateNode` if appropriate. >> - Yes but could also be something else: Added case for `OpaqueInitializedAssertionPredicateNode` next to the `Opaque4` case.
>> - Is this `Opaque4` node only used for Template Assertion Predicates? >> - Yes: Added assert with call to `assertion_predicate_has_loop_opaque_node()` to check that we find its `OpaqueLoop*Nodes`. >> - I've added test cases where I was not sure about whether an `Opaque4` node could be part of a Template, an Initialized Assertion Predicate or a non-null-check. This was a little tricky but I think it was still worth it to prevent future bugs (even though most of these special cases are quite rare). >> >> This is another patch split off from the full fix for Assertion Predicates. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Make OpaqueInitializedAssertionPredicateNode a macro node again > - asdf > - Merge branch 'master' into JDK-8330386 > - Merge branch 'master' into JDK-8330386 > - Add more comments and asserts > - Add more tests > - 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode Thanks Roland for your review! @vnkozlov do you also agree with the updated version?
------------- PR Comment: https://git.openjdk.org/jdk/pull/18951#issuecomment-2114245196 From pminborg at openjdk.org Thu May 16 07:22:07 2024 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 16 May 2024 07:22:07 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v5] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Wed, 15 May 2024 16:25:04 GMT, Chen Liang wrote: >> Per Minborg has updated the pull request incrementally with one additional commit since the last revision: >> >> Switch to monomorphic StableValue and use lazy arrays > > src/java.base/share/classes/jdk/internal/lang/StableValue.java line 384: > >> 382: * @param the memoized type >> 383: */ >> 384: static Supplier ofSupplier(Supplier original) { > > `ofSupplier` sounds like this method returns a `StableValue` from a `Supplier`. I recommend another name, such as `stableSupplier`, `wrapSupplier`, or `memoize`, to better associate with the method's types. One alternative would be to expose the types `StableList` and `StableMap`. This would allow detection of these types if declared. This would also allow us to expose the methods `computeIfUnset` as instance methods and remove the somewhat strange static counterparts. I agree strict finals are better in the long run. 
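Chen's naming concern is easier to weigh with the behaviour spelled out. A *memoized* supplier wraps another `Supplier` so that the wrapped computation runs at most once and later `get()` calls return the cached result. Below is a minimal sketch in plain Java using double-checked locking; it illustrates the concept only, it is not the `StableValue` code in this PR, and `MemoizedSupplierSketch` and `memoize` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public final class MemoizedSupplierSketch {

    // Hypothetical helper: returns a Supplier that invokes 'original' at most
    // once (as long as it completes normally) and then returns the cached result.
    static <T> Supplier<T> memoize(Supplier<? extends T> original) {
        return new Supplier<>() {
            private final Object lock = new Object();
            private volatile boolean set; // volatile write/read publishes 'value'
            private T value;              // may legitimately be null

            @Override
            public T get() {
                if (set) {                // fast path: no locking once computed
                    return value;
                }
                synchronized (lock) {
                    if (!set) {           // re-check under the lock
                        value = original.get();
                        set = true;       // written after 'value'
                    }
                    return value;
                }
            }
        };
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Supplier<String> s = memoize(() -> "v" + calls.incrementAndGet());
        System.out.println(s.get() + " " + s.get() + " calls=" + calls.get());
        // prints: v1 v1 calls=1
    }
}
```

In this sketch, if `original` throws, nothing is cached and the next `get()` retries; the `StableValue` work discussed in this thread tracks an `ERROR` state instead.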
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1602749467 From pminborg at openjdk.org Thu May 16 07:26:15 2024 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 16 May 2024 07:26:15 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v5] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Thu, 16 May 2024 07:19:54 GMT, Per Minborg wrote: >> src/java.base/share/classes/jdk/internal/lang/StableValue.java line 384: >> >>> 382: * @param the memoized type >>> 383: */ >>> 384: static Supplier ofSupplier(Supplier original) { >> >> `ofSupplier` sounds like this method returns a `StableValue` from a `Supplier`. I recommend another name, such as `stableSupplier`, `wrapSupplier`, or `memoize`, to better associate with the method's types. > > One alternative would be to expose the types `StableList` and `StableMap`. This would allow detection of these types if declared. This would also allow us to expose the methods `computeIfUnset` as instance methods and remove the somewhat strange static counterparts. I agree strict finals are better in the long run. We had other names for the memoized factories before but some people did not like names like `asMemoized`. Maybe `ofSupplier` -> `memoizedSupplier` etc. ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1602754675 From roland at openjdk.org Thu May 16 07:28:15 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 16 May 2024 07:28:15 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v4] In-Reply-To: References: <8DnNVDn7mCzfbzXu1Q0GaDsb3OxzAQF1hiSX5RqDAcI=.ed4ba7b5-e418-40e9-b61d-5bdb4513484d@github.com> Message-ID: On Wed, 15 May 2024 17:11:16 GMT, Tobias Hartmann wrote: > > Should there be another follow up bug then? Or did I not understand what the follow up bug was about? 
> > Right, feel free to file a new one but I think to just keep track of it, we can as well add it to JDK-8332268 for now. I filed: https://bugs.openjdk.org/browse/JDK-8332356 Thanks for the reviews @TobiHartmann and @eme64 ------------- PR Comment: https://git.openjdk.org/jdk/pull/18377#issuecomment-2114261389 PR Comment: https://git.openjdk.org/jdk/pull/18377#issuecomment-2114263245 From roland at openjdk.org Thu May 16 07:28:16 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 16 May 2024 07:28:16 GMT Subject: Integrated: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs In-Reply-To: References: Message-ID: On Tue, 19 Mar 2024 13:21:49 GMT, Roland Westrelin wrote: > Range check `CastII` nodes are removed once loop opts are over. The > test case for this change includes 3 cases where elimination of a > range check `CastII` causes a crash in compiled code because either an > out-of-bounds array load or a division by zero happens. > > In `test1`: > > - the range checks for the `array[otherArray.length]` loads constant > fold: `otherArray.length` is a `CastII` of i at the `otherArray` > allocation. `i` is less than 9. The `CastII` at the allocation > narrows the type down further to `[0-9]`. > > - the `array[otherArray.length]` loads are control dependent on the > unrelated: > > > if (flag == 0) { > > > test. There's an identical dominating test which replaces that one. As > a consequence, the `array[otherArray.length]` loads become control > dependent on the dominating test. > > - The `CastII` nodes at the `otherArray` allocations are replaced by > dominating range check `CastII` nodes for: > > > newArray[i] = 42; > > > - After loop opts, the range check `CastII` nodes are removed and the > 2 `array[otherArray.length]` loads common at the first: > > > if (flag == 0) { > > > test before the: > > > float[] otherArray = new float[i]; > > > and > > > newArray[i] = 42; > > > that guarantee `i` is positive.
> > - `test1` is called with `i = -1`; the array load proceeds with an > out-of-bounds index and the crash occurs. > > > `test2` and `test3` are mostly identical except for the check that's > eliminated (a null divisor check) and the instruction that causes a > fault (an integer division). > > The fix I propose is to not eliminate range check `CastII` nodes after > loop opts. When range check `CastII` nodes were introduced, performance > was observed to regress. Removing them after loop opts was found to > preserve both correctness and performance. Today, the performance > regression still exists when `CastII` nodes are left in. So I propose > we keep them until the end of optimizations (so the 2 array loads > above don't lose a dependency and wrongly common) but remove them at > the end of all optimizations. > > In the case of the array loads, they are dependent on a range check > for another array through a range check `CastII` and we must not lose > that dependency, otherwise the array loads could float above the range > check at gcm time. I propose we deal with that problem the way it's > handled for `CastPP` nodes: add the dependency to the load (or > division) nodes as a precedence edge when the cast is removed. > > @TobiHartmann ran performance testing for that patch (Thanks!) and reported > no regression. This pull request has now been integrated.
Changeset: ab8d7b0c Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/ab8d7b0cedfaae124262325cd1d4b59cef996d85 Stats: 562 lines in 6 files changed: 536 ins; 23 del; 3 mod 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs Reviewed-by: epeter, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/18377 From pminborg at openjdk.org Thu May 16 07:29:21 2024 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 16 May 2024 07:29:21 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v6] In-Reply-To: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: <53bNF9v46bu-RgX1vJOwtdFKIzP3vwieCOENWtg2ra8=.c3f4dca8-6b1f-4c35-8e2c-f142c05dfe9b@github.com> > # Stable Values & Collections (Internal) > > ## Summary > This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. > > ## Goals > * Provide an easy and intuitive API to describe value holders that can change at most once. > * Decouple declaration from initialization without significant footprint or performance penalties. > * Reduce the amount of static initializer and/or field initialization code. > * Uphold integrity and consistency, even in a multi-threaded environment. > > For more details, see the draft JEP: https://openjdk.org/jeps/8312611 > > ## Performance > Performance compared to instance variables using two `AtomicReference` and two protected by double-checked locking under concurrent access by all threads:
>
> Benchmark                Mode  Cnt     Score      Error  Units
> StableBenchmark.atomic  thrpt   10   259.478 ±   36.809  ops/us
> StableBenchmark.dcl     thrpt   10   225.710 ±   26.638  ops/us
> StableBenchmark.stable  thrpt   10  4382.478 ± 1151.472  ops/us  <- StableValue significantly faster
>
> Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (all threads):
>
> Benchmark                          Mode  Cnt      Score      Error  Units
> StableStaticBenchmark.atomic     thrpt   10   6487.835 ±  385.639  ops/us
> StableStaticBenchmark.dcl        thrpt   10   6605.239 ±  210.610  ops/us
> StableStaticBenchmark.stable     thrpt   10  14338.239 ± 1426.874  ops/us
> StableStaticBenchmark.staticCHI  thrpt   10  13780.341 ± 1839.651  ops/us
>
> Performance for stable lists (thread safe) in both instance and static contexts whereby we access a single value compared to `ArrayList` instances (which are not thread-safe) (all threads):
>
> Benchmark                                       Mode  Cnt     Score      Error  Units
> StableListElementBenchmark.instanceArrayList  thrpt   10  5812.992 ± 1169.730  ops/us
> StableListElementBenchmark.instanceList       thrpt   10  4818.643 ±  704.893  ops/us
> StableListElementBenchmark.staticArrayList    thrpt   10  7614.741 ±  564.777  ops/us
> StableListElementBe...
Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Simplify exception handling and add benchmarks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18794/files - new: https://git.openjdk.org/jdk/pull/18794/files/2b840e06..befb2751 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=04-05 Stats: 30 lines in 3 files changed: 19 ins; 6 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18794.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18794/head:pull/18794 PR: https://git.openjdk.org/jdk/pull/18794 From chagedorn at openjdk.org Thu May 16 07:33:05 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 16 May 2024 07:33:05 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop [v2] In-Reply-To: <_kbcMydcMPblcm_FDDuL5vWGT7q6iRoarmYsTlEA0hQ=.290c6744-211d-406d-8ed1-90e510051167@github.com> References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> <_kbcMydcMPblcm_FDDuL5vWGT7q6iRoarmYsTlEA0hQ=.290c6744-211d-406d-8ed1-90e510051167@github.com> Message-ID: On Mon, 13 May 2024 13:23:46 GMT, Roland Westrelin wrote: >> In the test case:
>>
>>     long i;
>>     for (; i > 0; i--) {
>>         res += 42 / ((int) i);
>>
>> The long counted loop phi has type `[1..100]`. As a consequence, the >> `ConvL2I` also has type `[1..100]`. The `DivI` node that follows can't >> fault: it is not guarded by a zero check and has no control set. >> >> The `ConvL2I` is split through phi and so is the `DivI` node: >> `PhaseIdealLoop::cannot_split_division()` returns true because the >> value coming from the backedge into the `DivI` (when it is about to be >> split thru phi) is the result of the `ConvL2I` which has type >> `[1..100]` so it is not zero as far as the compiler can tell. >> >> On the last iteration of the loop, i is 1.
Because the DivI was split >> thru Phi, it computes the value for the following iteration, so for i >> = 0. This causes a crash when the compiled code runs. >> >> The same problem can't happen with an int counted loop because logic >> in `PhaseIdealLoop::split_thru_phi()` prevents a `ConvI2L` from being >> split thru phi. I propose to fix this the same way: in the test case, >> it's not true that once the `ConvL2I` is split thru phi it keeps type >> `[1..100]`. The fix is fairly conservative because it's based on the >> existing logic for `ConvI2L`: we would want to not split a `ConvL2I` >> only at a counted loop, but I suppose the same is true for the `ConvI2L` >> and I thought it would be best to revisit both together. > > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - test case tweaks > - fuzzer test Still looks good, thanks for adding the test! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19086#pullrequestreview-2059830760 From pminborg at openjdk.org Thu May 16 07:37:20 2024 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 16 May 2024 07:37:20 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v7] In-Reply-To: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: > # Stable Values & Collections (Internal) > > ## Summary > This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. > > ## Goals > * Provide an easy and intuitive API to describe value holders that can change at most once.
> * Decouple declaration from initialization without significant footprint or performance penalties. > * Reduce the amount of static initializer and/or field initialization code. > * Uphold integrity and consistency, even in a multi-threaded environment. > > For more details, see the draft JEP: https://openjdk.org/jeps/8312611 > > ## Performance > Performance compared to instance variables using two `AtomicReference` and two protected by double-checked locking under concurrent access by all threads:
>
> Benchmark                Mode  Cnt     Score      Error  Units
> StableBenchmark.atomic  thrpt   10   259.478 ±   36.809  ops/us
> StableBenchmark.dcl     thrpt   10   225.710 ±   26.638  ops/us
> StableBenchmark.stable  thrpt   10  4382.478 ± 1151.472  ops/us  <- StableValue significantly faster
>
> Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (all threads):
>
> Benchmark                          Mode  Cnt      Score      Error  Units
> StableStaticBenchmark.atomic     thrpt   10   6487.835 ±  385.639  ops/us
> StableStaticBenchmark.dcl        thrpt   10   6605.239 ±  210.610  ops/us
> StableStaticBenchmark.stable     thrpt   10  14338.239 ± 1426.874  ops/us
> StableStaticBenchmark.staticCHI  thrpt   10  13780.341 ± 1839.651  ops/us
>
> Performance for stable lists (thread safe) in both instance and static contexts whereby we access a single value compared to `ArrayList` instances (which are not thread-safe) (all threads):
>
> Benchmark                                       Mode  Cnt     Score      Error  Units
> StableListElementBenchmark.instanceArrayList  thrpt   10  5812.992 ± 1169.730  ops/us
> StableListElementBenchmark.instanceList       thrpt   10  4818.643 ±  704.893  ops/us
> StableListElementBenchmark.staticArrayList    thrpt   10  7614.741 ±  564.777  ops/us
> StableListElementBe...
Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Rename memoized factories ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18794/files - new: https://git.openjdk.org/jdk/pull/18794/files/befb2751..b845c589 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=05-06 Stats: 13 lines in 5 files changed: 0 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/18794.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18794/head:pull/18794 PR: https://git.openjdk.org/jdk/pull/18794 From roland at openjdk.org Thu May 16 07:42:29 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 16 May 2024 07:42:29 GMT Subject: RFR: 8328107: Shenandoah/C2: TestVerifyLoopOptimizations test failure Message-ID: The failure occurs because a load barrier is expanded on the backedge of the counted loop. That breaks the expected counted loop shape. The fix I propose is to replace the `CountedLoop` with a `Loop` node when that happens. We're basically done with optimizations related to counted loop at this point so this shouldn't make a difference. 
------------- Commit messages: - test and fix Changes: https://git.openjdk.org/jdk/pull/19259/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19259&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8328107 Stats: 92 lines in 2 files changed: 92 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19259.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19259/head:pull/19259 PR: https://git.openjdk.org/jdk/pull/19259 From pminborg at openjdk.org Thu May 16 07:44:08 2024 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 16 May 2024 07:44:08 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v5] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: <9E5b14zDF-FxZBdff6Ih83L55BlcBFt6sjK_aA7zLKc=.99cff2c0-68c1-4efd-a50b-79ca14b16e49@github.com> On Wed, 15 May 2024 19:07:26 GMT, Chen Liang wrote: >> Yes, according to the `@Stable` annotation's JavaDoc, this is UB: >> https://github.com/openjdk/jdk/blob/8a4315f833f3700075d65fae6bc566011c837c07/src/java.base/share/classes/jdk/internal/vm/annotation/Stable.java#L74-L80 > > FYI, what usually happens is that if a stable field or similarly constant-folded field is promoted to a constant, the constant promotion can happen to any of the previously written valid values. > > MethodHandle optimistically sets a trusted final field this way: > https://github.com/openjdk/jdk/blob/8a4315f833f3700075d65fae6bc566011c837c07/src/java.base/share/classes/java/lang/invoke/MethodHandle.java#L1868-L1870 > > There is also a similar example in user code targeting older Java releases, from before JDK 16's strong encapsulation, when enum constants could still be added by reflection: > https://github.com/MinecraftForge/MinecraftForge/issues/3885#issuecomment-355602542 Somehow the `@Stable` annotation sneaked into the `supplying` field. But actually, keeping it that way and just setting the value once would be better.
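Per's conclusion is that the `supplying` field can keep `@Stable` as long as it is written only once; the thread points out that updating a `@Stable` field more than once is undefined behaviour, since constant folding may capture any previously written value. `@Stable` is JDK-internal (`jdk.internal.vm.annotation`) and not usable from ordinary code, so this minimal sketch uses a plain field and enforces the write-once rule with a `VarHandle` compare-and-set. `WriteOnceSketch`, `trySet`, and the use of `null` as the "unset" marker are assumptions for illustration, not the PR's design.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public final class WriteOnceSketch {

    // In JDK-internal code this field could carry @Stable; here it is a plain
    // field and the write-once rule is enforced by the CAS in trySet().
    private Object value; // null means "not yet set" in this sketch

    private static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup()
                    .findVarHandle(WriteOnceSketch.class, "value", Object.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Returns true only for the single call that actually stores a value. */
    boolean trySet(Object newValue) {
        if (newValue == null) {
            throw new NullPointerException("null is reserved as the unset marker");
        }
        return VALUE.compareAndSet(this, (Object) null, newValue);
    }

    Object get() {
        return VALUE.getAcquire(this);
    }

    public static void main(String[] args) {
        WriteOnceSketch w = new WriteOnceSketch();
        System.out.println(w.trySet("first"));  // true
        System.out.println(w.trySet("second")); // false: already set
        System.out.println(w.get());            // first
    }
}
```

Because every write goes through a single compare-and-set from the unset marker, the field can move from its default value to exactly one other value and never again.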
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1602781188 From pminborg at openjdk.org Thu May 16 07:49:37 2024 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 16 May 2024 07:49:37 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v8] In-Reply-To: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: > # Stable Values & Collections (Internal) > > ## Summary > This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. > > ## Goals > * Provide an easy and intuitive API to describe value holders that can change at most once. > * Decouple declaration from initialization without significant footprint or performance penalties. > * Reduce the amount of static initializer and/or field initialization code. > * Uphold integrity and consistency, even in a multi-threaded environment. > > For more details, see the draft JEP: https://openjdk.org/jeps/8312611 > > ## Performance > Performance compared to instance variables using two `AtomicReference` and two protected by double-checked locking under concurrent access by all threads:
>
> Benchmark                Mode  Cnt     Score      Error  Units
> StableBenchmark.atomic  thrpt   10   259.478 ±   36.809  ops/us
> StableBenchmark.dcl     thrpt   10   225.710 ±   26.638  ops/us
> StableBenchmark.stable  thrpt   10  4382.478 ± 1151.472  ops/us  <- StableValue significantly faster
>
> Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (all threads):
>
> Benchmark                          Mode  Cnt      Score      Error  Units
> StableStaticBenchmark.atomic     thrpt   10   6487.835 ±  385.639  ops/us
> StableStaticBenchmark.dcl        thrpt   10   6605.239 ±  210.610  ops/us
> StableStaticBenchmark.stable     thrpt   10  14338.239 ± 1426.874  ops/us
> StableStaticBenchmark.staticCHI  thrpt   10  13780.341 ± 1839.651  ops/us
>
> Performance for stable lists (thread safe) in both instance and static contexts whereby we access a single value compared to `ArrayList` instances (which are not thread-safe) (all threads):
>
> Benchmark                                       Mode  Cnt     Score      Error  Units
> StableListElementBenchmark.instanceArrayList  thrpt   10  5812.992 ± 1169.730  ops/us
> StableListElementBenchmark.instanceList       thrpt   10  4818.643 ±  704.893  ops/us
> StableListElementBenchmark.staticArrayList    thrpt   10  7614.741 ±  564.777  ops/us
> StableListElementBe...
Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Rework the way compute invocation is recorded ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18794/files - new: https://git.openjdk.org/jdk/pull/18794/files/b845c589..d8875db7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=06-07 Stats: 40 lines in 1 file changed: 3 ins; 12 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/18794.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18794/head:pull/18794 PR: https://git.openjdk.org/jdk/pull/18794 From pminborg at openjdk.org Thu May 16 07:49:37 2024 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 16 May 2024 07:49:37 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v5] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Wed, 15 May 2024 16:31:15 GMT, Chen Liang wrote: >> Per Minborg has updated the pull request incrementally with one additional commit since the last revision: >> >> Switch to monomorphic StableValue and use lazy arrays > > src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 256: > >> 254: >> 255: @ForceInline >> 256: private V computeIfUnsetShared(Object provider, K key) { > > Can we let suppliers share this path too, with a null key? I see this path supports suppliers but supplier code path doesn't call this path. Good suggestion! 
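Chen's suggestion is to route suppliers through the same `computeIfUnsetShared(Object provider, K key)` path as functions, passing a `null` key. A minimal, hypothetical sketch of that shape follows: one synchronized slow path that dispatches on the provider's type, with thin entry points for each provider kind. Only the name `computeIfUnsetShared` comes from the quoted diff; the class, fields, and locking strategy are assumptions and do not reflect the PR's actual implementation.

```java
import java.util.function.Function;
import java.util.function.Supplier;

public final class SharedPathSketch<K, V> {

    private V value;
    private boolean set;

    // One shared slow path (hypothetical). 'provider' is either a Supplier,
    // in which case 'key' is ignored and callers pass null, or a Function
    // that receives 'key'.
    @SuppressWarnings("unchecked")
    private synchronized V computeIfUnsetShared(Object provider, K key) {
        if (!set) {
            if (provider instanceof Supplier<?> supplier) {
                value = (V) supplier.get();
            } else if (provider instanceof Function<?, ?> function) {
                value = ((Function<? super K, ? extends V>) function).apply(key);
            } else {
                throw new IllegalArgumentException("Unsupported provider: " + provider);
            }
            set = true;
        }
        return value;
    }

    // Supplier entry point: delegates with a null key.
    V computeIfUnset(Supplier<? extends V> supplier) {
        return computeIfUnsetShared(supplier, null);
    }

    // Function entry point: delegates with the real key.
    V computeIfUnset(K key, Function<? super K, ? extends V> mapper) {
        return computeIfUnsetShared(mapper, key);
    }

    public static void main(String[] args) {
        SharedPathSketch<String, Integer> a = new SharedPathSketch<>();
        System.out.println(a.computeIfUnset(() -> 42));              // 42
        System.out.println(a.computeIfUnset("abc", String::length)); // 42 (already set)

        SharedPathSketch<String, Integer> b = new SharedPathSketch<>();
        System.out.println(b.computeIfUnset("abc", String::length)); // 3
    }
}
```

The design trade-off is one `instanceof` dispatch on the slow path in exchange for a single place where the state transitions are implemented.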
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1602787248 From pminborg at openjdk.org Thu May 16 07:55:39 2024 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 16 May 2024 07:55:39 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v9] In-Reply-To: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: > # Stable Values & Collections (Internal) > > ## Summary > This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. > > ## Goals > * Provide an easy and intuitive API to describe value holders that can change at most once. > * Decouple declaration from initialization without significant footprint or performance penalties. > * Reduce the amount of static initializer and/or field initialization code. > * Uphold integrity and consistency, even in a multi-threaded environment. > > For more details, see the draft JEP: https://openjdk.org/jeps/8312611 > > ## Performance > Performance compared to instance variables using two `AtomicReference` and two protected by double-checked locking under concurrent access by all threads:
>
> Benchmark                Mode  Cnt     Score      Error  Units
> StableBenchmark.atomic  thrpt   10   259.478 ±   36.809  ops/us
> StableBenchmark.dcl     thrpt   10   225.710 ±   26.638  ops/us
> StableBenchmark.stable  thrpt   10  4382.478 ± 1151.472  ops/us  <- StableValue significantly faster
>
> Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (all threads):
>
> Benchmark                          Mode  Cnt      Score      Error  Units
> StableStaticBenchmark.atomic     thrpt   10   6487.835 ±  385.639  ops/us
> StableStaticBenchmark.dcl        thrpt   10   6605.239 ±  210.610  ops/us
> StableStaticBenchmark.stable     thrpt   10  14338.239 ± 1426.874  ops/us
> StableStaticBenchmark.staticCHI  thrpt   10  13780.341 ± 1839.651  ops/us
>
> Performance for stable lists (thread safe) in both instance and static contexts whereby we access a single value compared to `ArrayList` instances (which are not thread-safe) (all threads):
>
> Benchmark                                       Mode  Cnt     Score      Error  Units
> StableListElementBenchmark.instanceArrayList  thrpt   10  5812.992 ± 1169.730  ops/us
> StableListElementBenchmark.instanceList       thrpt   10  4818.643 ±  704.893  ops/us
> StableListElementBenchmark.staticArrayList    thrpt   10  7614.741 ±  564.777  ops/us
> StableListElementBe...
Per Minborg has updated the pull request incrementally with two additional commits since the last revision: - Add comment on security precaution - Share code paths ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18794/files - new: https://git.openjdk.org/jdk/pull/18794/files/d8875db7..058cfddf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=07-08 Stats: 55 lines in 1 file changed: 1 ins; 53 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18794.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18794/head:pull/18794 PR: https://git.openjdk.org/jdk/pull/18794 From pminborg at openjdk.org Thu May 16 07:55:40 2024 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 16 May 2024 07:55:40 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v5] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Wed, 15 May 2024 16:29:08 GMT, Chen Liang wrote: >> Per Minborg has updated the pull request incrementally with one additional commit since the last revision: >> >> Switch to monomorphic StableValue and use lazy arrays > > src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 236: > >> 234: } catch (Throwable t) { >> 235: putState(ERROR); >> 236: putMutex(t.getClass()); > > Should we cache the exception instance so we can rethrow it in future ERROR state `orThrow` calls? We considered recording the entire exception instance but for security reasons, we ended up just recording the type of exception. I will add a comment explaining this in the code. 
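The trade-off discussed above, remembering only the failure's type rather than the `Throwable` instance, can be sketched as follows. This is a hypothetical, simplified holder written for illustration only. `OnceHolder`, `computeIfUnset` and the state constants are invented names, not the `StableValueImpl` code under review, and a single `synchronized` method stands in for whatever synchronization the real implementation uses.

```java
import java.util.NoSuchElementException;
import java.util.function.Supplier;

// Hypothetical at-most-once holder (not the JDK's StableValueImpl).
final class OnceHolder<T> {
    private static final byte UNSET = 0;
    private static final byte SET = 1;
    private static final byte ERROR = 2;

    private byte state = UNSET;
    private T value;
    // Only the type of a failed computation is remembered; the Throwable
    // instance itself (message, stack trace, cause) is not retained.
    private Class<? extends Throwable> errorType;

    synchronized T computeIfUnset(Supplier<? extends T> supplier) {
        if (state == SET) {
            return value;
        }
        if (state == ERROR) {
            // Later callers learn what kind of failure happened, but never
            // see the original exception object.
            throw new NoSuchElementException(
                    "Previous computation failed with " + errorType.getName());
        }
        try {
            T v = supplier.get();
            value = v;
            state = SET;
            return v;
        } catch (Throwable t) {
            state = ERROR;
            errorType = t.getClass();
            throw t; // the first caller still receives the original exception
        }
    }
}
```

With this design, a caller that reaches the `ERROR` state gets a fresh exception carrying only a class name, so nothing from the original failure's message or stack trace is handed to it.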
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1602793832 From pminborg at openjdk.org Thu May 16 07:58:11 2024 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 16 May 2024 07:58:11 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v5] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: <_mILA1l3vosqmxf8OKbgUr0_IYxfReWuvXKpxphC61E=.95b36c90-3ce7-4033-9c93-b014cbc5ee0d@github.com> On Wed, 15 May 2024 16:26:57 GMT, Chen Liang wrote: >> Per Minborg has updated the pull request incrementally with one additional commit since the last revision: >> >> Switch to monomorphic StableValue and use lazy arrays > > src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 75: > >> 73: */ >> 74: @Stable >> 75: private int state; > > Can we change this to be a byte, so state and supplying fields can be packed together in 4 bytes in some layouts? We had `byte` before but converted to `int` as the `byte` will get promoted to an `int` anyhow in the code logic. However, one idea would be to go back to using a `byte` again and also use a `byte` for the flag of computation invocation. This would reduce the footprint for these two fields from 8 bytes to 2 bytes. 
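The promotion mentioned above is Java's binary numeric promotion: arithmetic and comparisons on `byte` operands are carried out on `int` values, while a `byte` field itself can still be stored compactly. The small illustration below uses hypothetical names (`BytePromotion`, `promotedSum`, `next`). The actual field packing is decided by the JVM's object layout, so the 8-to-2-byte saving is not something this snippet can demonstrate.

```java
// Hypothetical illustration of binary numeric promotion for byte operands.
public class BytePromotion {
    static final byte UNSET = 0;
    static final byte SET = 1;

    // The expression a + b has static type int, so it boxes to Integer here.
    static Object promotedSum(byte a, byte b) {
        return a + b;
    }

    // Storing an int result back into a byte needs an explicit narrowing cast.
    static byte next(byte state) {
        return (byte) (state + 1);
    }

    public static void main(String[] args) {
        byte state = SET;
        System.out.println(promotedSum(state, UNSET).getClass().getSimpleName()); // Integer
        System.out.println(state == SET); // the comparison also operates on int values
        System.out.println(next(state));  // 2
    }
}
```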
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1602800304 From pminborg at openjdk.org Thu May 16 08:01:11 2024 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 16 May 2024 08:01:11 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v4] In-Reply-To: <8QkAH25MWgY2qvCQgPPID71ye9gL3cX5MqhHs1Fapy0=.293fae67-c17f-42c7-af84-79408c1ed3e1@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> <8QkAH25MWgY2qvCQgPPID71ye9gL3cX5MqhHs1Fapy0=.293fae67-c17f-42c7-af84-79408c1ed3e1@github.com> Message-ID: On Wed, 15 May 2024 15:49:22 GMT, Chen Liang wrote: >> Maybe there is a better home for this? > > I don't think we should publish this API; this will soon be phased out by strict final fields (written only before super constructor calls) introduced by Valhalla, as strict final fields are never modifiable and can be safely trusted. Let's keep it internal. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1602804071 From roland at openjdk.org Thu May 16 08:57:18 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 16 May 2024 08:57:18 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop [v2] In-Reply-To: <_kbcMydcMPblcm_FDDuL5vWGT7q6iRoarmYsTlEA0hQ=.290c6744-211d-406d-8ed1-90e510051167@github.com> References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> <_kbcMydcMPblcm_FDDuL5vWGT7q6iRoarmYsTlEA0hQ=.290c6744-211d-406d-8ed1-90e510051167@github.com> Message-ID: On Mon, 13 May 2024 13:23:46 GMT, Roland Westrelin wrote: >> In the test case: >> >> >> long i; >> for (; i > 0; i--) { >> res += 42 / ((int) i); >> >> >> The long counted loop phi has type `[1..100]`. As a consequence, the >> `ConvL2I` also has type `[1..100]`. The `DivI` node that follows can't >> fault: it is not guarded by a zero check and has no control set. 
>> >> The `ConvL2I` is split through phi and so is the `DivI` node: >> `PhaseIdealLoop::cannot_split_division()` returns true because the >> value coming from the backedge into the `DivI` (when it is about to be >> split thru phi) is the result of the `ConvL2I` which has type >> `[1..100]` so is not zero as far as the compiler can tell. >> >> On the last iteration of the loop, i is 1. Because the DivI was split >> thru Phi, it computes the value for the following iteration, so for i >> = 0. This causes a crash when the compiled code runs. >> >> The same problem can't happen with an int counted loop because logic >> in `PhaseIdealLoop::split_thru_phi()` prevents a `ConvI2L` from being >> split thru phi. I propose to fix this the same way: in the test case, >> it's not true that once the `ConvL2I` is split thru phi it keeps type >> `[1..100]`. The fix is fairly conservative because it's based on the >> existing logic for `ConvI2L`: ideally we would only avoid splitting a `ConvL2I` >> at a counted loop, but I suppose the same is true for the `ConvI2L` >> and I thought it would be best to revisit both together. > > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - test case tweaks > - fuzzer test FTR, I double checked that fuzzer test failures from JDK-8298851 are indeed the same issue and are fixed with this.
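The miscompilation described above can be mimicked at the source level. In the hypothetical sketch below, `original` is the loop shape from the test case, and `splitLike` is a hand-written analogue of the code after the `DivI` was split through the phi: the division for the next value of `i` is evaluated before the loop-exit test, so it also runs for `i == 0`. In Java source this surfaces as an `ArithmeticException`; in the C2-compiled code the `DivI` had no zero check, so the same division crashed instead. The names `SplitThruPhiSketch`, `original`, `splitLike`, and the start value 100 (matching the `[1..100]` type) are assumptions for illustration, not C2 output.

```java
// Hypothetical source-level analogue of splitting a division through the loop phi.
public class SplitThruPhiSketch {
    // Loop shape from the test case: long induction variable, int divisor.
    static int original(long start) {
        int res = 0;
        for (long i = start; i > 0; i--) {
            res += 42 / ((int) i);
        }
        return res;
    }

    // The division for the following iteration is hoisted above the exit
    // test, which is what the split-through-phi transformation effectively did.
    static int splitLike(long start) {
        int res = 0;
        long i = start;
        if (i > 0) {
            int div = 42 / ((int) i);
            while (true) {
                res += div;
                i--;
                div = 42 / ((int) i); // also executed once i has reached 0
                if (i <= 0) {
                    break;
                }
            }
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(original(100));
        try {
            splitLike(100);
        } catch (ArithmeticException e) {
            System.out.println("speculative division failed: " + e.getMessage());
        }
    }
}
```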
------------- PR Comment: https://git.openjdk.org/jdk/pull/19086#issuecomment-2114566185 From roland at openjdk.org Thu May 16 08:57:19 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 16 May 2024 08:57:19 GMT Subject: Integrated: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop In-Reply-To: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> Message-ID: On Fri, 3 May 2024 12:33:43 GMT, Roland Westrelin wrote: > In the test case: > > > long i; > for (; i > 0; i--) { > res += 42 / ((int) i); > > > The long counted loop phi has type `[1..100]`. As a consequence, the > `ConvL2I` also has type `[1..100]`. The `DivI` node that follows can't > fault: it is not guarded by a zero check and has no control set. > > The `ConvL2I` is split through phi and so is the `DivI` node: > `PhaseIdealLoop::cannot_split_division()` returns true because the > value coming from the backedge into the `DivI` (when it is about to be > split thru phi) is the result of the `ConvL2I` which has type > `[1..100]` so is not zero as far as the compiler can tell. > > On the last iteration of the loop, i is 1. Because the DivI was split > thru Phi, it computes the value for the following iteration, so for i > = 0. This causes a crash when the compiled code runs. > > The same problem can't happen with an int counted loop because logic > in `PhaseIdealLoop::split_thru_phi()` prevents a `ConvI2L` from being > split thru phi. I propose to fix this the same way: in the test case, > it's not true that once the `ConvL2I` is split thru phi it keeps type > `[1..100]`. The fix is fairly conservative because it's based on the > existing logic for `ConvI2L`: ideally we would only avoid splitting a `ConvL2I` > at a counted loop, but I suppose the same is true for the `ConvI2L` > and I thought it would be best to revisit both together.
This pull request has now been integrated. Changeset: f398cd22 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/f398cd225012694a586e528936159b6df7b1586c Stats: 129 lines in 3 files changed: 127 ins; 0 del; 2 mod 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop Reviewed-by: chagedorn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/19086 From pminborg at openjdk.org Thu May 16 09:01:22 2024 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 16 May 2024 09:01:22 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v10] In-Reply-To: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: > # Stable Values & Collections (Internal) > > ## Summary > This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. > > ## Goals > * Provide an easy and intuitive API to describe value holders that can change at most once. > * Decouple declaration from initialization without significant footprint or performance penalties. > * Reduce the amount of static initializer and/or field initialization code. > * Uphold integrity and consistency, even in a multi-threaded environment. > > For more details, see the draft JEP: https://openjdk.org/jeps/8312611 > > ## Performance > Performance compared to instance variables using two `AtomicReference` and two protected by double-checked locking under concurrent access by all threads: > > > Benchmark Mode Cnt Score Error Units > StableBenchmark.atomic thrpt 10 259.478 ? 36.809 ops/us > StableBenchmark.dcl thrpt 10 225.710 ? 
26.638 ops/us > StableBenchmark.stable thrpt 10 4382.478 ± 1151.472 ops/us <- StableValue significantly faster > > > Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (all threads): > > > Benchmark Mode Cnt Score Error Units > StableStaticBenchmark.atomic thrpt 10 6487.835 ± 385.639 ops/us > StableStaticBenchmark.dcl thrpt 10 6605.239 ± 210.610 ops/us > StableStaticBenchmark.stable thrpt 10 14338.239 ± 1426.874 ops/us > StableStaticBenchmark.staticCHI thrpt 10 13780.341 ± 1839.651 ops/us > > > Performance for stable lists (thread safe) in both instance and static contexts whereby we access a single value compared to `ArrayList` instances (which are not thread-safe) (all threads): > > > Benchmark Mode Cnt Score Error Units > StableListElementBenchmark.instanceArrayList thrpt 10 5812.992 ± 1169.730 ops/us > StableListElementBenchmark.instanceList thrpt 10 4818.643 ± 704.893 ops/us > StableListElementBenchmark.staticArrayList thrpt 10 7614.741 ± 564.777 ops/us > StableListElementBe... 
Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Use byte for storing state and compute flags ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18794/files - new: https://git.openjdk.org/jdk/pull/18794/files/058cfddf..80b7e081 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=08-09 Stats: 42 lines in 2 files changed: 11 ins; 17 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/18794.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18794/head:pull/18794 PR: https://git.openjdk.org/jdk/pull/18794 From dholmes at openjdk.org Thu May 16 09:10:16 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 16 May 2024 09:10:16 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v5] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 12:07:22 GMT, Roland Westrelin wrote: >> Range check `CastII` nodes are removed once loop opts are over. The >> test case for this change includes 3 cases where elimination of a >> range check `CastII` causes a crash in compiled code because either an >> out-of-bounds array load or a division by zero happens. >> >> In `test1`: >> >> - the range checks for the `array[otherArray.length]` loads constant >> fold: `otherArray.length` is a `CastII` of i at the `otherArray` >> allocation. `i` is less than 9. The `CastII` at the allocation >> narrows the type down further to `[0-9]`. >> >> - the `array[otherArray.length]` loads are control dependent on the >> unrelated: >> >> >> if (flag == 0) { >> >> >> test. There's an identical dominating test which replaces that one. As >> a consequence, the `array[otherArray.length]` loads become control >> dependent on the dominating test.
>> >> - The `CastII` nodes at the `otherArray` allocations are replaced by >> dominating range check `CastII` nodes for: >> >> >> newArray[i] = 42; >> >> >> - After loop opts, the range check `CastII` nodes are removed and the >> 2 `array[otherArray.length]` loads common at the first: >> >> >> if (flag == 0) { >> >> >> test before the: >> >> >> float[] otherArray = new float[i]; >> >> >> and >> >> >> newArray[i] = 42; >> >> >> that guarantee `i` is positive. >> >> - `test1` is called with `i = -1`, the array load proceeds with an out >> of bounds index and the crash occurs. >> >> >> `test2` and `test3` are mostly identical except for the check that's >> eliminated (a zero divisor check) and the instruction that causes a >> fault (an integer division). >> >> The fix I propose is to not eliminate range check `CastII` nodes after >> loop opts. When range check `CastII` nodes were introduced, performance >> was observed to regress. Removing them after loop opts was found to >> preserve both correctness and performance. Today, the performance >> regression still exists when `CastII` nodes are left in. So I propose >> we keep them until the end of optimizations (so the 2 array loads >> above don't lose a dependency and wrongly common) but remove them at >> the end of all optimizations. >> >> In the case of the array loads, they are dependent on a range check >> for another array through a range check `CastII` and we must not lose >> that dependency, otherwise the array loads could float above the range >> check at gcm time. I propose we deal with that problem the way it's >> handled for `CastPP` nodes: add the dependency to the load (or >> division) nodes ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains ten additional commits since the last revision: > > - Node::is_div_or_mod() > - Merge branch 'master' into JDK-8324517 > - test fix > - review > - Merge branch 'master' into JDK-8324517 > - Merge branch 'master' into JDK-8324517 > - review > - Merge branch 'master' into JDK-8324517 > - test and fix This is causing a crash in compiler/rangechecks/TestArrayAccessAboveRCAfterRCCastIIEliminated.java Aarch64 only so far: # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/opt/mach5/mesos/work_dir/slaves/a4a7850a-7c35-410a-b879-d77fbb2f6087-S6223/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/c30fdcec-3ee7-44a9-bfc5-38869fd48e4b/runs/4e34865c-00ca-43a6-87eb-21db2db1c8ab/workspace/open/src/hotspot/share/opto/gcm.cpp:1423), pid=1811107, tid=1811123 # assert(false) failed: graph should be schedulable will get a bug filed ------------- PR Comment: https://git.openjdk.org/jdk/pull/18377#issuecomment-2114617357 From chagedorn at openjdk.org Thu May 16 09:15:14 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 16 May 2024 09:15:14 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v5] In-Reply-To: References: Message-ID: On Thu, 16 May 2024 09:06:55 GMT, David Holmes wrote: > will get a bug filed @rwestrel Filed [JDK-8332369](https://bugs.openjdk.org/browse/JDK-8332369), can you have a look at it? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18377#issuecomment-2114633970 From epeter at openjdk.org Thu May 16 09:15:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 16 May 2024 09:15:15 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v5] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 12:07:22 GMT, Roland Westrelin wrote: >> Range check `CastII` nodes are removed once loop opts are over. 
The >> test case for this change includes 3 cases where elimination of a >> range check `CastII` causes a crash in compiled code because either an >> out-of-bounds array load or a division by zero happens. >> >> In `test1`: >> >> - the range checks for the `array[otherArray.length]` loads constant >> fold: `otherArray.length` is a `CastII` of i at the `otherArray` >> allocation. `i` is less than 9. The `CastII` at the allocation >> narrows the type down further to `[0-9]`. >> >> - the `array[otherArray.length]` loads are control dependent on the >> unrelated: >> >> >> if (flag == 0) { >> >> >> test. There's an identical dominating test which replaces that one. As >> a consequence, the `array[otherArray.length]` loads become control >> dependent on the dominating test. >> >> - The `CastII` nodes at the `otherArray` allocations are replaced by >> dominating range check `CastII` nodes for: >> >> >> newArray[i] = 42; >> >> >> - After loop opts, the range check `CastII` nodes are removed and the >> 2 `array[otherArray.length]` loads common at the first: >> >> >> if (flag == 0) { >> >> >> test before the: >> >> >> float[] otherArray = new float[i]; >> >> >> and >> >> >> newArray[i] = 42; >> >> >> that guarantee `i` is positive. >> >> - `test1` is called with `i = -1`, the array load proceeds with an out >> of bounds index and the crash occurs. >> >> >> `test2` and `test3` are mostly identical except for the check that's >> eliminated (a zero divisor check) and the instruction that causes a >> fault (an integer division). >> >> The fix I propose is to not eliminate range check `CastII` nodes after >> loop opts. When range check `CastII` nodes were introduced, performance >> was observed to regress. Removing them after loop opts was found to >> preserve both correctness and performance. Today, the performance >> regression still exists when `CastII` nodes are left in.
So I propose >> we keep them until the end of optimizations (so the 2 array loads >> above don't lose a dependency and wrongly common) but remove them at >> the end of all optimizations. >> >> In the case of the array loads, they are dependent on a range check >> for another array through a range check `CastII` and we must not lose >> that dependency, otherwise the array loads could float above the range >> check at gcm time. I propose we deal with that problem the way it's >> handled for `CastPP` nodes: add the dependency to the load (or >> division) nodes ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: > > - Node::is_div_or_mod() > - Merge branch 'master' into JDK-8324517 > - test fix > - review > - Merge branch 'master' into JDK-8324517 > - Merge branch 'master' into JDK-8324517 > - review > - Merge branch 'master' into JDK-8324517 > - test and fix Hmm, it seems we only ran tests on our side for v01, and the test crashed there too, though differently. `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:+TieredCompilation` These seem to be the flags. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18377#issuecomment-2114636364 From mdoerr at openjdk.org Thu May 16 09:17:20 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 16 May 2024 09:17:20 GMT Subject: RFR: 8331935: Add support for primitive array C1 clone intrinsic in PPC In-Reply-To: References: Message-ID: On Wed, 15 May 2024 13:50:27 GMT, Varada M wrote: > https://bugs.openjdk.org/browse/JDK-8302850 port for PPC64 > > JMH Benchmark Results > > > Before : > > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 114.107 ± 1.337 ns/op > ArrayClone.byteArraycopy 10 avgt 15 130.492 ± 
0.991 ns/op > ArrayClone.byteArraycopy 100 avgt 15 139.103 ? 1.913 ns/op > ArrayClone.byteArraycopy 1000 avgt 15 321.688 ? 6.033 ns/op > ArrayClone.byteClone 0 avgt 15 227.602 ? 3.393 ns/op > ArrayClone.byteClone 10 avgt 15 237.624 ? 2.996 ns/op > ArrayClone.byteClone 100 avgt 15 239.219 ? 2.835 ns/op > > ArrayClone.byteClone 1000 avgt 15 355.571 ? 2.946 ns/op > ArrayClone.intArraycopy 0 avgt 15 113.275 ? 1.099 ns/op > ArrayClone.intArraycopy 10 avgt 15 129.763 ? 1.458 ns/op > ArrayClone.intArraycopy 100 avgt 15 213.327 ? 2.524 ns/op > ArrayClone.intArraycopy 1000 avgt 15 449.650 ? 7.338 ns/op > ArrayClone.intClone 0 avgt 15 225.682 ? 3.048 ns/op > ArrayClone.intClone 10 avgt 15 234.532 ? 2.817 ns/op > ArrayClone.intClone 100 avgt 15 295.934 ? 4.925 ns/op > ArrayClone.intClone 1000 avgt 15 573.368 ? 5.739 ns/op > Finished running test 'micro:java.lang.ArrayClone' > Test report is stored in build/aix-ppc64-server-release/test-results/micro_java_lang_ArrayClone > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > micro:java.lang.ArrayClone 1 1 0 0 > ============================== > TEST SUCCESS > > Finished building target 'test' in configuration 'aix-ppc64-server-release' > > > > > After: > > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 113.894 ? 0.993 ns/op > ArrayClone.byteArraycopy 10 avgt 15 131.455 ? 0.956 ns/op > ArrayClone.byteArraycopy 100 avgt 15 139.145 ? 3.002 ns/op > ArrayClone.byteArraycopy 1000 avgt 15 315.957 ? 14.591 ns/op > ArrayClone.byteClone 0 avgt 15 43.753 ? 3.669 ns/op > ArrayClone.byteClone 10 avgt 15 52.329 ? 1.041 ns/op > ArrayClone.byteClone 100 avgt 15 127.711 ? 3.938 ns/op > > ArrayClone.byteClone 1000 avgt 15 225.937 ? 1.987 ns/op > ArrayClone.intArraycopy 0 avgt 15 113.788 ? 0.770 ns/op > ArrayClone.intArraycopy 10 avgt 1... This looks good. Please adapt the indentation. You can mark it as ready for review. 
I got crashes when testing on linux ppc64le and noticed that we need one more adaptation to handle `stub == nullptr`. I suggest the following addition: diff --git a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp index b6d9200b261..dba662a2212 100644 --- a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp +++ b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp @@ -1968,7 +1968,11 @@ void LIR_Assembler::emit_arraycopy(LIR_OpArrayCopy* op) { int shift = shift_amount(basic_type); if (!(flags & LIR_OpArrayCopy::type_check)) { - __ b(cont); + if (stub != nullptr) { + __ b(cont); + __ bind(slow); + __ b(*stub->entry()); + } } else { // We don't know the array types are compatible. if (basic_type != T_OBJECT) { @@ -2089,9 +2093,9 @@ void LIR_Assembler::emit_arraycopy(LIR_OpArrayCopy* op) { __ add(dst_pos, tmp, dst_pos); } } + __ bind(slow); + __ b(*stub->entry()); } - __ bind(slow); - __ b(*stub->entry()); __ bind(cont); #ifdef ASSERT The test failures will be fixed by https://github.com/openjdk/jdk/pull/19218. Unrelated. src/hotspot/cpu/ppc/c1_MacroAssembler_ppc.cpp line 366: > 364: } > 365: > 366: initialize_body(base, index); hotspot uses 2 spaces indentation. ------------- PR Review: https://git.openjdk.org/jdk/pull/19250#pullrequestreview-2058284260 Changes requested by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19250#pullrequestreview-2059189398 PR Comment: https://git.openjdk.org/jdk/pull/19250#issuecomment-2112770468 PR Review Comment: https://git.openjdk.org/jdk/pull/19250#discussion_r1601806637 From varadam at openjdk.org Thu May 16 09:17:20 2024 From: varadam at openjdk.org (Varada M) Date: Thu, 16 May 2024 09:17:20 GMT Subject: RFR: 8331935: Add support for primitive array C1 clone intrinsic in PPC Message-ID: https://bugs.openjdk.org/browse/JDK-8302850 port for PPC64 JMH Benchmark Results Before : Benchmark (size) Mode Cnt Score Error Units ArrayClone.byteArraycopy 0 avgt 15 114.107 ? 
1.337 ns/op ArrayClone.byteArraycopy 10 avgt 15 130.492 ? 0.991 ns/op ArrayClone.byteArraycopy 100 avgt 15 139.103 ? 1.913 ns/op ArrayClone.byteArraycopy 1000 avgt 15 321.688 ? 6.033 ns/op ArrayClone.byteClone 0 avgt 15 227.602 ? 3.393 ns/op ArrayClone.byteClone 10 avgt 15 237.624 ? 2.996 ns/op ArrayClone.byteClone 100 avgt 15 239.219 ? 2.835 ns/op ArrayClone.byteClone 1000 avgt 15 355.571 ? 2.946 ns/op ArrayClone.intArraycopy 0 avgt 15 113.275 ? 1.099 ns/op ArrayClone.intArraycopy 10 avgt 15 129.763 ? 1.458 ns/op ArrayClone.intArraycopy 100 avgt 15 213.327 ? 2.524 ns/op ArrayClone.intArraycopy 1000 avgt 15 449.650 ? 7.338 ns/op ArrayClone.intClone 0 avgt 15 225.682 ? 3.048 ns/op ArrayClone.intClone 10 avgt 15 234.532 ? 2.817 ns/op ArrayClone.intClone 100 avgt 15 295.934 ? 4.925 ns/op ArrayClone.intClone 1000 avgt 15 573.368 ? 5.739 ns/op Finished running test 'micro:java.lang.ArrayClone' Test report is stored in build/aix-ppc64-server-release/test-results/micro_java_lang_ArrayClone ============================== Test summary ============================== TEST TOTAL PASS FAIL ERROR micro:java.lang.ArrayClone 1 1 0 0 ============================== TEST SUCCESS Finished building target 'test' in configuration 'aix-ppc64-server-release' After: Benchmark (size) Mode Cnt Score Error Units ArrayClone.byteArraycopy 0 avgt 15 113.894 ? 0.993 ns/op ArrayClone.byteArraycopy 10 avgt 15 131.455 ? 0.956 ns/op ArrayClone.byteArraycopy 100 avgt 15 139.145 ? 3.002 ns/op ArrayClone.byteArraycopy 1000 avgt 15 315.957 ? 14.591 ns/op ArrayClone.byteClone 0 avgt 15 43.753 ? 3.669 ns/op ArrayClone.byteClone 10 avgt 15 52.329 ? 1.041 ns/op ArrayClone.byteClone 100 avgt 15 127.711 ? 3.938 ns/op ArrayClone.byteClone 1000 avgt 15 225.937 ? 1.987 ns/op ArrayClone.intArraycopy 0 avgt 15 113.788 ? 0.770 ns/op ArrayClone.intArraycopy 10 avgt 15 131.980 ? 2.102 ns/op ArrayClone.intArraycopy 100 avgt 15 213.745 ? 2.615 ns/op ArrayClone.intArraycopy 1000 avgt 15 460.820 ? 
7.106 ns/op ArrayClone.intClone 0 avgt 15 42.074 ± 0.547 ns/op ArrayClone.intClone 10 avgt 15 80.125 ± 1.735 ns/op ArrayClone.intClone 100 avgt 15 207.313 ± 2.717 ns/op ArrayClone.intClone 1000 avgt 15 326.546 ± 5.671 ns/op Finished running test 'micro:java.lang.ArrayClone' Test report is stored in build/aix-ppc64-server-release/test-results/micro_java_lang_ArrayClone ============================== Test summary ============================== TEST TOTAL PASS FAIL ERROR micro:java.lang.ArrayClone 1 1 0 0 ============================== TEST SUCCESS HotSpot compiler test results: ============================== Test summary ============================== TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:hotspot_compiler 1170 1168 2 0 << ============================== TEST FAILURE The 2 test failures shown here are not related to this code change; they also occur without these changes. Reported issue: [JDK-8331935](https://bugs.openjdk.org/browse/JDK-8331935) ------------- Commit messages: - Add support for primitive array C1 clone intrinsic - Add support for primitive array C1 clone intrinsic Changes: https://git.openjdk.org/jdk/pull/19250/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19250&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331935 Stats: 60 lines in 6 files changed: 28 ins; 2 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/19250.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19250/head:pull/19250 PR: https://git.openjdk.org/jdk/pull/19250 From varadam at openjdk.org Thu May 16 09:17:21 2024 From: varadam at openjdk.org (Varada M) Date: Thu, 16 May 2024 09:17:21 GMT Subject: RFR: 8331935: Add support for primitive array C1 clone intrinsic in PPC In-Reply-To: References: Message-ID: On Wed, 15 May 2024 13:50:27 GMT, Varada M wrote: > https://bugs.openjdk.org/browse/JDK-8302850 port for PPC64 > > JMH Benchmark Results > > > Before : > > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 114.107 ± 
1.337 ns/op > ArrayClone.byteArraycopy 10 avgt 15 130.492 ? 0.991 ns/op > ArrayClone.byteArraycopy 100 avgt 15 139.103 ? 1.913 ns/op > ArrayClone.byteArraycopy 1000 avgt 15 321.688 ? 6.033 ns/op > ArrayClone.byteClone 0 avgt 15 227.602 ? 3.393 ns/op > ArrayClone.byteClone 10 avgt 15 237.624 ? 2.996 ns/op > ArrayClone.byteClone 100 avgt 15 239.219 ? 2.835 ns/op > > ArrayClone.byteClone 1000 avgt 15 355.571 ? 2.946 ns/op > ArrayClone.intArraycopy 0 avgt 15 113.275 ? 1.099 ns/op > ArrayClone.intArraycopy 10 avgt 15 129.763 ? 1.458 ns/op > ArrayClone.intArraycopy 100 avgt 15 213.327 ? 2.524 ns/op > ArrayClone.intArraycopy 1000 avgt 15 449.650 ? 7.338 ns/op > ArrayClone.intClone 0 avgt 15 225.682 ? 3.048 ns/op > ArrayClone.intClone 10 avgt 15 234.532 ? 2.817 ns/op > ArrayClone.intClone 100 avgt 15 295.934 ? 4.925 ns/op > ArrayClone.intClone 1000 avgt 15 573.368 ? 5.739 ns/op > Finished running test 'micro:java.lang.ArrayClone' > Test report is stored in build/aix-ppc64-server-release/test-results/micro_java_lang_ArrayClone > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > micro:java.lang.ArrayClone 1 1 0 0 > ============================== > TEST SUCCESS > > Finished building target 'test' in configuration 'aix-ppc64-server-release' > > > > > After: > > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 113.894 ? 0.993 ns/op > ArrayClone.byteArraycopy 10 avgt 15 131.455 ? 0.956 ns/op > ArrayClone.byteArraycopy 100 avgt 15 139.145 ? 3.002 ns/op > ArrayClone.byteArraycopy 1000 avgt 15 315.957 ? 14.591 ns/op > ArrayClone.byteClone 0 avgt 15 43.753 ? 3.669 ns/op > ArrayClone.byteClone 10 avgt 15 52.329 ? 1.041 ns/op > ArrayClone.byteClone 100 avgt 15 127.711 ? 3.938 ns/op > > ArrayClone.byteClone 1000 avgt 15 225.937 ? 1.987 ns/op > ArrayClone.intArraycopy 0 avgt 15 113.788 ? 0.770 ns/op > ArrayClone.intArraycopy 10 avgt 1... 
> I got crashes when testing on linux ppc64le and noticed that we need one more adaptation to handle `stub == nullptr`. I suggest the following addition: > > ```diff > diff --git a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp > index b6d9200b261..dba662a2212 100644 > --- a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp > +++ b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp > @@ -1968,7 +1968,11 @@ void LIR_Assembler::emit_arraycopy(LIR_OpArrayCopy* op) { > int shift = shift_amount(basic_type); > > if (!(flags & LIR_OpArrayCopy::type_check)) { > - __ b(cont); > + if (stub != nullptr) { > + __ b(cont); > + __ bind(slow); > + __ b(*stub->entry()); > + } > } else { > // We don't know the array types are compatible. > if (basic_type != T_OBJECT) { > @@ -2089,9 +2093,9 @@ void LIR_Assembler::emit_arraycopy(LIR_OpArrayCopy* op) { > __ add(dst_pos, tmp, dst_pos); > } > } > + __ bind(slow); > + __ b(*stub->entry()); > } > - __ bind(slow); > - __ b(*stub->entry()); > __ bind(cont); > > #ifdef ASSERT > ``` Hi @TheRealMDoerr, I have applied the suggested changes and fixed the indentation. Testing is also done. Thank you ------------- PR Comment: https://git.openjdk.org/jdk/pull/19250#issuecomment-2114639380 From bkilambi at openjdk.org Thu May 16 10:08:55 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 16 May 2024 10:08:55 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v9] In-Reply-To: References: Message-ID: > Floating-point addition is non-associative, that is, adding floating-point elements in a different order may produce a different value. Specifically, the Vector API intentionally does not define the order of reduction, which allows platforms to generate more efficient code [1]. So C2 needs a node to represent a non-strictly-ordered add-reduction for floating-point types.
> > To avoid introducing new nodes, this patch adds a bool field to `AddReductionVF/D` to distinguish whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` to `ReductionNode`. For reduction nodes other than `AddReductionVF/D`, `requires_strict_order()` returns a fixed value. > > With this patch, the Vector API always generates non-strictly-ordered `AddReductionVF/D` on SVE machines with vector length <= 16B, as non-strictly-ordered instructions are more beneficial than strictly ordered ones on such machines. > > [AArch64] > On NEON, non-strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2]. > > This patch adds matching rules for non-strictly-ordered `AddReductionVF/D`. > > No effects on other platforms. > > [Performance] > FloatMaxVector.ADDLanes [3] measures the performance of add-reduction for floating-point types. With this patch, it improves by ~3x on my SVE machine (128-bit). > > ADDLanes > > Benchmark Before After Unit > FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms > > > The generated code is as follows: > > Before: > > fadda z17.s, p7/m, z17.s, z16.s > > After: > > faddp v17.4s, v21.4s, v21.4s > faddp s18, v17.2s > fadd s18, s18, s19 > > [Test] > Full jtreg passed on AArch64 and x86.
> > [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529 > [2] https://bugs.openjdk.org/browse/JDK-8275275 > [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316 Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Add dump_spec and JTREG IR tests for Add/Mul Reduction Nodes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18034/files - new: https://git.openjdk.org/jdk/pull/18034/files/bdd0fabf..3afde82c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18034&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18034&range=07-08 Stats: 322 lines in 5 files changed: 294 ins; 6 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/18034.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18034/head:pull/18034 PR: https://git.openjdk.org/jdk/pull/18034 From bkilambi at openjdk.org Thu May 16 10:12:13 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 16 May 2024 10:12:13 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8] In-Reply-To: References: <8-_t7nWbR9gZ2_QkfFNuf5M0Q4PMkKJKgwS3ZbHcCxI=.32dc4f11-dec5-468d-afc8-3b4dae285dcb@github.com> <2y-Ag6MxVDJfYl6kM0FYjQA-kzSCekUgAMWAZmkECyQ=.2a2a0a8e-fc67-42a4-bd67-b4ae3b60bcea@github.com> Message-ID: On Mon, 13 May 2024 11:01:30 GMT, Emanuel Peter wrote: >> @eme64 Thanks for the clarification. I understand the usage of `counts` in the IR tests. It's just that I got a bit confused by some of your earlier statements. We do actually have a test to make sure AddReductionVF/VD and MulReductionVF/VD are not generated on AArch64 NEON machines: `test/hotspot/jtreg/compiler/c2/irTests/TestDisableAutoVectOpcodes.java`.
I can modify this test to include the UseSVE > 0 case as well, and will also add a separate JTREG test for the Vector API tests. Hope that's OK. > > @Bhavana-Kilambi > I know we have the tests in `test/hotspot/jtreg/compiler/c2/irTests/TestDisableAutoVectOpcodes.java`, and some other reduction tests. But these do not do the specific thing I would like to see. > > I would like this: > - Add `no_strict_order` vs `requires_strict_order` or similar to `dump_spec`. > - IR match not just that there is the correct `ReductionNode`, but also that it has `no_strict_order` or `requires_strict_order` in its dump. You can do that by using a custom regex string, rather than `IRNode.STORE_VECTOR` or similar. > - Then, create different tests, some where we expect ordered and some where we expect unordered vectors. Use Vector API and SuperWord examples. > > Does that make sense? Hi @eme64, I have added the dump_spec as suggested, and also two JTREG IR tests, one for SuperWord and one for the Vector API. I have not modified the existing SuperWord tests; instead I created separate tests, as it might be easier to extend them to other platforms in the future if needed. Thanks for your suggestions. Please do review and let me know if any changes are required.
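The reason a strict-order flag is needed at all can be shown without C2: floating-point addition is not associative, so a strictly sequential sum and a pairwise (tree-shaped) sum of the same elements can differ. The sketch below is plain Java written for illustration only. The pairwise sum loosely mirrors the lane-wise `faddp` sequence quoted above and is not how C2 actually emits code.

```java
public class FloatReductionOrder {

    // Strictly ordered reduction: ((((0 + a0) + a1) + a2) + a3) ...
    static float sequentialSum(float[] a) {
        float sum = 0f;
        for (float x : a) {
            sum += x;
        }
        return sum;
    }

    // Pairwise (tree-shaped) reduction over a[lo, hi): the two halves are
    // summed independently and then combined, so the additions happen in a
    // different order than in sequentialSum.
    static float pairwiseSum(float[] a, int lo, int hi) {
        int n = hi - lo;
        if (n == 0) {
            return 0f;
        }
        if (n == 1) {
            return a[lo];
        }
        int mid = lo + n / 2;
        return pairwiseSum(a, lo, mid) + pairwiseSum(a, mid, hi);
    }

    public static void main(String[] args) {
        // 1e8f is exactly representable, but adjacent floats at that magnitude
        // are 8 apart, so adding 1f to +/-1e8f is rounded away.
        float[] a = {1e8f, 1f, -1e8f, 1f};
        System.out.println(sequentialSum(a));            // 1.0
        System.out.println(pairwiseSum(a, 0, a.length)); // 0.0
    }
}
```

Both results are correctly rounded IEEE 754 sums. They differ only because of the evaluation order, which is exactly the freedom the Vector API grants and auto-vectorized scalar loops must not take.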
------------- PR Comment: https://git.openjdk.org/jdk/pull/18034#issuecomment-2114782520 From duke at openjdk.org Thu May 16 10:39:09 2024 From: duke at openjdk.org (ExE Boss) Date: Thu, 16 May 2024 10:39:09 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v10] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: <6imEhvO6pddKDrwPo8m6X1vyClmUHgDU8NNvWllwmS8=.893489ed-9270-4b57-a3e3-5c81ac4f4c03@github.com> On Thu, 16 May 2024 09:01:22 GMT, Per Minborg wrote: >> # Stable Values & Collections (Internal) >> >> ## Summary >> This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. >> >> ## Goals >> * Provide an easy and intuitive API to describe value holders that can change at most once. >> * Decouple declaration from initialization without significant footprint or performance penalties. >> * Reduce the amount of static initializer and/or field initialization code. >> * Uphold integrity and consistency, even in a multi-threaded environment. >> >> For more details, see the draft JEP: https://openjdk.org/jeps/8312611 >> >> ## Performance >> Performance compared to instance variables using two `AtomicReference` and two protected by double-checked locking under concurrent access by all threads: >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.atomic thrpt 10 259.478 ± 36.809 ops/us >> StableBenchmark.dcl thrpt 10 225.710 ± 26.638 ops/us >> StableBenchmark.stable thrpt 10 4382.478 ±
1151.472 ops/us <- StableValue significantly faster >> >> >> Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (all threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableStaticBenchmark.atomic thrpt 10 6487.835 ± 385.639 ops/us >> StableStaticBenchmark.dcl thrpt 10 6605.239 ± 210.610 ops/us >> StableStaticBenchmark.stable thrpt 10 14338.239 ± 1426.874 ops/us >> StableStaticBenchmark.staticCHI thrpt 10 13780.341 ± 1839.651 ops/us >> >> >> Performance for stable lists (thread safe) in both instance and static contexts whereby we access a single value compared to `ArrayList` instances (which are not thread-safe) (all threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableListElementBenchmark.instanceArrayList thrpt 10 5812.992 ± 1169.730 ops/us >> StableListElementBenchmark.instanceList thrpt 10 4818.643 ± 704.893 ops/us >> StableListElementBenchmark... > > Per Minborg has updated the pull request incrementally with one additional commit since the last revision: > > Use byte for storing state and compute flags src/java.base/share/classes/jdk/internal/lang/stable/StableAccess.java line 63: > 61: Function original) { > 62: return new MemoizedFunction<>(stableMap, original); > 63: } Maybe also rename these?
Suggestion: public static Supplier memoizedSupplier(StableValue stable, Supplier original) { return new MemoizedSupplier<>(stable, original); } public static IntFunction memoizedIntFunction(List> stableList, IntFunction original) { return new MemoizedIntFunction<>(stableList, original); } public static Function memoizedFunction(Map> stableMap, Function original) { return new MemoizedFunction<>(stableMap, original); } src/jdk.unsupported/share/classes/sun/misc/Unsafe.java line 729: > 727: } > 728: } > 729: } Given that the [`Class::forName(String, boolean, ClassLoader)`] method doesn't care about whether the requested class is actually exported to the caller, it's possible to do the following: Suggestion: final class Holder { static final Class TRUSTED_FIELD_TYPE; static { PrivilegedAction getPlatformClassLoader = ClassLoader::getPlatformClassLoader; @SuppressWarnings("removal") ClassLoader platformClassLoader = AccessController.doPrivileged(getPlatformClassLoader); try { TRUSTED_FIELD_TYPE = Class.forName("jdk.internal.lang.stable.TrustedFieldType", false, platformClassLoader); } catch (ClassNotFoundException e) { throw new AssertionError(e); } } } Class declaringClass = f.getDeclaringClass(); if (declaringClass.isHidden()) { throw new UnsupportedOperationException("can't get base address on a hidden class: " + f); } if (declaringClass.isRecord()) { throw new UnsupportedOperationException("can't get base address on a record class: " + f); } Class fieldType = f.getType(); if (Holder.TRUSTED_FIELD_TYPE.isAssignableFrom(fieldType)) { throw new UnsupportedOperationException("can't get field offset for a field of type " + fieldType.getName() + ": " + f); } [`Class::forName(String, boolean, ClassLoader)`]: https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/Class.html#forName(java.lang.String,boolean,java.lang.ClassLoader) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1603032451 PR Review Comment: 
https://git.openjdk.org/jdk/pull/18794#discussion_r1603089512 From pminborg at openjdk.org Thu May 16 11:00:38 2024 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 16 May 2024 11:00:38 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v11] In-Reply-To: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: <5nBuqaRQH0BuxaPOWhrfxaizJLcM8fEDAkoI6sDwzNg=.ee090add-1c7e-4349-b9e3-114acd98a663@github.com> > # Stable Values & Collections (Internal) > > ## Summary > This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. > > ## Goals > * Provide an easy and intuitive API to describe value holders that can change at most once. > * Decouple declaration from initialization without significant footprint or performance penalties. > * Reduce the amount of static initializer and/or field initialization code. > * Uphold integrity and consistency, even in a multi-threaded environment. > > For more details, see the draft JEP: https://openjdk.org/jeps/8312611 > > ## Performance > Performance compared to instance variables using two `AtomicReference` and two protected by double-checked locking under concurrent access by all threads: > > > Benchmark Mode Cnt Score Error Units > StableBenchmark.atomic thrpt 10 259.478 ± 36.809 ops/us > StableBenchmark.dcl thrpt 10 225.710 ± 26.638 ops/us > StableBenchmark.stable thrpt 10 4382.478 ±
1151.472 ops/us <- StableValue significantly faster > > > Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (all threads): > > > Benchmark Mode Cnt Score Error Units > StableStaticBenchmark.atomic thrpt 10 6487.835 ± 385.639 ops/us > StableStaticBenchmark.dcl thrpt 10 6605.239 ± 210.610 ops/us > StableStaticBenchmark.stable thrpt 10 14338.239 ± 1426.874 ops/us > StableStaticBenchmark.staticCHI thrpt 10 13780.341 ± 1839.651 ops/us > > > Performance for stable lists (thread safe) in both instance and static contexts whereby we access a single value compared to `ArrayList` instances (which are not thread-safe) (all threads): > > > Benchmark Mode Cnt Score Error Units > StableListElementBenchmark.instanceArrayList thrpt 10 5812.992 ± 1169.730 ops/us > StableListElementBenchmark.instanceList thrpt 10 4818.643 ± 704.893 ops/us > StableListElementBenchmark.staticArrayList thrpt 10 7614.741 ± 564.777 ops/us > StableListElementBe...
Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Update src/java.base/share/classes/jdk/internal/lang/stable/StableAccess.java Co-authored-by: ExE Boss <3889017+ExE-Boss at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18794/files - new: https://git.openjdk.org/jdk/pull/18794/files/80b7e081..923e1877 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=09-10 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/18794.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18794/head:pull/18794 PR: https://git.openjdk.org/jdk/pull/18794 From mli at openjdk.org Thu May 16 11:16:12 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 16 May 2024 11:16:12 GMT Subject: RFR: 8332153: RISC-V: enable tests and add comment for vector shift instruct (shared by vectorization and Vector API) Message-ID: Hi, can you help review this patch? Some tests corresponding to the vector shift instructs are not enabled; this patch enables them. It is also not clear how the vector shift instructs work, especially since both auto-vectorization (SLP in the JDK) and the Vector API share the same instructs in riscv_v.ad, so this patch also adds some comments to clarify that.
Thanks ------------- Commit messages: - add comment - enable tests Changes: https://git.openjdk.org/jdk/pull/19265/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19265&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332153 Stats: 178 lines in 11 files changed: 174 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19265.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19265/head:pull/19265 PR: https://git.openjdk.org/jdk/pull/19265 From liach at openjdk.org Thu May 16 11:18:10 2024 From: liach at openjdk.org (Chen Liang) Date: Thu, 16 May 2024 11:18:10 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v5] In-Reply-To: <5zoKgO17_hUh9-UveP-yo82Sh2Jrk2Z_3K8rsarDk10=.03e40aaa-cac7-4ef6-b9ca-131fd338a0a8@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> <5zoKgO17_hUh9-UveP-yo82Sh2Jrk2Z_3K8rsarDk10=.03e40aaa-cac7-4ef6-b9ca-131fd338a0a8@github.com> Message-ID: On Thu, 16 May 2024 07:11:20 GMT, Per Minborg wrote: >> src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 139: >> >>> 137: case NON_NULL: { return valueVolatile(); } >>> 138: case ERROR: { throw StableUtil.error(this); } >>> 139: case DUMMY: { throw shouldNotReachHere(); } >> >> Redundant branch? > > The idea here is to have the most likely value in the middle... Not sure if that motivates the added complexity though. Is there any reference on how/why the middle entry in a tableswitch instruction is the fastest? >> src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 236: >> >>> 234: } catch (Throwable t) { >>> 235: putState(ERROR); >>> 236: putMutex(t.getClass()); >> >> Should we cache the exception instance so we can rethrow it in future ERROR state `orThrow` calls? > > We considered recording the entire exception instance, but for security reasons we ended up recording only the type of exception.
I will add a comment explaining this in the code. Thanks for this clarification. Makes sense. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1603149806 PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1603150592 From liach at openjdk.org Thu May 16 11:18:08 2024 From: liach at openjdk.org (Chen Liang) Date: Thu, 16 May 2024 11:18:08 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v5] In-Reply-To: <_G7nE_OAMl9WSkAz21UDYHAlCRVeHN2ZmM0FR7Bmxtw=.ea94e982-3e9a-4c0e-8523-11372474d497@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> <_G7nE_OAMl9WSkAz21UDYHAlCRVeHN2ZmM0FR7Bmxtw=.ea94e982-3e9a-4c0e-8523-11372474d497@github.com> Message-ID: <_I8ZxFiUZ3bygkA-iDH29xEmkqoy0Dm-_g0iAVRkoro=.680467aa-a061-4f17-8fcd-a60a829afa59@github.com> On Thu, 16 May 2024 06:54:26 GMT, Per Minborg wrote: >> Maybe the `state == NULL` check should be moved before `v != null`, as the **JIT** doesn't constant-fold `null` [`@Stable`] values: >> https://github.com/openjdk/jdk/blob/8a4315f833f3700075d65fae6bc566011c837c07/src/java.base/share/classes/jdk/internal/vm/annotation/Stable.java#L41-L44 https://github.com/openjdk/jdk/blob/8a4315f833f3700075d65fae6bc566011c837c07/src/java.base/share/classes/jdk/internal/vm/annotation/Stable.java#L64-L71 >> >> [`@Stable`]: https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/jdk/internal/vm/annotation/Stable.java > > It seems reasonable to assume `null` values are not constant-folded. For straight-out-of-the-box usage, there is no apparent significant difference as indicated by a new benchmark I just added: > > > Benchmark Mode Cnt Score Error Units > StableStaticBenchmark.atomic thrpt 10 5729.683 ± 502.023 ops/us > StableStaticBenchmark.dcl thrpt 10 6069.222 ± 951.784 ops/us > StableStaticBenchmark.dclHolder thrpt 10 5502.102 ± 1630.627 ops/us > StableStaticBenchmark.stable thrpt 10 12737.158 ±
1746.456 ops/us <- Non-null benchmark > StableStaticBenchmark.stableHolder thrpt 10 12053.978 ± 1421.527 ops/us > StableStaticBenchmark.stableList thrpt 10 12443.870 ± 2084.607 ops/us > StableStaticBenchmark.stableNull thrpt 10 13164.232 ± 591.284 ops/us <- Added null benchmark > StableStaticBenchmark.stableRecordHolder thrpt 10 13638.893 ± 1250.895 ops/us > StableStaticBenchmark.staticCHI thrpt 10 13639.220 ± 1190.922 ops/us > > > If the `null` value participates in a much larger constant-folding tree, there might be a significant difference. I am afraid moving the order would have detrimental effects on instance performance: > > Checking value first: > > > Benchmark Mode Cnt Score Error Units > StableBenchmark.atomic thrpt 10 246.460 ± 75.417 ops/us > StableBenchmark.dcl thrpt 10 243.481 ± 35.021 ops/us > StableBenchmark.stable thrpt 10 4977.693 ± 675.926 ops/us <- Non-null > StableBenchmark.stableHoldingList thrpt 10 3614.460 ± 275.140 ops/us > StableBenchmark.stableList thrpt 10 3328.155 ± 898.202 ops/us > StableBenchmark.stableListStored thrpt 10 3842.174 ± 535.902 ops/us > StableBenchmark.stableNull thrpt 10 6217.737 ± 840.376 ops/us <- null > StableBenchmark.supplier thrpt 10 9369.934 ± 1449.182 ops/us > > > Checking null first: > > > Benchmark Mode Cnt Score Error Units > StableBenchmark.atomic thrpt 10 287.640 ± 17.858 ops/us > StableBenchmark.dcl thrpt 10 250.398 ± 20.874 ops/us > StableBenchmark.stable thrpt 10 3745.885 ± 1040.534 ops/us <- Non-null > StableBenchmark.stableHoldingList thrpt 10 2982.129 ± 503.492 ops/us > StableBenchmark.stableList thrpt 10 3125.045 ± 416.792 ops/us > StableBenchmark.sta... I think the result would be more convincing if the stable case were changed to `sum += (stable.orThrow() == null ? 0 : 1) + (stable2.orThrow() == null ? 0 : 1);`, as adding 1 might be optimized somewhat better by the JIT.
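The ordering debated above can be sketched outside the JDK internals: check the published value for non-null first, and only then consult a separate state field that tells "computed as null" apart from "not yet computed". The class below is a hypothetical, simplified stand-in written for illustration. The names `LazyValue`, `UNSET`, `NULL` and `NON_NULL` are assumptions. It uses plain `volatile` fields and `synchronized` rather than the `@Stable` and Unsafe machinery of the actual PR, so it says nothing about what the JIT constant-folds.

```java
import java.util.function.Supplier;

public class LazyValueDemo {

    // Hypothetical sketch, not the PR's StableValueImpl: a value computed at most
    // once, where null is a legitimate result.
    static final class LazyValue<T> {
        private static final byte UNSET = 0;
        private static final byte NULL = 1;
        private static final byte NON_NULL = 2;

        private volatile T value;    // written before state
        private volatile byte state; // UNSET until the supplier has completed once

        T get(Supplier<? extends T> supplier) {
            T v = value;
            if (v != null) {         // fast path: a non-null value was already published
                return v;
            }
            if (state == NULL) {     // computed earlier, and the result was null
                return null;
            }
            synchronized (this) {    // slow path: run the supplier at most once
                if (state == UNSET) {
                    T computed = supplier.get(); // if this throws, state stays UNSET
                    value = computed;
                    state = (computed == null) ? NULL : NON_NULL;
                }
                return value;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        LazyValue<String> nullable = new LazyValue<>();
        Supplier<String> supplier = () -> { calls[0]++; return null; };
        nullable.get(supplier);
        nullable.get(supplier);
        System.out.println(calls[0]); // 1: the null result is remembered via the state field
    }
}
```

Because `value` is written before `state`, a reader that sees `state == NON_NULL` is guaranteed to see the non-null value under the Java memory model. The only cost of testing `v != null` first is one extra volatile read on the null path.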
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1603148915 From mdoerr at openjdk.org Thu May 16 11:37:01 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 16 May 2024 11:37:01 GMT Subject: RFR: 8331935: Add support for primitive array C1 clone intrinsic in PPC In-Reply-To: References: Message-ID: On Wed, 15 May 2024 13:50:27 GMT, Varada M wrote: > https://bugs.openjdk.org/browse/JDK-8302850 port for PPC64 > > JMH Benchmark Results > > > Before: > > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 114.107 ± 1.337 ns/op > ArrayClone.byteArraycopy 10 avgt 15 130.492 ± 0.991 ns/op > ArrayClone.byteArraycopy 100 avgt 15 139.103 ± 1.913 ns/op > ArrayClone.byteArraycopy 1000 avgt 15 321.688 ± 6.033 ns/op > ArrayClone.byteClone 0 avgt 15 227.602 ± 3.393 ns/op > ArrayClone.byteClone 10 avgt 15 237.624 ± 2.996 ns/op > ArrayClone.byteClone 100 avgt 15 239.219 ± 2.835 ns/op > > ArrayClone.byteClone 1000 avgt 15 355.571 ± 2.946 ns/op > ArrayClone.intArraycopy 0 avgt 15 113.275 ± 1.099 ns/op > ArrayClone.intArraycopy 10 avgt 15 129.763 ± 1.458 ns/op > ArrayClone.intArraycopy 100 avgt 15 213.327 ± 2.524 ns/op > ArrayClone.intArraycopy 1000 avgt 15 449.650 ± 7.338 ns/op > ArrayClone.intClone 0 avgt 15 225.682 ± 3.048 ns/op > ArrayClone.intClone 10 avgt 15 234.532 ± 2.817 ns/op > ArrayClone.intClone 100 avgt 15 295.934 ± 4.925 ns/op > ArrayClone.intClone 1000 avgt 15 573.368 ±
5.739 ns/op > Finished running test 'micro:java.lang.ArrayClone' > Test report is stored in build/aix-ppc64-server-release/test-results/micro_java_lang_ArrayClone > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > micro:java.lang.ArrayClone 1 1 0 0 > ============================== > TEST SUCCESS > > Finished building target 'test' in configuration 'aix-ppc64-server-release' > > > > > After: > > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 113.894 ± 0.993 ns/op > ArrayClone.byteArraycopy 10 avgt 15 131.455 ± 0.956 ns/op > ArrayClone.byteArraycopy 100 avgt 15 139.145 ± 3.002 ns/op > ArrayClone.byteArraycopy 1000 avgt 15 315.957 ± 14.591 ns/op > ArrayClone.byteClone 0 avgt 15 43.753 ± 3.669 ns/op > ArrayClone.byteClone 10 avgt 15 52.329 ± 1.041 ns/op > ArrayClone.byteClone 100 avgt 15 127.711 ± 3.938 ns/op > > ArrayClone.byteClone 1000 avgt 15 225.937 ± 1.987 ns/op > ArrayClone.intArraycopy 0 avgt 15 113.788 ± 0.770 ns/op > ArrayClone.intArraycopy 10 avgt 1... LGTM. I'll rerun tests. Please ask somebody from your team to do a 2nd review. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19250#pullrequestreview-2060477911 From varadam at openjdk.org Thu May 16 11:41:04 2024 From: varadam at openjdk.org (Varada M) Date: Thu, 16 May 2024 11:41:04 GMT Subject: RFR: 8331935: Add support for primitive array C1 clone intrinsic in PPC In-Reply-To: References: Message-ID: <4VwNK0QhqH99g18E8nFpbippB_YW1OyfVXjpaQ5Wsd0=.650bca48-7a0e-45da-aa20-39249956b329@github.com> On Wed, 15 May 2024 14:56:22 GMT, Martin Doerr wrote: >> https://bugs.openjdk.org/browse/JDK-8302850 port for PPC64 >> >> JMH Benchmark Results >> >> >> Before: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 114.107 ± 1.337 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 130.492 ±
0.991 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 139.103 ± 1.913 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 321.688 ± 6.033 ns/op >> ArrayClone.byteClone 0 avgt 15 227.602 ± 3.393 ns/op >> ArrayClone.byteClone 10 avgt 15 237.624 ± 2.996 ns/op >> ArrayClone.byteClone 100 avgt 15 239.219 ± 2.835 ns/op >> >> ArrayClone.byteClone 1000 avgt 15 355.571 ± 2.946 ns/op >> ArrayClone.intArraycopy 0 avgt 15 113.275 ± 1.099 ns/op >> ArrayClone.intArraycopy 10 avgt 15 129.763 ± 1.458 ns/op >> ArrayClone.intArraycopy 100 avgt 15 213.327 ± 2.524 ns/op >> ArrayClone.intArraycopy 1000 avgt 15 449.650 ± 7.338 ns/op >> ArrayClone.intClone 0 avgt 15 225.682 ± 3.048 ns/op >> ArrayClone.intClone 10 avgt 15 234.532 ± 2.817 ns/op >> ArrayClone.intClone 100 avgt 15 295.934 ± 4.925 ns/op >> ArrayClone.intClone 1000 avgt 15 573.368 ± 5.739 ns/op >> Finished running test 'micro:java.lang.ArrayClone' >> Test report is stored in build/aix-ppc64-server-release/test-results/micro_java_lang_ArrayClone >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> micro:java.lang.ArrayClone 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> Finished building target 'test' in configuration 'aix-ppc64-server-release' >> >> >> >> >> After: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 113.894 ± 0.993 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 131.455 ± 0.956 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 139.145 ± 3.002 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 315.957 ± 14.591 ns/op >> ArrayClone.byteClone 0 avgt 15 43.753 ± 3.669 ns/op >> ArrayClone.byteClone 10 avgt 15 52.329 ± 1.041 ns/op >> ArrayClone.byteClone 100 avgt 15 127.711 ± 3.938 ns/op >> >> ArrayClone.byteClone 1000 avgt 15 225.937 ± 1.987 ns/op >> Arr... > > The test failures will be fixed by https://github.com/openjdk/jdk/pull/19218. Unrelated.
Thanks @TheRealMDoerr. Hi @offamitkumar, could you please review the code? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19250#issuecomment-2114993789 From pminborg at openjdk.org Thu May 16 11:49:38 2024 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 16 May 2024 11:49:38 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v12] In-Reply-To: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: <8fJZzyeDlqp-WbZbpBmY40bCGbDj5j31IXaeHgK4FHA=.f648fde3-2819-4cc4-9e11-cbf02520d100@github.com> > # Stable Values & Collections (Internal) > > ## Summary > This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. > > ## Goals > * Provide an easy and intuitive API to describe value holders that can change at most once. > * Decouple declaration from initialization without significant footprint or performance penalties. > * Reduce the amount of static initializer and/or field initialization code. > * Uphold integrity and consistency, even in a multi-threaded environment. > > For more details, see the draft JEP: https://openjdk.org/jeps/8312611 > > ## Performance > Performance compared to instance variables using two `AtomicReference` and two protected by double-checked locking under concurrent access by all threads: > > > Benchmark Mode Cnt Score Error Units > StableBenchmark.atomic thrpt 10 259.478 ± 36.809 ops/us > StableBenchmark.dcl thrpt 10 225.710 ± 26.638 ops/us > StableBenchmark.stable thrpt 10 4382.478 ±
1151.472 ops/us <- StableValue significantly faster > > > Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (all threads): > > > Benchmark Mode Cnt Score Error Units > StableStaticBenchmark.atomic thrpt 10 6487.835 ± 385.639 ops/us > StableStaticBenchmark.dcl thrpt 10 6605.239 ± 210.610 ops/us > StableStaticBenchmark.stable thrpt 10 14338.239 ± 1426.874 ops/us > StableStaticBenchmark.staticCHI thrpt 10 13780.341 ± 1839.651 ops/us > > > Performance for stable lists (thread safe) in both instance and static contexts whereby we access a single value compared to `ArrayList` instances (which are not thread-safe) (all threads): > > > Benchmark Mode Cnt Score Error Units > StableListElementBenchmark.instanceArrayList thrpt 10 5812.992 ± 1169.730 ops/us > StableListElementBenchmark.instanceList thrpt 10 4818.643 ± 704.893 ops/us > StableListElementBenchmark.staticArrayList thrpt 10 7614.741 ± 564.777 ops/us > StableListElementBe...
Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Clean up ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18794/files - new: https://git.openjdk.org/jdk/pull/18794/files/923e1877..b40ebfad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=10-11 Stats: 64 lines in 3 files changed: 14 ins; 21 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/18794.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18794/head:pull/18794 PR: https://git.openjdk.org/jdk/pull/18794 From liach at openjdk.org Thu May 16 11:58:13 2024 From: liach at openjdk.org (Chen Liang) Date: Thu, 16 May 2024 11:58:13 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v6] In-Reply-To: <53bNF9v46bu-RgX1vJOwtdFKIzP3vwieCOENWtg2ra8=.c3f4dca8-6b1f-4c35-8e2c-f142c05dfe9b@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> <53bNF9v46bu-RgX1vJOwtdFKIzP3vwieCOENWtg2ra8=.c3f4dca8-6b1f-4c35-8e2c-f142c05dfe9b@github.com> Message-ID: On Thu, 16 May 2024 07:29:21 GMT, Per Minborg wrote: >> # Stable Values & Collections (Internal) >> >> ## Summary >> This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. >> >> ## Goals >> * Provide an easy and intuitive API to describe value holders that can change at most once. >> * Decouple declaration from initialization without significant footprint or performance penalties. >> * Reduce the amount of static initializer and/or field initialization code. >> * Uphold integrity and consistency, even in a multi-threaded environment. 
>> >> For more details, see the draft JEP: https://openjdk.org/jeps/8312611 >> >> ## Performance >> Performance compared to instance variables using two `AtomicReference` and two protected by double-checked locking under concurrent access by all threads: >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.atomic thrpt 10 259.478 ± 36.809 ops/us >> StableBenchmark.dcl thrpt 10 225.710 ± 26.638 ops/us >> StableBenchmark.stable thrpt 10 4382.478 ± 1151.472 ops/us <- StableValue significantly faster >> >> >> Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (all threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableStaticBenchmark.atomic thrpt 10 6487.835 ± 385.639 ops/us >> StableStaticBenchmark.dcl thrpt 10 6605.239 ± 210.610 ops/us >> StableStaticBenchmark.stable thrpt 10 14338.239 ± 1426.874 ops/us >> StableStaticBenchmark.staticCHI thrpt 10 13780.341 ± 1839.651 ops/us >> >> >> Performance for stable lists (thread safe) in both instance and static contexts whereby we access a single value compared to `ArrayList` instances (which are not thread-safe) (all threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableListElementBenchmark.instanceArrayList thrpt 10 5812.992 ± 1169.730 ops/us >> StableListElementBenchmark.instanceList thrpt 10 4818.643 ± 704.893 ops/us >> StableListElementBenchmark... > > Per Minborg has updated the pull request incrementally with one additional commit since the last revision: > > Simplify exception handling and add benchmarks src/java.base/share/classes/jdk/internal/lang/stable/StableValueImpl.java line 400: > 398: @Override > 399: public void run() { > 400: stable.computeIfUnset(supplier); We can just declare this runnable as a capturing lambda (or an anonymous class if you fear initialization issues) and leave this comment there. The thread field can be removed.
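The suggestion above can be illustrated in isolation. It replaces a dedicated `Runnable` class, whose fields hold the stable value and the supplier, with a capturing lambda. In the sketch below, `Holder` and its `computeIfUnset` method are hypothetical stand-ins, not the PR's `StableValueImpl`; only the shape of the refactoring matters. Inside `java.base` itself, lambdas can be problematic early in VM startup because they are bootstrapped through `invokedynamic`. That is presumably why the comment offers an anonymous class as the alternative.

```java
import java.util.function.Supplier;

public class CapturingLambdaDemo {

    // Hypothetical stand-in for an at-most-once holder (not the JDK's StableValueImpl).
    static final class Holder<T> {
        private T value;
        private boolean set;

        synchronized void computeIfUnset(Supplier<? extends T> supplier) {
            if (!set) {
                value = supplier.get();
                set = true;
            }
        }

        synchronized T get() {
            return value;
        }
    }

    // Before: a named Runnable that stores what it needs in explicit fields.
    static final class ComputeTask<T> implements Runnable {
        private final Holder<T> holder;
        private final Supplier<? extends T> supplier;

        ComputeTask(Holder<T> holder, Supplier<? extends T> supplier) {
            this.holder = holder;
            this.supplier = supplier;
        }

        @Override
        public void run() {
            holder.computeIfUnset(supplier);
        }
    }

    // After: a capturing lambda; holder and supplier are captured automatically,
    // so no hand-written class or fields are needed.
    static <T> Runnable computeTask(Holder<T> holder, Supplier<? extends T> supplier) {
        return () -> holder.computeIfUnset(supplier);
    }

    public static void main(String[] args) throws InterruptedException {
        Holder<String> holder = new Holder<>();
        Thread t = new Thread(computeTask(holder, () -> "computed"));
        t.start();
        t.join();
        System.out.println(holder.get()); // computed
    }
}
```

Both forms behave the same. The lambda version simply lets the compiler generate the captured state that `ComputeTask` declares by hand.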
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1603177748 From pminborg at openjdk.org Thu May 16 11:58:11 2024 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 16 May 2024 11:58:11 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v5] In-Reply-To: <_I8ZxFiUZ3bygkA-iDH29xEmkqoy0Dm-_g0iAVRkoro=.680467aa-a061-4f17-8fcd-a60a829afa59@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> <_G7nE_OAMl9WSkAz21UDYHAlCRVeHN2ZmM0FR7Bmxtw=.ea94e982-3e9a-4c0e-8523-11372474d497@github.com> <_I8ZxFiUZ3bygkA-iDH29xEmkqoy0Dm-_g0iAVRkoro=.680467aa-a061-4f17-8fcd-a60a829afa59@github.com> Message-ID: On Thu, 16 May 2024 11:13:24 GMT, Chen Liang wrote: >> It seems reasonable to assume `null` values are not constant-folded. For straight-out-of-the-box usage, there is no apparent significant difference as indicated by a new benchmark I just added: >> >> >> Benchmark Mode Cnt Score Error Units >> StableStaticBenchmark.atomic thrpt 10 5729.683 ± 502.023 ops/us >> StableStaticBenchmark.dcl thrpt 10 6069.222 ± 951.784 ops/us >> StableStaticBenchmark.dclHolder thrpt 10 5502.102 ± 1630.627 ops/us >> StableStaticBenchmark.stable thrpt 10 12737.158 ± 1746.456 ops/us <- Non-null benchmark >> StableStaticBenchmark.stableHolder thrpt 10 12053.978 ± 1421.527 ops/us >> StableStaticBenchmark.stableList thrpt 10 12443.870 ± 2084.607 ops/us >> StableStaticBenchmark.stableNull thrpt 10 13164.232 ± 591.284 ops/us <- Added null benchmark >> StableStaticBenchmark.stableRecordHolder thrpt 10 13638.893 ± 1250.895 ops/us >> StableStaticBenchmark.staticCHI thrpt 10 13639.220 ± 1190.922 ops/us >> >> >> If the `null` value participates in a much larger constant-folding tree, there might be a significant difference.
>> I am afraid moving the order would have detrimental effects on instance performance:
>>
>> Checking value first:
>>
>>
>> Benchmark                           Mode  Cnt     Score      Error   Units
>> StableBenchmark.atomic             thrpt   10   246.460 ±   75.417  ops/us
>> StableBenchmark.dcl                thrpt   10   243.481 ±   35.021  ops/us
>> StableBenchmark.stable             thrpt   10  4977.693 ±  675.926  ops/us <- Non-null
>> StableBenchmark.stableHoldingList  thrpt   10  3614.460 ±  275.140  ops/us
>> StableBenchmark.stableList         thrpt   10  3328.155 ±  898.202  ops/us
>> StableBenchmark.stableListStored   thrpt   10  3842.174 ±  535.902  ops/us
>> StableBenchmark.stableNull         thrpt   10  6217.737 ±  840.376  ops/us <- null
>> StableBenchmark.supplier           thrpt   10  9369.934 ± 1449.182  ops/us
>>
>>
>> Checking null first:
>>
>>
>> Benchmark                           Mode  Cnt     Score      Error   Units
>> StableBenchmark.atomic             thrpt   10   287.640 ±   17.858  ops/us
>> StableBenchmark.dcl                thrpt   10   250.398 ±   20.874  ops/us
>> StableBenchmark.stable             thrpt   10  3745.885 ± 1040.534  ops/us <- Non-null
>> StableBenchmark.stableHoldingList  thrpt   10  2982.129 ±  503.492  ops/us
>> StableBenchmar...
>
> I think the result would be more convincing if the stable case is changed to `sum += (stable.orThrow() == null ? 0 : 1) + (stable2.orThrow() == null ? 0 : 1);` as adding by 1 might be somewhat better optimized by JIT.

I have instead changed parts of the `stableNull()` body to:

    sum += (stableNull.orThrow() == null ? VALUE : VALUE2) + (stableNull2.orThrow() == null ? VALUE : VALUE2);

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1603205135

From pminborg at openjdk.org Thu May 16 12:06:08 2024
From: pminborg at openjdk.org (Per Minborg)
Date: Thu, 16 May 2024 12:06:08 GMT
Subject: RFR: 8330465: Stable Values and Collections (Internal) [v5]
In-Reply-To:
References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com>
 <5zoKgO17_hUh9-UveP-yo82Sh2Jrk2Z_3K8rsarDk10=.03e40aaa-cac7-4ef6-b9ca-131fd338a0a8@github.com>
Message-ID:

On Thu, 16 May 2024 11:14:16 GMT, Chen Liang wrote:

>> The idea here is to have the most likely value in the middle... Not sure if that motivates the added complexity though.
>
> Is there any reference on how/why the middle entry in a tableswitch instruction is the fastest?

It is only in a _lookupswitch_ that this becomes relevant. The above code will generate a *tableswitch* so I think it is safe to simplify the code and remove the DUMMY.

      private V orThrowVolatile();
        descriptor: ()Ljava/lang/Object;
        flags: (0x0002) ACC_PRIVATE
        Code:
          stack=1, locals=1, args_size=1
             0: aload_0
             1: invokevirtual #15                 // Method stateVolatile:()I
             4: tableswitch   { // 0 to 4
                           0: 40
                           1: 44
                           2: 46
                           3: 51
                           4: 56
                     default: 60
                }
    ...

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1603214742

From pminborg at openjdk.org Thu May 16 12:11:50 2024
From: pminborg at openjdk.org (Per Minborg)
Date: Thu, 16 May 2024 12:11:50 GMT
Subject: RFR: 8330465: Stable Values and Collections (Internal) [v13]
In-Reply-To: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com>
References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com>
Message-ID:

> # Stable Values & Collections (Internal)
>
> ## Summary
> This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_.
> Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization.
>
> ## Goals
> * Provide an easy and intuitive API to describe value holders that can change at most once.
> * Decouple declaration from initialization without significant footprint or performance penalties.
> * Reduce the amount of static initializer and/or field initialization code.
> * Uphold integrity and consistency, even in a multi-threaded environment.
>
> For more details, see the draft JEP: https://openjdk.org/jeps/8312611
>
> ## Performance
> Performance compared to instance variables using two `AtomicReference` and two protected by double-checked locking under concurrent access by all threads:
>
>
> Benchmark                Mode  Cnt     Score      Error   Units
> StableBenchmark.atomic  thrpt   10   259.478 ±   36.809  ops/us
> StableBenchmark.dcl     thrpt   10   225.710 ±   26.638  ops/us
> StableBenchmark.stable  thrpt   10  4382.478 ± 1151.472  ops/us <- StableValue significantly faster
>
>
> Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (all threads):
>
>
> Benchmark                         Mode  Cnt      Score      Error   Units
> StableStaticBenchmark.atomic     thrpt   10   6487.835 ±  385.639  ops/us
> StableStaticBenchmark.dcl        thrpt   10   6605.239 ±  210.610  ops/us
> StableStaticBenchmark.stable     thrpt   10  14338.239 ± 1426.874  ops/us
> StableStaticBenchmark.staticCHI  thrpt   10  13780.341 ± 1839.651  ops/us
>
>
> Performance for stable lists (thread safe) in both instance and static contexts whereby we access a single value compared to `ArrayList` instances (which are not thread-safe) (all threads):
>
>
> Benchmark                                      Mode  Cnt     Score      Error   Units
> StableListElementBenchmark.instanceArrayList  thrpt   10  5812.992 ± 1169.730  ops/us
> StableListElementBenchmark.instanceList       thrpt   10  4818.643 ±  704.893  ops/us
> StableListElementBenchmark.staticArrayList    thrpt   10  7614.741 ±  564.777  ops/us
> StableListElementBe...
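On the tableswitch question raised in the v5 review thread above: a `tableswitch` finds its target by indexing a jump table with `key - low`, so the position of a case among its neighbours does not affect lookup cost, whereas a `lookupswitch` stores sorted key/target pairs that the JVM may search (for example by binary search). The sketch below contrasts a dense switch over 0..4, which javac typically compiles to `tableswitch`, with a sparse one, which it typically compiles to `lookupswitch`; running `javap -c SwitchShapes` on the compiled class shows which instruction was chosen. `SwitchShapes` and its state names are hypothetical and do not reflect the states used in `StableValueImpl`.

```java
// Compile with javac, then inspect the bytecode with: javap -c SwitchShapes
class SwitchShapes {
    // Dense, consecutive keys 0..4: javac typically emits a tableswitch,
    // i.e. a direct index into a jump table, so case order has no lookup cost.
    static String dense(int state) {
        switch (state) {
            case 0: return "UNSET";      // hypothetical state names
            case 1: return "NON_NULL";
            case 2: return "NULL";
            case 3: return "ERROR";
            case 4: return "COMPUTING";
            default: return "UNKNOWN";
        }
    }

    // Sparse keys: a jump table would be huge, so javac typically emits a
    // lookupswitch, a sorted list of key/target pairs.
    static String sparse(int key) {
        switch (key) {
            case 0: return "zero";
            case 1_000: return "thousand";
            case 1_000_000: return "million";
            default: return "other";
        }
    }

    public static void main(String[] args) {
        for (int s = -1; s <= 5; s++) {
            System.out.println(s + " -> " + dense(s));
        }
        System.out.println(sparse(1_000)); // prints "thousand"
    }
}
```

javac picks between the two instructions with a size/time cost heuristic, so the mapping above is typical rather than guaranteed; `javap` output is the reliable check for a given source file.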
Per Minborg has updated the pull request incrementally with two additional commits since the last revision: - Cleanup switch rakes - Update null benchmarks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18794/files - new: https://git.openjdk.org/jdk/pull/18794/files/b40ebfad..dbbefea5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=11-12 Stats: 19 lines in 4 files changed: 3 ins; 6 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/18794.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18794/head:pull/18794 PR: https://git.openjdk.org/jdk/pull/18794 From pminborg at openjdk.org Thu May 16 12:28:44 2024 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 16 May 2024 12:28:44 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v14] In-Reply-To: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: > # Stable Values & Collections (Internal) > > ## Summary > This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. > > ## Goals > * Provide an easy and intuitive API to describe value holders that can change at most once. > * Decouple declaration from initialization without significant footprint or performance penalties. > * Reduce the amount of static initializer and/or field initialization code. > * Uphold integrity and consistency, even in a multi-threaded environment. 
Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Improve toString and simplify offset calculations ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18794/files - new: https://git.openjdk.org/jdk/pull/18794/files/dbbefea5..3e1ab5e9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=12-13 Stats: 51 lines in 2 files changed: 18 ins; 27 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/18794.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18794/head:pull/18794 PR: https://git.openjdk.org/jdk/pull/18794 From pminborg at openjdk.org Thu May 16 12:34:46 2024 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 16 May 2024 12:34:46 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v15] In-Reply-To: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: > # Stable Values & Collections (Internal) > > ## Summary > This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. > > ## Goals > * Provide an easy and intuitive API to describe value holders that can change at most once. > * Decouple declaration from initialization without significant footprint or performance penalties. > * Reduce the amount of static initializer and/or field initialization code. > * Uphold integrity and consistency, even in a multi-threaded environment. 
Per Minborg has updated the pull request incrementally with one additional commit since the last revision:

  Simplify background thread handling

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18794/files
  - new: https://git.openjdk.org/jdk/pull/18794/files/3e1ab5e9..2af0168e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=14
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=13-14

  Stats: 7 lines in 1 file changed: 0 ins; 4 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/18794.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18794/head:pull/18794

PR: https://git.openjdk.org/jdk/pull/18794

From amitkumar at openjdk.org Thu May 16 12:40:02 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Thu, 16 May 2024 12:40:02 GMT
Subject: RFR: 8331935: Add support for primitive array C1 clone intrinsic in PPC
In-Reply-To:
References:
Message-ID:

On Wed, 15 May 2024 13:50:27 GMT, Varada M wrote:

> https://bugs.openjdk.org/browse/JDK-8302850 port for PPC64
>
> JMH Benchmark Results
>
>
> Before :
>
> Benchmark                 (size)  Mode  Cnt    Score    Error  Units
> ArrayClone.byteArraycopy       0  avgt   15  114.107 ±  1.337  ns/op
> ArrayClone.byteArraycopy      10  avgt   15  130.492 ±  0.991  ns/op
> ArrayClone.byteArraycopy     100  avgt   15  139.103 ±  1.913  ns/op
> ArrayClone.byteArraycopy    1000  avgt   15  321.688 ±  6.033  ns/op
> ArrayClone.byteClone           0  avgt   15  227.602 ±  3.393  ns/op
> ArrayClone.byteClone          10  avgt   15  237.624 ±  2.996  ns/op
> ArrayClone.byteClone         100  avgt   15  239.219 ±  2.835  ns/op
> ArrayClone.byteClone        1000  avgt   15  355.571 ±  2.946  ns/op
> ArrayClone.intArraycopy        0  avgt   15  113.275 ±  1.099  ns/op
> ArrayClone.intArraycopy       10  avgt   15  129.763 ±  1.458  ns/op
> ArrayClone.intArraycopy      100  avgt   15  213.327 ±  2.524  ns/op
> ArrayClone.intArraycopy     1000  avgt   15  449.650 ±  7.338  ns/op
> ArrayClone.intClone            0  avgt   15  225.682 ±  3.048  ns/op
> ArrayClone.intClone           10  avgt   15  234.532 ±  2.817  ns/op
> ArrayClone.intClone          100  avgt   15  295.934 ±  4.925  ns/op
> ArrayClone.intClone         1000  avgt   15  573.368 ±  5.739  ns/op
> Finished running test 'micro:java.lang.ArrayClone'
> Test report is stored in build/aix-ppc64-server-release/test-results/micro_java_lang_ArrayClone
>
> ==============================
> Test summary
> ==============================
>    TEST                           TOTAL  PASS  FAIL ERROR
>    micro:java.lang.ArrayClone         1     1     0     0
> ==============================
> TEST SUCCESS
>
> Finished building target 'test' in configuration 'aix-ppc64-server-release'
>
>
> After:
>
> Benchmark                 (size)  Mode  Cnt    Score    Error  Units
> ArrayClone.byteArraycopy       0  avgt   15  113.894 ±  0.993  ns/op
> ArrayClone.byteArraycopy      10  avgt   15  131.455 ±  0.956  ns/op
> ArrayClone.byteArraycopy     100  avgt   15  139.145 ±  3.002  ns/op
> ArrayClone.byteArraycopy    1000  avgt   15  315.957 ± 14.591  ns/op
> ArrayClone.byteClone           0  avgt   15   43.753 ±  3.669  ns/op
> ArrayClone.byteClone          10  avgt   15   52.329 ±  1.041  ns/op
> ArrayClone.byteClone         100  avgt   15  127.711 ±  3.938  ns/op
> ArrayClone.byteClone        1000  avgt   15  225.937 ±  1.987  ns/op
> ArrayClone.intArraycopy        0  avgt   15  113.788 ±  0.770  ns/op
> ArrayClone.intArraycopy       10  avgt   1...

Good

-------------

Marked as reviewed by amitkumar (Committer).

PR Review: https://git.openjdk.org/jdk/pull/19250#pullrequestreview-2060621901

From pminborg at openjdk.org Thu May 16 12:48:24 2024
From: pminborg at openjdk.org (Per Minborg)
Date: Thu, 16 May 2024 12:48:24 GMT
Subject: RFR: 8330465: Stable Values and Collections (Internal) [v16]
In-Reply-To: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com>
References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com>
Message-ID:

> # Stable Values & Collections (Internal)
>
> ## Summary
> This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_.
Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Fix copyringht issues ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18794/files - new: https://git.openjdk.org/jdk/pull/18794/files/2af0168e..ec7c92cd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=14-15 Stats: 206 lines in 13 files changed: 200 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18794.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18794/head:pull/18794 PR: https://git.openjdk.org/jdk/pull/18794 From luhenry at openjdk.org Thu May 16 12:49:03 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 16 May 2024 12:49:03 GMT Subject: RFR: 8332153: RISC-V: enable tests and add comment for vector shift instruct (shared by vectorization and Vector API) In-Reply-To: References: Message-ID: On Thu, 16 May 2024 11:12:09 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > For vector shift instruct, some corresponding tests are not enabled, this is to enable them. > And the way how vector shift instruct works is not clear, especially both vectorization (SLP in jdk) and Vector API share the same instruct's in riscv_v.ad, so also added some comment to clarify it. > > Thanks src/hotspot/cpu/riscv/riscv_v.ad line 1802: > 1800: // and https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#116-vector-single-width-shift-instructions for details. 
> 1801: //
> 1802: // Although the difference between these 2 behaviours, the same shift instruct's of byte and short are

Suggestion:

    // Despite the difference between these 2 behaviours, the same shift instruct's of byte and short are

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19265#discussion_r1603196839

From duke at openjdk.org Thu May 16 13:03:24 2024
From: duke at openjdk.org (ArsenyBochkarev)
Date: Thu, 16 May 2024 13:03:24 GMT
Subject: RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v7]
In-Reply-To:
References:
Message-ID:

> Hello everyone! Please review this ~non-vectorized~ implementation of `_updateBytesAdler32` intrinsic. Reference implementation for AArch64 can be found [here](https://github.com/openjdk/jdk9/blob/master/hotspot/src/cpu/aarch64/vm/stubGenerator_aarch64.cpp#L3281).
>
> ### Correctness checks
>
> Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok. All tier1 also passed.
>
> ### Performance results on T-Head board
>
> Enabled intrinsic:
>
> | Benchmark | (count) | Mode | Cnt | Score | Error | Units |
> | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- |
> | Adler32.TestAdler32.testAdler32Update | 64 | thrpt | 25 | 5522.693 | 23.387 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 128 | thrpt | 25 | 3430.761 | 9.210 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 256 | thrpt | 25 | 1962.888 | 5.323 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 512 | thrpt | 25 | 1050.938 | 0.144 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 1024 | thrpt | 25 | 549.227 | 0.375 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 2048 | thrpt | 25 | 280.829 | 0.170 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 5012 | thrpt | 25 | 116.333 | 0.057 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 8192 | thrpt | 25 | 71.392 | 0.060 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 16384 | thrpt | 25 | 35.784 | 0.019 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 32768 | thrpt | 25 | 17.924 | 0.010 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 65536 | thrpt | 25 | 8.940 | 0.003 | ops/ms |
>
> Disabled intrinsic:
>
> | Benchmark | (count) | Mode | Cnt | Score | Error | Units |
> | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- |
> | Adler32.TestAdler32.testAdler32Update | 64 | thrpt | 25 | 655.633 | 5.845 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 128 | thrpt | 25 | 587.418 | 10.062 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 256 | thrpt | 25 | 546.675 | 11.598 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 512 | thrpt | 25 | 432.328 | 11.517 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 1024 | thrpt | 25 | 311.771 | 4.238 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 2048 | thrpt | 25 | 202.648 | 2.486 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 5012 | thrpt | 25 | 100.246 | 1.119 | ops/ms |
> |Adler32.TestAdler32.testAdler32Update|8192|t...

ArsenyBochkarev has updated the pull request incrementally with eight additional commits since the last revision:

 - Prettify L_nmax loop
 - Add comments in functions
 - Add explanation comment for L_nmax_loop
 - Fix L_nmax_loop for big lengths
 - Fix L_by16 loop step
 - Prettify intrinsic
 - Use LMUL=4 for most of the calculations
 - Use LMUL to load multiple data in one step

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18382/files
  - new: https://git.openjdk.org/jdk/pull/18382/files/3cf649c9..be7d2551

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18382&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18382&range=05-06

  Stats: 124 lines in 1 file changed: 56 ins; 15 del; 53 mod
  Patch: https://git.openjdk.org/jdk/pull/18382.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18382/head:pull/18382

PR: https://git.openjdk.org/jdk/pull/18382

From duke at openjdk.org Thu May 16 13:03:24 2024
From: duke at openjdk.org (ArsenyBochkarev)
Date: Thu, 16 May
2024 13:03:24 GMT Subject: RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v6] In-Reply-To: References: Message-ID: On Thu, 18 Apr 2024 08:39:35 GMT, ArsenyBochkarev wrote: >> Hello everyone! Please review this ~non-vectorized~ implementation of `_updateBytesAdler32` intrinsic. Reference implementation for AArch64 can be found [here](https://github.com/openjdk/jdk9/blob/master/hotspot/src/cpu/aarch64/vm/stubGenerator_aarch64.cpp#L3281). >> >> ### Correctness checks >> >> Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok. All tier1 also passed. >> >> ### Performance results on T-Head board >> >> Enabled intrinsic: >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- | >> | Adler32.TestAdler32.testAdler32Update | 64 | thrpt | 25 | 5522.693 | 23.387 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 128 | thrpt | 25 | 3430.761 | 9.210 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 256 | thrpt | 25 | 1962.888 | 5.323 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 512 | thrpt | 25 | 1050.938 | 0.144 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 1024 | thrpt | 25 | 549.227 | 0.375 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 2048 | thrpt | 25 | 280.829 | 0.170 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 5012 | thrpt | 25 | 116.333 | 0.057 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 8192 | thrpt | 25 | 71.392 | 0.060 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 16384 | thrpt | 25 | 35.784 | 0.019 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 32768 | thrpt | 25 | 17.924 | 0.010 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 65536 | thrpt | 25 | 8.940 | 0.003 | ops/ms | >> >> Disabled intrinsic: >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | ------------------------------------- | ----------- | ------ | 
--------- | ------ | --------- | ---------- | >> |Adler32.TestAdler32.testAdler32Update|64|thrpt|25|655.633|5.845|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|128|thrpt|25|587.418|10.062|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|256|thrpt|25|546.675|11.598|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|512|thrpt|25|432.328|11.517|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|1024|thrpt|25|311.771|4.238|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|2048|thrpt|25|202.648|2.486|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|5012|thrpt|... > > ArsenyBochkarev has updated the pull request incrementally with 12 additional commits since the last revision: > > - Use mv instead of li > - Prettify function > - Remove unnecessary zeroing of vtemp1, vtemp2 > - Remove unnecessary zeroing of v4, ..., v27 > - Remove unnecessary assert > - Move similar unroll code to a function > - Fix comment > - Dispose of unnecessary arguments in accum function > - Accelerate vectorization > - Use two vredsum instead of vadd + vwredsum > - Make use of more vector registers > - Dispose of most of vsetivli instructions > - Prettify loop remainder > - ... 
and 2 more: https://git.openjdk.org/jdk/compare/8a74349c...3cf649c9 Updated results for enabled intrinsic on Kendryte K230: | Benchmark | (count) | Mode | Cnt | Score | Error | Units | | --------------------------------- | -------------- | --------- | ----- | ------ | ------------ | -------- | | Adler32.TestAdler32.testAdler32Update | 64 | thrpt | 25 | 7244.611 | 52.963 | ops/ms | | Adler32.TestAdler32.testAdler32Update | 128 | thrpt | 25 | 4679.629 | 34.326 | ops/ms | | Adler32.TestAdler32.testAdler32Update | 256 | thrpt | 25 | 2740.242 | 15.299 | ops/ms | | Adler32.TestAdler32.testAdler32Update | 512 | thrpt | 25 | 1509.818 | 0.856 | ops/ms | | Adler32.TestAdler32.testAdler32Update | 1024 | thrpt | 25 | 791.004 | 1.774 | ops/ms | | Adler32.TestAdler32.testAdler32Update | 2048 | thrpt | 25 | 406.103 | 0.582 | ops/ms | | Adler32.TestAdler32.testAdler32Update | 5012 | thrpt | 25 | 167.894 | 0.374 | ops/ms | | Adler32.TestAdler32.testAdler32Update | 8192 | thrpt | 25 | 171.731 | 0.187 | ops/ms | | Adler32.TestAdler32.testAdler32Update | 16384 | thrpt | 25 | 86.127 | 0.084 | ops/ms | | Adler32.TestAdler32.testAdler32Update | 32768 | thrpt | 25 | 48.468 | 0.075 | ops/ms | | Adler32.TestAdler32.testAdler32Update | 65536 | thrpt | 25 | 23.818 | 0.516 | ops/ms | Results for disabled intrinsic are [here](https://github.com/openjdk/jdk/pull/18382#issuecomment-2045145255) ------------- PR Comment: https://git.openjdk.org/jdk/pull/18382#issuecomment-2115185519 From duke at openjdk.org Thu May 16 13:03:24 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Thu, 16 May 2024 13:03:24 GMT Subject: RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v6] In-Reply-To: <3juhaO3iNbSdakSMDzcjgpARY7O4XMCe1pZMsxBFsis=.d8747866-c563-4fcd-af1d-62383857cbd8@github.com> References: <3juhaO3iNbSdakSMDzcjgpARY7O4XMCe1pZMsxBFsis=.d8747866-c563-4fcd-af1d-62383857cbd8@github.com> Message-ID: On Tue, 23 Apr 2024 07:32:08 GMT, Fei Yang wrote: >> ArsenyBochkarev has updated the pull request 
incrementally with 12 additional commits since the last revision: >> >> - Use mv instead of li >> - Prettify function >> - Remove unnecessary zeroing of vtemp1, vtemp2 >> - Remove unnecessary zeroing of v4, ..., v27 >> - Remove unnecessary assert >> - Move similar unroll code to a function >> - Fix comment >> - Dispose of unnecessary arguments in accum function >> - Accelerate vectorization >> - Use two vredsum instead of vadd + vwredsum >> - Make use of more vector registers >> - Dispose of most of vsetivli instructions >> - Prettify loop remainder >> - ... and 2 more: https://git.openjdk.org/jdk/compare/8a74349c...3cf649c9 > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5090: > >> 5088: >> 5089: __ vsetivli(temp0, 16, Assembler::e8, Assembler::m1); >> 5090: for (int i = 0; i < unroll_factor; i++) > > Does it make sense to limit the vector lenth to 16 bytes and do loop unrolling here? I think the aarch64 version of `generate_updateBytesAdler32_accum` has this constraint because they use NEON which only has 128-bit vector registers. But for RVV, we can combine several vector registers into register group (LMUL greater than 1). Hi! Thanks for pointing it out! Sorry for such a late reply. I made some changes with vector register grouping, using LMUL = 4 mode, as this size is maximum possible with current calculating algorithm. I listed updated results below. Can you please take another look? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1603299099 From amitkumar at openjdk.org Thu May 16 13:26:07 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 16 May 2024 13:26:07 GMT Subject: RFR: 8331934: [s390x] Add support for primitive array C1 clone intrinsic [v3] In-Reply-To: References: Message-ID: <_d_t7AswyPStm_TD86ij3hEIZZpbpjgEQdQFHuxvu3s=.9e1b4858-32f8-4659-9019-017d1a93e0e0@github.com> On Wed, 15 May 2024 09:25:32 GMT, Amit Kumar wrote: >> Adds JDK-8302850 Port for s390x. 
>>
>> Testing:
>>
>> make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
>>
>> ==============================
>> Test summary
>> ==============================
>>    TEST                                          TOTAL  PASS  FAIL ERROR
>>    jtreg:test/hotspot/jtreg:hotspot_compiler      1166  1166     0     0
>> ==============================
>> TEST SUCCESS
>>
>> * Tier1 Test with Fast debug build.
>>
>> BenchMarking:
>>
>>
>> Without Patch:
>>
>> Benchmark                 (size)  Mode  Cnt     Score    Error  Units
>> ArrayClone.byteArraycopy       0  avgt   15    10.838 ±  0.461  ns/op
>> ArrayClone.byteArraycopy      10  avgt   15    28.919 ±  1.695  ns/op
>> ArrayClone.byteArraycopy     100  avgt   15    48.815 ±  0.901  ns/op
>> ArrayClone.byteArraycopy    1000  avgt   15   256.357 ±  7.901  ns/op
>> ArrayClone.byteClone           0  avgt   15    90.398 ±  3.119  ns/op
>> ArrayClone.byteClone          10  avgt   15   103.774 ±  4.468  ns/op
>> ArrayClone.byteClone         100  avgt   15   126.628 ±  6.952  ns/op
>> ArrayClone.byteClone        1000  avgt   15   326.409 ± 31.635  ns/op
>> ArrayClone.intArraycopy        0  avgt   15    10.450 ±  0.509  ns/op
>> ArrayClone.intArraycopy       10  avgt   15    36.903 ±  0.753  ns/op
>> ArrayClone.intArraycopy      100  avgt   15    85.964 ±  1.806  ns/op
>> ArrayClone.intArraycopy     1000  avgt   15   841.512 ± 40.335  ns/op
>> ArrayClone.intClone            0  avgt   15    89.332 ±  3.695  ns/op
>> ArrayClone.intClone           10  avgt   15   110.639 ±  2.476  ns/op
>> ArrayClone.intClone          100  avgt   15   195.781 ±  8.622  ns/op
>> ArrayClone.intClone         1000  avgt   15  1058.479 ± 92.468  ns/op
>> Finished running test 'micro:java.lang.ArrayClone'
>>
>>
>> with patch:
>>
>> Benchmark                 (size)  Mode  Cnt     Score    Error  Units
>> ArrayClone.byteArraycopy       0  avgt   15    10.526...
>
> Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits:
>
>  - Merge master
>  - s390x Port
>  - Update src/hotspot/share/c1/c1_GraphBuilder.cpp
>
>    Co-authored-by: Dean Long <17332032+dean-long at users.noreply.github.com>
>  - Fix assert to only have a single !
> - Assert type is not interface > - Remove whitespace > - Expanded testing in TestNullArrayClone > > * Added byte[] and long[] tests. > * Verified that the cloned array has the same contents. > * Increase number of iterations reach tier 3 threshold. > - Update src/hotspot/share/c1/c1_GraphBuilder.cpp > > Co-authored-by: Boris <42576543+bulasevich at users.noreply.github.com> > - Added test summary > - Use vmIntrinsics instead of vmIntrinsicID > - ... and 16 more: https://git.openjdk.org/jdk/compare/2f10a316...865de5ba

Result from the LPAR:

without patch:

Benchmark                 (size)  Mode  Cnt    Score   Error  Units
ArrayClone.byteArraycopy       0  avgt   15    4.021 ± 0.032  ns/op
ArrayClone.byteArraycopy      10  avgt   15   15.931 ± 0.171  ns/op
ArrayClone.byteArraycopy     100  avgt   15   16.201 ± 0.076  ns/op
ArrayClone.byteArraycopy    1000  avgt   15   45.268 ± 0.213  ns/op
ArrayClone.byteClone           0  avgt   15   70.300 ± 0.244  ns/op
ArrayClone.byteClone          10  avgt   15   77.112 ± 0.558  ns/op
ArrayClone.byteClone         100  avgt   15   79.860 ± 0.606  ns/op
ArrayClone.byteClone        1000  avgt   15  112.834 ± 0.526  ns/op
ArrayClone.intArraycopy        0  avgt   15    4.007 ± 0.012  ns/op
ArrayClone.intArraycopy       10  avgt   15   15.378 ± 0.055  ns/op
ArrayClone.intArraycopy      100  avgt   15   25.387 ± 0.102  ns/op
ArrayClone.intArraycopy     1000  avgt   15  161.278 ± 0.719  ns/op
ArrayClone.intClone            0  avgt   15   70.341 ± 0.265  ns/op
ArrayClone.intClone           10  avgt   15   78.209 ± 0.514  ns/op
ArrayClone.intClone          100  avgt   15   89.845 ± 0.571  ns/op
ArrayClone.intClone         1000  avgt   15  257.037 ± 2.809  ns/op
Finished running test 'micro:java.lang.ArrayClone'

with patch:

Benchmark                 (size)  Mode  Cnt    Score   Error  Units
ArrayClone.byteArraycopy       0  avgt   15    4.021 ± 0.027  ns/op
ArrayClone.byteArraycopy      10  avgt   15   16.106 ± 0.859  ns/op
ArrayClone.byteArraycopy     100  avgt   15   16.212 ± 0.045  ns/op
ArrayClone.byteArraycopy    1000  avgt   15   45.147 ± 0.137  ns/op
ArrayClone.byteClone           0  avgt   15    3.570 ± 0.010  ns/op
ArrayClone.byteClone          10  avgt   15    6.033 ± 0.018  ns/op
ArrayClone.byteClone         100  avgt   15    6.868 ± 0.020  ns/op
ArrayClone.byteClone        1000  avgt   15   33.437 ± 0.114  ns/op
ArrayClone.intArraycopy        0  avgt   15    4.008 ± 0.010  ns/op
ArrayClone.intArraycopy       10  avgt   15   15.373 ± 0.044  ns/op
ArrayClone.intArraycopy      100  avgt   15   29.543 ± 3.687  ns/op
ArrayClone.intArraycopy     1000  avgt   15  161.554 ± 0.414  ns/op
ArrayClone.intClone            0  avgt   15    3.571 ± 0.010  ns/op
ArrayClone.intClone           10  avgt   15    6.184 ± 0.016  ns/op
ArrayClone.intClone          100  avgt   15   13.304 ± 0.043  ns/op
ArrayClone.intClone         1000  avgt   15  133.755 ± 0.362  ns/op
Finished running test 'micro:java.lang.ArrayClone'

------------- PR Comment: https://git.openjdk.org/jdk/pull/19220#issuecomment-2115237059 From mli at openjdk.org Thu May 16 14:04:09 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 16 May 2024 14:04:09 GMT Subject: RFR: 8332394: Add friendly output when @IR rule missing value Message-ID: Hi, Can you help to review this simple patch? Currently, when a @IR rule like "applyIfPlatform" or "applyIfCPUFeature" miss a value, it will just throw ArrayIndexOutOfBoundsException, with no other information. This is confusing unless you dig into the test frame code. It's helpful to output more meaningful information.
Thanks ------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/19270/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19270&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332394 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19270.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19270/head:pull/19270 PR: https://git.openjdk.org/jdk/pull/19270 From pminborg at openjdk.org Thu May 16 14:07:09 2024 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 16 May 2024 14:07:09 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v16] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: <2EAy6RJqZenabnmy5aywucjDykt5ZK6IvDxlnXwl98E=.98c8bf93-f11e-40cc-931f-7113f0e9cfda@github.com> On Thu, 16 May 2024 12:48:24 GMT, Per Minborg wrote: >> # Stable Values & Collections (Internal) >> >> ## Summary >> This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. >> >> ## Goals >> * Provide an easy and intuitive API to describe value holders that can change at most once. >> * Decouple declaration from initialization without significant footprint or performance penalties. >> * Reduce the amount of static initializer and/or field initialization code. >> * Uphold integrity and consistency, even in a multi-threaded environment. 
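The "initialized at most once" contract in the goals above can be illustrated outside Java. The following is a minimal C++ analog, not the proposed JDK API: the hypothetical `StableHolder` type uses `std::call_once`, so the supplier runs exactly once even when several threads race on first access, and every later read returns the same value. It models only the initialization semantics; it does not model the final-field-like treatment by the JIT that the quoted benchmarks measure.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical C++ analog of a stable value holder (NOT the JDK StableValue
// API): the value is computed on first access, at most once, and never
// modified afterwards. T must be default-constructible in this sketch.
template <typename T>
class StableHolder {
 public:
  // Returns the held value, running `supplier` only if no value is set yet.
  // std::call_once guarantees a single invocation even under contention, and
  // its completion happens-before every later return, so readers see value_.
  const T& computeIfUnset(const std::function<T()>& supplier) {
    std::call_once(once_, [&] { value_ = supplier(); });
    return value_;
  }

 private:
  std::once_flag once_;
  T value_{};
};
```

A hand-written double-checked-locking holder (an atomic flag plus a mutex) would give the same guarantee; `std::call_once` is used here only because the standard library already packages that pattern.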
>> >> For more details, see the draft JEP: https://openjdk.org/jeps/8312611 >> >> ## Performance >> Performance compared to instance variables using two `AtomicReference` and two protected by double-checked locking under concurrent access by all threads: >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.atomic thrpt 10 259.478 ± 36.809 ops/us >> StableBenchmark.dcl thrpt 10 225.710 ± 26.638 ops/us >> StableBenchmark.stable thrpt 10 4382.478 ± 1151.472 ops/us <- StableValue significantly faster >> >> >> Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (all threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableStaticBenchmark.atomic thrpt 10 6487.835 ± 385.639 ops/us >> StableStaticBenchmark.dcl thrpt 10 6605.239 ± 210.610 ops/us >> StableStaticBenchmark.stable thrpt 10 14338.239 ± 1426.874 ops/us >> StableStaticBenchmark.staticCHI thrpt 10 13780.341 ± 1839.651 ops/us >> >> >> Performance for stable lists (thread safe) in both instance and static contexts whereby we access a single value compared to `ArrayList` instances (which are not thread-safe) (all threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableListElementBenchmark.instanceArrayList thrpt 10 5812.992 ± 1169.730 ops/us >> StableListElementBenchmark.instanceList thrpt 10 4818.643 ± 704.893 ops/us >> StableListElementBenchmark...
> > Per Minborg has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyringht issues Here are some updated benchmark graphs where we sum two instance variables of different sorts (higher is better): ![image](https://github.com/openjdk/jdk/assets/7457876/d82561d6-e803-4345-b6d2-6b0402e60211) ------------- PR Comment: https://git.openjdk.org/jdk/pull/18794#issuecomment-2115343828 From mdoerr at openjdk.org Thu May 16 14:28:02 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 16 May 2024 14:28:02 GMT Subject: RFR: 8331935: Add support for primitive array C1 clone intrinsic in PPC In-Reply-To: References: Message-ID: On Wed, 15 May 2024 13:50:27 GMT, Varada M wrote: > https://bugs.openjdk.org/browse/JDK-8302850 port for PPC64 > > JMH Benchmark Results > > > Before : > > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 114.107 ± 1.337 ns/op > ArrayClone.byteArraycopy 10 avgt 15 130.492 ± 0.991 ns/op > ArrayClone.byteArraycopy 100 avgt 15 139.103 ± 1.913 ns/op > ArrayClone.byteArraycopy 1000 avgt 15 321.688 ± 6.033 ns/op > ArrayClone.byteClone 0 avgt 15 227.602 ± 3.393 ns/op > ArrayClone.byteClone 10 avgt 15 237.624 ± 2.996 ns/op > ArrayClone.byteClone 100 avgt 15 239.219 ± 2.835 ns/op > > ArrayClone.byteClone 1000 avgt 15 355.571 ± 2.946 ns/op > ArrayClone.intArraycopy 0 avgt 15 113.275 ± 1.099 ns/op > ArrayClone.intArraycopy 10 avgt 15 129.763 ± 1.458 ns/op > ArrayClone.intArraycopy 100 avgt 15 213.327 ± 2.524 ns/op > ArrayClone.intArraycopy 1000 avgt 15 449.650 ± 7.338 ns/op > ArrayClone.intClone 0 avgt 15 225.682 ± 3.048 ns/op > ArrayClone.intClone 10 avgt 15 234.532 ± 2.817 ns/op > ArrayClone.intClone 100 avgt 15 295.934 ± 4.925 ns/op > ArrayClone.intClone 1000 avgt 15 573.368 ±
5.739 ns/op > Finished running test 'micro:java.lang.ArrayClone' > Test report is stored in build/aix-ppc64-server-release/test-results/micro_java_lang_ArrayClone > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > micro:java.lang.ArrayClone 1 1 0 0 > ============================== > TEST SUCCESS > > Finished building target 'test' in configuration 'aix-ppc64-server-release' > > > > > After: > > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 113.894 ± 0.993 ns/op > ArrayClone.byteArraycopy 10 avgt 15 131.455 ± 0.956 ns/op > ArrayClone.byteArraycopy 100 avgt 15 139.145 ± 3.002 ns/op > ArrayClone.byteArraycopy 1000 avgt 15 315.957 ± 14.591 ns/op > ArrayClone.byteClone 0 avgt 15 43.753 ± 3.669 ns/op > ArrayClone.byteClone 10 avgt 15 52.329 ± 1.041 ns/op > ArrayClone.byteClone 100 avgt 15 127.711 ± 3.938 ns/op > > ArrayClone.byteClone 1000 avgt 15 225.937 ± 1.987 ns/op > ArrayClone.intArraycopy 0 avgt 15 113.788 ± 0.770 ns/op > ArrayClone.intArraycopy 10 avgt 1... I got test failures on AIX which need investigation: compiler/c2/Test6910605_2.java assert(oopDesc::is_oop(s)) failed: JVM_ArrayCopy: src not an oop ------------- Changes requested by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19250#pullrequestreview-2060968234 From chagedorn at openjdk.org Thu May 16 14:39:02 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 16 May 2024 14:39:02 GMT Subject: RFR: 8332394: Add friendly output when @IR rule missing value In-Reply-To: References: Message-ID: On Thu, 16 May 2024 13:59:16 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Currently, when a @IR rule like "applyIfPlatform" or "applyIfCPUFeature" miss a value, it will just throw ArrayIndexOutOfBoundsException, with no other information. This is confusing unless you dig into the test frame code. > It's helpful to output more meaningful information.
> Thanks Good catch! Only a small improvement suggestion, otherwise, looks good. Just noticed that we are actually missing tests that trigger a format violation in `TestBadFormat` for `applyIfCPUFeature*` and `applyIfPlatform*`. We should probably add some at some point, analogously to the ones already there for `applyIf*` for flags. But that could be done separately. test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java line 293: > 291: String platform = andRules[i].trim(); > 292: i++; > 293: TestFormat.check(i < andRules.length, "Missing value for platform " + platform + failAt()); I suggest to also add the `ruleType` as in `hasAllRequiredFlags()`, for example. Then it is even more precise. For even more readability you could add some `""`: Current: Missing value for platform xyz in @IR rule 1 at foo() vs. Improved: Missing value for platform "xyz" in @IR rule 1 in "applyIfPlatform" at foo() ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19270#pullrequestreview-2060974799 PR Review Comment: https://git.openjdk.org/jdk/pull/19270#discussion_r1603482668 From mli at openjdk.org Thu May 16 14:45:17 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 16 May 2024 14:45:17 GMT Subject: RFR: 8332153: RISC-V: enable tests and add comment for vector shift instruct (shared by vectorization and Vector API) [v2] In-Reply-To: References: Message-ID: On Thu, 16 May 2024 11:48:15 GMT, Ludovic Henry wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> fix words > > src/hotspot/cpu/riscv/riscv_v.ad line 1802: > >> 1800: // and https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#116-vector-single-width-shift-instructions for details. 
>> 1801: // >> 1802: // Although the difference between these 2 behaviours, the same shift instruct's of byte and short are > > Suggestion: > > // Despite the difference between these 2 behaviours, the same shift instruct's of byte and short are Thanks, fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19265#discussion_r1603509532 From mli at openjdk.org Thu May 16 14:45:16 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 16 May 2024 14:45:16 GMT Subject: RFR: 8332153: RISC-V: enable tests and add comment for vector shift instruct (shared by vectorization and Vector API) [v2] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > For vector shift instruct, some corresponding tests are not enabled, this is to enable them. > And the way how vector shift instruct works is not clear, especially both vectorization (SLP in jdk) and Vector API share the same instruct's in riscv_v.ad, so also added some comment to clarify it. > > Thanks Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: fix words ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19265/files - new: https://git.openjdk.org/jdk/pull/19265/files/0078c854..809f92e9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19265&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19265&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19265.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19265/head:pull/19265 PR: https://git.openjdk.org/jdk/pull/19265 From thartmann at openjdk.org Thu May 16 14:47:12 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 16 May 2024 14:47:12 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v5] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 12:07:22 GMT, Roland Westrelin wrote: >> Range check `CastII` nodes are 
removed once loop opts are over. The >> test case for this change includes 3 cases where elimination of a >> range check `CastII` causes a crash in compiled code because either a >> out of bounds array load or a division by zero happen. >> >> In `test1`: >> >> - the range checks for the `array[otherArray.length]` loads constant >> fold: `otherArray.length` is a `CastII` of i at the `otherArray` >> allocation. `i` is less than 9. The `CastII` at the allocation >> narrows the type down further to `[0-9]`. >> >> - the `array[otherArray.length]` loads are control dependent on the >> unrelated: >> >> >> if (flag == 0) { >> >> >> test. There's an identical dominating test which replaces that one. As >> a consequence, the `array[otherArray.length]` loads become control >> dependent on the dominating test. >> >> - The `CastII` nodes at the `otherArray` allocations are replaced by a >> dominating range check `CastII` nodes for: >> >> >> newArray[i] = 42; >> >> >> - After loop opts, the range check `CastII` nodes are removed and the >> 2 `array[otherArray.length]` loads common at the first: >> >> >> if (flag == 0) { >> >> >> test before the: >> >> >> float[] otherArray = new float[i]; >> >> >> and >> >> >> newArray[i] = 42; >> >> >> that guarantee `i` is positive. >> >> - `test1` is called with `i = -1`, the array load proceeds with an out >> of bounds index and the crash occurs. >> >> >> `test2` and `test3` are mostly identical except for the check that's >> eliminated (a null divisor check) and the instruction that causes a >> fault (an integer division). >> >> The fix I propose is to not eliminate range check `CastII` nodes after >> loop opts. When range check`CastII` nodes were introduced, performance >> was observed to regress. Removing them after loop opts was found to >> preserve both correctness and performance. Today, the performance >> regression still exists when `CastII` nodes are left in. 
So I propose >> we keep them until the end of optimizations (so the 2 array loads >> above don't lose a dependency and wrongly common) but remove them at >> the end of all optimizations. >> >> In the case of the array loads, they are dependent on a range check >> for another array through a range check `CastII` and we must not lose >> that dependency, otherwise the array loads could float above the range >> check at gcm time. I propose we deal with that problem the way it's >> handled for `CastPP` nodes: add the dependency to the load (or >> division) nodes ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: > > - Node::is_div_or_mod() > - Merge branch 'master' into JDK-8324517 > - test fix > - review > - Merge branch 'master' into JDK-8324517 > - Merge branch 'master' into JDK-8324517 > - review > - Merge branch 'master' into JDK-8324517 > - test and fix Right, I did run testing on an early draft and v01 and only saw the `Error: VM option 'StressIGVN' is diagnostic and must be enabled via -XX:+UnlockDiagnosticVMOptions` issue I reported above. We missed re-running testing of later versions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18377#issuecomment-2115447227 From thartmann at openjdk.org Thu May 16 14:52:01 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 16 May 2024 14:52:01 GMT Subject: RFR: 8331885: C2: meet between unloaded and speculative types is not symmetric In-Reply-To: References: Message-ID: On Wed, 15 May 2024 13:30:46 GMT, Vladimir Ivanov wrote: > `TypeInstPtr::xmeet_unloaded` computes the MEET of two InstPtrs when at least one is unloaded, but doesn't preserve the speculative part if one is present. This causes the corresponding assert to fail. > > The proposed fix unconditionally keeps the speculative part.
> > Testing: hs-tier1 - hs-tier4 Looks good to me too. Please also run `hs-comp-stress` and `hs-precheckin-comp` testing. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19249#pullrequestreview-2061044711 From duke at openjdk.org Thu May 16 15:25:36 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Thu, 16 May 2024 15:25:36 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v21] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: fix typo in two asserts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/156bbfc5..04a7db2a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=19-20 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From varadam at openjdk.org Thu May 16 15:27:03 2024 From: varadam at openjdk.org (Varada M) Date: Thu, 16 May 2024 15:27:03 GMT Subject: RFR: 8331935: Add support for primitive array C1 clone intrinsic in PPC In-Reply-To: References: Message-ID: On Thu, 16 May 2024 14:24:56 GMT, Martin Doerr wrote: > I got test failures on AIX which need investigation: compiler/c2/Test6910605_2.java assert(oopDesc::is_oop(s)) failed: JVM_ArrayCopy: src not an oop Hi @TheRealMDoerr , this test failure was not showing for me. I retested with 'JAVA_OPTIONS=-XX:TieredStopAtLevel=1' and the test is passing ============================== Test summary ============================== TEST TOTAL PASS FAIL ERROR jtreg:test/hotspot/jtreg/compiler/c2/Test6910605_2.java 1 1 0 0 ============================== TEST SUCCESS ------------- PR Comment: https://git.openjdk.org/jdk/pull/19250#issuecomment-2115543767 From mli at openjdk.org Thu May 16 16:09:28 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 16 May 2024 16:09:28 GMT Subject: RFR: 8332394: Add friendly output when @IR rule missing value [v2] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this simple patch? > Currently, when a @IR rule like "applyIfPlatform" or "applyIfCPUFeature" miss a value, it will just throw ArrayIndexOutOfBoundsException, with no other information. 
This is confusing unless you dig into the test frame code. > It's helpful to output more meaningful information. > Thanks Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: add more information in output ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19270/files - new: https://git.openjdk.org/jdk/pull/19270/files/529257fd..7945f8fb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19270&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19270&range=00-01 Stats: 14 lines in 1 file changed: 0 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/19270.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19270/head:pull/19270 PR: https://git.openjdk.org/jdk/pull/19270 From mli at openjdk.org Thu May 16 16:09:28 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 16 May 2024 16:09:28 GMT Subject: RFR: 8332394: Add friendly output when @IR rule missing value [v2] In-Reply-To: References: Message-ID: On Thu, 16 May 2024 14:26:24 GMT, Christian Hagedorn wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> add more information in output > > test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java line 293: > >> 291: String platform = andRules[i].trim(); >> 292: i++; >> 293: TestFormat.check(i < andRules.length, "Missing value for platform " + platform + failAt()); > > I suggest to also add the `ruleType` as in `hasAllRequiredFlags()`, for example. Then it is even more precise. For even more readability you could add some `""`: > Current: > > Missing value for platform xyz in @IR rule 1 at foo() > > vs. > Improved: > > Missing value for platform "xyz" in @IR rule 1 in "applyIfPlatform" at foo() Thanks, it makes sense. 
I also created https://bugs.openjdk.org/browse/JDK-8332402 to track adding tests in `TestBadFormat` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19270#discussion_r1603656946 From mdoerr at openjdk.org Thu May 16 16:37:03 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 16 May 2024 16:37:03 GMT Subject: RFR: 8331935: Add support for primitive array C1 clone intrinsic in PPC In-Reply-To: References: Message-ID: On Wed, 15 May 2024 13:50:27 GMT, Varada M wrote: > https://bugs.openjdk.org/browse/JDK-8302850 port for PPC64 > > JMH Benchmark Results > > > Before : > > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 114.107 ± 1.337 ns/op > ArrayClone.byteArraycopy 10 avgt 15 130.492 ± 0.991 ns/op > ArrayClone.byteArraycopy 100 avgt 15 139.103 ± 1.913 ns/op > ArrayClone.byteArraycopy 1000 avgt 15 321.688 ± 6.033 ns/op > ArrayClone.byteClone 0 avgt 15 227.602 ± 3.393 ns/op > ArrayClone.byteClone 10 avgt 15 237.624 ± 2.996 ns/op > ArrayClone.byteClone 100 avgt 15 239.219 ± 2.835 ns/op > > ArrayClone.byteClone 1000 avgt 15 355.571 ± 2.946 ns/op > ArrayClone.intArraycopy 0 avgt 15 113.275 ± 1.099 ns/op > ArrayClone.intArraycopy 10 avgt 15 129.763 ± 1.458 ns/op > ArrayClone.intArraycopy 100 avgt 15 213.327 ± 2.524 ns/op > ArrayClone.intArraycopy 1000 avgt 15 449.650 ± 7.338 ns/op > ArrayClone.intClone 0 avgt 15 225.682 ± 3.048 ns/op > ArrayClone.intClone 10 avgt 15 234.532 ± 2.817 ns/op > ArrayClone.intClone 100 avgt 15 295.934 ± 4.925 ns/op > ArrayClone.intClone 1000 avgt 15 573.368 ±
5.739 ns/op > Finished running test 'micro:java.lang.ArrayClone' > Test report is stored in build/aix-ppc64-server-release/test-results/micro_java_lang_ArrayClone > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > micro:java.lang.ArrayClone 1 1 0 0 > ============================== > TEST SUCCESS > > Finished building target 'test' in configuration 'aix-ppc64-server-release' > > > > > After: > > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 113.894 ± 0.993 ns/op > ArrayClone.byteArraycopy 10 avgt 15 131.455 ± 0.956 ns/op > ArrayClone.byteArraycopy 100 avgt 15 139.145 ± 3.002 ns/op > ArrayClone.byteArraycopy 1000 avgt 15 315.957 ± 14.591 ns/op > ArrayClone.byteClone 0 avgt 15 43.753 ± 3.669 ns/op > ArrayClone.byteClone 10 avgt 15 52.329 ± 1.041 ns/op > ArrayClone.byteClone 100 avgt 15 127.711 ± 3.938 ns/op > > ArrayClone.byteClone 1000 avgt 15 225.937 ± 1.987 ns/op > ArrayClone.intArraycopy 0 avgt 15 113.788 ± 0.770 ns/op > ArrayClone.intArraycopy 10 avgt 1... I can reproduce it on linux with the fastdebug build. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19250#issuecomment-2115698677 From varadam at openjdk.org Thu May 16 17:55:01 2024 From: varadam at openjdk.org (Varada M) Date: Thu, 16 May 2024 17:55:01 GMT Subject: RFR: 8331935: Add support for primitive array C1 clone intrinsic in PPC In-Reply-To: References: Message-ID: On Thu, 16 May 2024 16:34:41 GMT, Martin Doerr wrote: > I can reproduce it on linux with the fastdebug build. Yes.
The test fails with the fastdebug build:

```
STDOUT:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/home/hotspot/openjdk/jdk-varada/src/hotspot/share/prims/jvm.cpp:301), pid=27263472, tid=4884
# assert(oopDesc::is_oop(s)) failed: JVM_ArrayCopy: src not an oop
#
# JRE version: OpenJDK Runtime Environment (23.0) (fastdebug build 23-internal-adhoc.hotspot.jdk-varada)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 23-internal-adhoc.hotspot.jdk-varada, mixed mode, emulated-client, tiered, compressed oops, compressed class ptrs, g1 gc, aix-ppc64)
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/hotspot/openjdk/jdk-varada/build/aix-ppc64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_c2_Test6910605_2_java/scratch/0/hs_err_pid27263472.log
[0.762s][warning][os] Loading hsdis library failed
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
#
```

------------- PR Comment: https://git.openjdk.org/jdk/pull/19250#issuecomment-2115863492 From duke at openjdk.org Thu May 16 19:03:18 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Thu, 16 May 2024 19:03:18 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v22] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs.
For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: simplify test in new asserts to just assert UseAPX ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/04a7db2a..47885cbe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=20-21 Stats: 46 lines in 1 file changed: 0 ins; 0 del; 46 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From vkempik at openjdk.org Thu May 16 19:53:05 2024 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 16 May 2024 19:53:05 GMT Subject: RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v7] In-Reply-To: References: Message-ID: On Thu, 16 May 2024 13:03:24 GMT, ArsenyBochkarev wrote: >> Hello everyone! Please review this ~non-vectorized~ implementation of `_updateBytesAdler32` intrinsic. Reference implementation for AArch64 can be found [here](https://github.com/openjdk/jdk9/blob/master/hotspot/src/cpu/aarch64/vm/stubGenerator_aarch64.cpp#L3281). >> >> ### Correctness checks >> >> Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok. All tier1 also passed. 
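For readers following the vectorization discussion in this thread, Adler-32 vectorizes because one block of n bytes updates the two running sums in closed form: a' = a + sum(x[i]) and b' = b + n*a + sum((n - i) * x[i]) for 0-based i. Both sums are independent reductions, so a vector unit can compute them with a multiply by a descending weight vector followed by a reduction, and a larger register group (LMUL > 1) simply means a larger n per step. Per the RVV specification, VLMAX = LMUL * VLEN / SEW, so with e8 and LMUL = 4 a hypothetical VLEN = 128 implementation would cover 64 bytes per instruction. The C++ sketch below checks the block form against a byte-at-a-time reference; it is not the RISC-V stub under review, and its `block` parameter is an assumed stand-in for the vector length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr uint32_t kBase = 65521;  // largest prime smaller than 2^16
constexpr size_t kNmax = 5552;     // zlib's bound on bytes between modulo reductions

// Reference byte-at-a-time Adler-32; the modulo is deferred for up to kNmax
// bytes, which is the largest run that cannot overflow 32-bit sums.
uint32_t adler32_bytewise(uint32_t adler, const uint8_t* buf, size_t len) {
  uint32_t a = adler & 0xFFFF;
  uint32_t b = adler >> 16;
  while (len > 0) {
    size_t n = len < kNmax ? len : kNmax;
    len -= n;
    while (n-- > 0) {
      a += *buf++;
      b += a;
    }
    a %= kBase;
    b %= kBase;
  }
  return (b << 16) | a;
}

// Block form (sketch): per block x[0..n-1] (block > 0 assumed),
//   a' = a + sum(x[i])
//   b' = b + n * a + sum((n - i) * x[i])
// The two inner sums are what a vector unit would compute as reductions.
uint32_t adler32_blocked(uint32_t adler, const uint8_t* buf, size_t len, size_t block) {
  uint64_t a = adler & 0xFFFF;
  uint64_t b = adler >> 16;
  while (len > 0) {
    size_t n = len < block ? len : block;
    uint64_t sum = 0, weighted = 0;
    for (size_t i = 0; i < n; i++) {
      sum += buf[i];
      weighted += static_cast<uint64_t>(n - i) * buf[i];
    }
    b = (b + n * a + weighted) % kBase;  // must use the old value of a
    a = (a + sum) % kBase;
    buf += n;
    len -= n;
  }
  return static_cast<uint32_t>((b << 16) | a);
}
```

For example, both functions return 0x11E60398 for the ASCII string "Wikipedia" with the standard initial value 1.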
>> >> ### Performance results on T-Head board >> >> Enabled intrinsic: >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- | >> | Adler32.TestAdler32.testAdler32Update | 64 | thrpt | 25 | 5522.693 | 23.387 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 128 | thrpt | 25 | 3430.761 | 9.210 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 256 | thrpt | 25 | 1962.888 | 5.323 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 512 | thrpt | 25 | 1050.938 | 0.144 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 1024 | thrpt | 25 | 549.227 | 0.375 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 2048 | thrpt | 25 | 280.829 | 0.170 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 5012 | thrpt | 25 | 116.333 | 0.057 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 8192 | thrpt | 25 | 71.392 | 0.060 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 16384 | thrpt | 25 | 35.784 | 0.019 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 32768 | thrpt | 25 | 17.924 | 0.010 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 65536 | thrpt | 25 | 8.940 | 0.003 | ops/ms | >> >> Disabled intrinsic: >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- | >> |Adler32.TestAdler32.testAdler32Update|64|thrpt|25|655.633|5.845|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|128|thrpt|25|587.418|10.062|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|256|thrpt|25|546.675|11.598|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|512|thrpt|25|432.328|11.517|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|1024|thrpt|25|311.771|4.238|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|2048|thrpt|25|202.648|2.486|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|5012|thrpt|... 
> > ArsenyBochkarev has updated the pull request incrementally with eight additional commits since the last revision: > > - Prettify L_nmax loop > - Add comments in functions > - Add explanation comment for L_nmax_loop > - Fix L_nmax_loop for big lengths > - Fix L_by16 loop step > - Prettify intrinsic > - Use LMUL=4 for most of the calculations > - Use LMUL to load multiple data in one step

Let me repost the table in a more readable form:

Benchmark | (count) | Mode | Cnt | Score (intrinsic enabled) | Score (intrinsic disabled) | Units
-- | -- | -- | -- | -- | -- | --
Adler32.TestAdler32.testAdler32Update | 64 | thrpt | 25 | 7244.611 | 1319.132 | ops/ms
Adler32.TestAdler32.testAdler32Update | 128 | thrpt | 25 | 4679.629 | 1240.402 | ops/ms
Adler32.TestAdler32.testAdler32Update | 256 | thrpt | 25 | 2740.242 | 1106.121 | ops/ms
Adler32.TestAdler32.testAdler32Update | 512 | thrpt | 25 | 1509.818 | 905.468 | ops/ms
Adler32.TestAdler32.testAdler32Update | 1024 | thrpt | 25 | 791.004 | 684.968 | ops/ms
Adler32.TestAdler32.testAdler32Update | 2048 | thrpt | 25 | 406.103 | 451.938 | ops/ms
Adler32.TestAdler32.testAdler32Update | 5012 | thrpt | 25 | 167.894 | 228.727 | ops/ms
Adler32.TestAdler32.testAdler32Update | 8192 | thrpt | 25 | 171.731 | 150.421 | ops/ms
Adler32.TestAdler32.testAdler32Update | 16384 | thrpt | 25 | 86.127 | 79.323 | ops/ms
Adler32.TestAdler32.testAdler32Update | 32768 | thrpt | 25 | 48.468 | 40.986 | ops/ms
Adler32.TestAdler32.testAdler32Update | 65536 | thrpt | 25 | 23.818 | 19.969 | ops/ms

------------- PR Comment: https://git.openjdk.org/jdk/pull/18382#issuecomment-2116065601 From sgibbons at openjdk.org Thu May 16 20:57:12 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 16 May 2024 20:57:12 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 17:25:04 GMT, Volodymyr Paprotski wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last
revision: >> >> Rearrange; add lambdas for clarity > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4492: > >> 4490: >> 4491: // Compare char[] or byte[] arrays aligned to 4 bytes or substrings. >> 4492: void C2_MacroAssembler::arrays_equals(bool is_array_equ, Register ary1, > > I liked the old style better, fewer longer lines.. same for rest of the changes in this file. Done. > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4594: > >> 4592: #endif //_LP64 >> 4593: bind(COMPARE_WIDE_VECTORS); >> 4594: vmovdqu(vec1, Address(ary1, limit, > > create a local scale variable instead of ternary operators. Used several times. Done > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4250: > >> 4248: generate_chacha_stubs(); >> 4249: >> 4250: if ((UseAVX == 2) && EnableX86ECoreOpts && VM_Version::supports_avx2()) { > > Just `if (EnableX86ECoreOpts)`? I think all 3 should be specified, even if `EnableX86ECoreOpts` checks. This is for clarity of intent. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 391: > >> 389: } >> 390: >> 391: __ cmpq(needle_len, isU ? 2 : 1); > > Can we remove this comparison? i.e. > - broadcast first and last character unconditionally (same character). Or > - move broadcasts 'down' into individual cases.. > There is already specialized code to handle needle of size 1.. This adds extra pathlength. (Will we actually call this intrinsic for needle_size==1? Assume length>=2?) At this point in the code it is entirely possible for needle size to be == 1, but only in the case where haystack size is > 32 bytes. Moving the broadcasts 'down' into individual cases increases code size by 14 broadcast instructions. Seems like the best option is to just remove the compare and branch, broadcasting the first needle element twice. 
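For readers less familiar with the technique under discussion, here is a minimal scalar Java model of the "compare first and last needle element" filter that the AVX2 stub vectorizes: broadcast `needle[0]` and `needle[k-1]`, compare both against 32 haystack positions at once, AND the two masks, then verify each surviving candidate. It is an illustrative sketch under stated assumptions, not the stub's code: the class and method names are hypothetical, only the Latin-1 (LL) case is modelled, and the usual first/last formulation of the filter is assumed. It also shows why broadcasting the first element twice is harmless for a one-element needle: `first` and `last` are the same byte, so the AND leaves the mask unchanged.

```java
import java.util.Arrays;

// Hypothetical scalar model of the first/last-element filter (LL case only).
// Not HotSpot code; names are illustrative.
final class FirstLastFilterSketch {
    static final int LANES = 32; // bytes covered by one 256-bit AVX2 compare

    static int indexOf(byte[] hay, byte[] needle) {
        int n = hay.length, k = needle.length;
        if (k == 0) return 0;
        if (k > n) return -1;
        byte first = needle[0];
        byte last = needle[k - 1]; // same byte as 'first' when k == 1
        for (int base = 0; base <= n - k; base += LANES) {
            // Only candidate starts with base + i <= n - k can hold the whole needle.
            int lanes = Math.min(LANES, n - k + 1 - base);
            // Emulates vpcmpeqb(first) AND vpcmpeqb(last at offset k - 1), then vpmovmskb.
            int mask = 0;
            for (int i = 0; i < lanes; i++) {
                if (hay[base + i] == first && hay[base + i + k - 1] == last) {
                    mask |= 1 << i;
                }
            }
            // Verify candidates from the lowest set bit upwards (tzcnt, then clear that bit).
            while (mask != 0) {
                int start = base + Integer.numberOfTrailingZeros(mask);
                // For k <= 2 the first/last check already covers every needle byte;
                // otherwise compare the middle bytes needle[1 .. k-2].
                if (k <= 2 || Arrays.equals(hay, start + 1, start + k - 1, needle, 1, k - 1)) {
                    return start;
                }
                mask &= mask - 1;
            }
        }
        return -1;
    }
}
```

For example, searching for `"world"` in `"hello world"` (as Latin-1 bytes) returns 6. Because candidate blocks are scanned in order and bits are consumed lowest first, the first occurrence is always the one reported, matching `String.indexOf` semantics.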
> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1365: > >> 1363: // Compare first byte of needle to haystack >> 1364: vpcmpeq(cmp_0, byte_0, Address(haystack, 0), Assembler::AVX_256bit); >> 1365: if (size != (isU ? 2 : 1)) { > > `if (size != scale)` > > Though in this case, `elem_size` might hold more meaning. Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1372: > >> 1370: >> 1371: if (bytesToCompare > 2) { >> 1372: if (size > (isU ? 4 : 2)) { > > `if (size > 2*scale)`? Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1373: > >> 1371: if (bytesToCompare > 2) { >> 1372: if (size > (isU ? 4 : 2)) { >> 1373: if (doEarlyBailout) { > > Is there a big perf difference when `doEarlyBailout` is enabled? And/or just for this function? > > (i.e. removing `doEarlyBailout` in this function will mean less pathlength. Feels like a few extra vpands should be cheap enough.) I removed the macro DO_EARLY_BAILOUT and assumed it to be true. There's not much difference (if any) in performance, so we maybe ought to consider not bailing out early. I'll consider not bailing out and let you know how performance is impacted. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1469: > >> 1467: >> 1468: if (isU && (size & 1)) { >> 1469: __ emit_int8(0xcc); > > This should also be an `assert()` to catch this at compile-time. Although assert is technically runtime (;-)) I'll change these. They were put in to double-check while debugging. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1633: > >> 1631: if (isU) { >> 1632: if ((size & 1) != 0) { >> 1633: __ emit_int8(0xcc); > > Compile-time assert to ensure this code is never called instead? Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1889: > >> 1887: // r13 = (needle length - 1) >> 1888: // r14 = &needle >> 1889: // r15 = unused > > There is quite a bit of redundancy in register usage. It's not incorrect, but it looks odd.
Not clear whether this duplication can easily be removed (or whether/why it is needed). > > // rbx = &haystack > // rdi = &haystack > // rdx = &needle > // r14 = &needle > // rcx = haystack length > // rsi = haystack length > // r12 = needle length > // r13 = (needle length - 1) > // r10 = hs_len - needle len > // rbp = -1 > > // rax = unused > // r11 = unused > // r8 = unused > // r9 = unused > // r15 = unused > > > (Could this comment be out-of-sync with the code? It looks like only rbx, r14, and temps taken from the unused registers are used a few lines down.) This comment provides the full register state upon entry to each of the cases of the switch. The duplication is an artifact of the decisions made in the setup code (like checking ranges, etc.). Each case can rely on the registers holding the values documented on entry. It can use either rcx or rsi to get the haystack length, for example. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1950: > >> 1948: // r13 = (needle length - 1) >> 1949: // r14 = &needle >> 1950: // r15 = unused > > Same as for the small case. Yes, same as for the small case. > test/micro/org/openjdk/bench/java/lang/StringIndexOfHuge.java line 2: > >> 1: /* >> 2: * Copyright (c) 2014, 2024, Oracle and/or its affiliates. All rights reserved.
> > New file, just 2024 Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603734868 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603735274 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603737342 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603806354 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603953047 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603985462 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603955117 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603956554 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603989550 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1604006660 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1604006994 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1604024770 From sgibbons at openjdk.org Thu May 16 20:57:20 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 16 May 2024 20:57:20 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: <8Y-nIHc8vfB1X_hp3tpqqqgpCzu6dAt6BBIP_zc4Q70=.c9a48c68-8c14-4af9-8357-ab50e62a5fd3@github.com> References: <8Y-nIHc8vfB1X_hp3tpqqqgpCzu6dAt6BBIP_zc4Q70=.c9a48c68-8c14-4af9-8357-ab50e62a5fd3@github.com> Message-ID: On Mon, 6 May 2024 20:56:36 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Rearrange; add lambdas for clarity > > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 1174: > >> 1172: // Alignment specifying the maximum number of allowed bytes to pad. >> 1173: // If padding > max, no padding is inserted. >> 1174: void MacroAssembler::p2align(int modulus, int maxbytes) { > > We could pass offset() as an argument to p2align. 
Basically have three arguments to p2align(modulus, target, maxbytes). Also maybe rename p2align as align then? Removed p2align(). Was never used and was a remnant of prior implementation attempt. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 208: > >> 206: //////////////////////////////////////////////////////////////////////////////////////// >> 207: //////////////////////////////////////////////////////////////////////////////////////// >> 208: if (VM_Version::supports_avx2()) { // AVX2 version > > Instead of the if check here, it would be better to do an assert here: > assert (VM_Version::supports_avx2(), "Needs AVX2 support"); Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 238: > >> 236: const Register needle = rdx; >> 237: const Register needle_len = rcx; >> 238: > > This is the calling convention on Linux. How is windows platform handled? The entry code switches Windows calling convention into Linux calling convention by moving/saving registers, which are properly restored on function exit. This makes register tracking easier. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 260: > >> 258: // const XMMRegister save_rcx = xmm11; >> 259: // const XMMRegister save_r8 = xmm12; >> 260: > > This could be removed? Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 279: > >> 277: fnptrs[isLL ? StrIntrinsicNode::LL >> 278: : isUU ? StrIntrinsicNode::UU >> 279: : StrIntrinsicNode::UL] = __ pc(); > > Could this not be simplified as: > fnptrs[ae] = __ pc(); Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 314: > >> 312: >> 313: // needle_len is in elements, not bytes, for UTF-16 >> 314: __ cmpq(needle_len, isUU ? OPT_NEEDLE_SIZE_MAX / 2 : OPT_NEEDLE_SIZE_MAX); > > OPT_NEEDLE_SIZE_MAX is an odd number (set to 5), should that have been an even number? Removed OPT_NEEDLE_SIZE_MAX and replaced with constant == 6. 
> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 383: > >> 381: { >> 382: Label L_short; >> 383: > > A comment here: > // Broadcast the beginning of needle into a vector register. Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 390: > >> 388: __ vpbroadcastb(byte_0, Address(needle, 0), Assembler::AVX_256bit); >> 389: } >> 390: > > A comment here: > // Broadcast the end of needle into a vector register. This step is not needed for a single-element needle. Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 418: > >> 416: __ cmpq(haystack_len, 0x10); >> 417: __ ja_b(L_moreThan16); >> 418: > > An assert here to check for header size >= 16 would be good. > Also a comment here would be good, something like: > // Copy 16 or 32 bytes prior to haystack end onto stack > // This will possibly include some object header bytes when haystack length is less than 16 or 32 bytes // Set the new haystack address to beginning of copied haystack on stack adjusting for extra bytes copied I don't know how to assert header size >= 16 bytes, so I'll add a comment stating such. If you can tell me how to assert, I'll add that code in place of the comment. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 498: > >> 496: >> 497: // big_case_loop_helper will fall through to this point if one or more potential matches are found >> 498: // The mask will have a bitmask indicating the position of the potential matches within the haystack > > If no potential match, which label does the big_case_loop_helper jump to? Added comment > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 517: > >> 515: __C2 arrays_equals(false, haystackStart, firstNeedleCompare, compLen, retval, rScratch, xmm_tmp3, xmm_tmp4, >> 516: false /* char */, knoreg); >> 517: __ testl(retval, retval); > > Since this is byte compare even for isU, the retval here could be a 64-bit quantity so the testl should be a testq.
`arrays_equals` returns a boolean value of `0` for not found and `1` for found using `movl(result, 0/1)` so testl is appropriate here. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 553: > >> 551: // Haystack always copied to stack, so 32-byte reads OK >> 552: // Haystack length < 32 >> 553: // 10 < needle length < 32 > > The comment below may need update as we come here for needle_len > OPT_NEEDLE_SIZE_MAX which is currently set as 5: > // 10 < needle length < 32 No. The jump is based on NUMBER_OF_CASES which is == 10. See line 147. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 576: > >> 574: broadcast_additional_needles(false, 0 /* unknown */, NUMBER_OF_NEEDLE_BYTES_TO_COMPARE, needle, needleLen, rTmp3, >> 575: isUU, isUL, _masm); >> 576: > > Good to pass output xmm registers to this method. Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 587: > >> 585: // firstNeedleCompare has address of second element of needle >> 586: // compLen has length of comparison to do >> 587: > > This is not clear. firstNeedleCompare gets needle + NUMBER_OF_NEEDLE_BYTES_TO_COMPARE - 1 which is not necessarily the second element of needle. If it helps let us fix the NUMBER_OF_NEEDLE_BYTES_TO_COMPARE to 3 and have comments and code versus that only. Replaced NUMBER_OF_NEEDLE_BYTES_TO_COMPARE with constant `3` > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 611: > >> 609: __C2 arrays_equals(false, rTmp, firstNeedleCompare, compLen, rTmp3, rTmp2, xmm_tmp3, xmm_tmp4, false /* char */, >> 610: knoreg); >> 611: __ testl(rTmp3, rTmp3); > > Since this is byte compare even for isU, the rtmp3 here could be a 64-bit quantity so the testl should be a testq. `arrays_equals` returns boolean via `movl(retval, 0/1)` so testl is appropriate here. 
> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 629: > >> 627: >> 628: __ bind(L_returnError); >> 629: __ movq(rbp, -1); > > This could directly be rax instead of intermediate rbp and then moving from rbp to rax. Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 633: > >> 631: >> 632: __ bind(L_returnZero); >> 633: __ xorl(rbp, rbp); > > This could directly be rax instead of intermediate rbp and then moving from rbp to rax. Removed block - never jumped to. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 639: > >> 637: __ movl(rax, r8); >> 638: __ subq(rcx, rbx); >> 639: __ addq(rcx, rax); > > This could be: > __ subq(rcx, rbx); > __ addq(rcx, r8); Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 647: > >> 645: __ cmpq(r11, r10); >> 646: __ movq(rbp, -1); >> 647: __ cmovq(Assembler::belowEqual, rbp, r11); > > This could be directly computed in rax: > __ movq(rax, -1); > __ cmovq(Assembler::belowEqual, rax, r11); > Also is it possible to not do cmov on some paths? It is an expensive operation. OK > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1010: > >> 1008: static void broadcast_additional_needles(bool sizeKnown, int size, int bytesToCompare, Register needle, >> 1009: Register needleLen, Register rTmp, bool isUU, bool isUL, >> 1010: MacroAssembler *_masm) { > > Good to add output XMM registers to the parameter list. Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1040: > >> 1038: __ vpbroadcastb(byte_1, Address(needle, 1), Assembler::AVX_256bit); >> 1039: } >> 1040: } > > It will be good to have a function which broadcasts a needle element from a given offset into a vector register. > That function could take (needle address, offset, output vector register, temps). > Such a function could then be called twice from here and from the main function for offset 0. No longer relevant - always comparing 3 needle bytes only, so the second broadcast is gone.
> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1054: > >> 1052: } else if (isUL) { >> 1053: __ movzbl(rTmp, Address(needle, 2)); >> 1054: __ movdl(byte_1, rTmp); > > Should be: __ movdl(byte_2, rTmp); Removed byte_2 - always comparing 3 bytes. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1056: > >> 1054: __ movdl(byte_1, rTmp); >> 1055: // 1st byte of needle in words >> 1056: __ vpbroadcastw(byte_1, byte_1, Assembler::AVX_256bit); > > Should be: > __ vpbroadcastw(byte_2, byte_2, Assembler::AVX_256bit); Removed byte_2 - always comparing 3 bytes. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1082: > >> 1080: // noMatch - label bound outside to jump to if there is no match >> 1081: // haystack - the address of the first byte of the haystack >> 1082: // hsLen - the sizeof the haystack > > Good to specify whether the size (size of needle) and hsLen (size of haystack) are in bytes or elements. In bytes. Added. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1083: > >> 1081: // haystack - the address of the first byte of the haystack >> 1082: // hsLen - the sizeof the haystack >> 1083: // isU - true if argument encoding is either UU or UL > > We need to list needleLen here as well? Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1096: > >> 1094: MacroAssembler *_masm) { >> 1095: >> 1096: assert_different_registers(eq_mask, haystack, needleLen, rTmp, hsLen, r10); > > r10 kind of stands out here. You could say nMinusK in this assert. > The assert following this one is checking for nMinusK==r10 so that should suffice. > BTW, didn't see anything in the code below that needs nMinusK to be r10. r10 holds the value `(n - k)` always, which is used to ensure the returned index is not past the end of the haystack. I will annotate this register as global in comments. I also reserve xmm0, xmm1, and xmm12 to hold the broadcasted needle bytes globally. I'll try to make this as obvious as possible.
> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1120: > >> 1118: #define cmp_0 XMM_TMP3 >> 1119: #undef cmp_k >> 1120: #define cmp_k XMM_TMP4 > > XMM_TMP4 is not reused so cmp_k could be declared as const. In general, limiting undef/define pairs to reused registers only would make the review easier. OK. I'll handle this as a last pass over the code for register allocation. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1125: > >> 1123: #undef lastMask >> 1124: >> 1125: int sizeIncr = isU ? 2 : 1; > > sizeIncr and scale seem to be the same; we could just use one of them in this function. Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1149: > >> 1147: >> 1148: if (size == (isU ? 2 : 1)) { >> 1149: __ vpmovmskb(eq_mask, cmp_0, Assembler::AVX_256bit); > > vpmovmskb is being done twice if doEarlyBailout is set to 1 (the setting we have currently). > If it helps to simplify, we could assume that doEarlyBailout is always set to 1 and remove this configurability. Fixed with removal of DO_EARLY_BAILOUT > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1174: > >> 1172: #define lastMask rTmp >> 1173: __ vpmovmskb(lastMask, cmp_k, Assembler::AVX_256bit); >> 1174: __ shrq(lastMask); > > Did you mean to shift the lastMask by shiftVal here? The whole machination around saving/restoring rcx here was to shift by cl. The code emitted by this instruction is: `0x00007fffe463d048: 48 d3 ea shr rdx,cl` which is what is desired. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1183: > >> 1181: >> 1182: if (bytesToCompare > 2) { >> 1183: if (size > (isU ? 4 : 2)) { > > This and other usages could be simplified to: size > 2 * scale Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1185: > >> 1183: if (size > (isU ? 4 : 2)) { >> 1184: if (doEarlyBailout) { >> 1185: __ testl(eq_mask, eq_mask); > > The masks are 32 bits as we are comparing at most 32 bytes (256 bits) at a time.
So we could consistently do either andl, testl, shrl or andq, testq, shrq. Changed to `l` variant > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1476: > >> 1474: _masm); >> 1475: >> 1476: __ movq(r11, -1); > > There doesn't seem to be a use of r11 below in this function. r11 is used in exit code as the pointer to the haystack byte that matches. Setting to `-1` will always be past the end of any haystack and return an error. The helper after this call makes that assumption. This is another of the "pseudo-global" registers. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1493: > >> 1491: // Assume r10 is n - k >> 1492: __ leaq(last, Address(haystack, r10, Address::times_1, isU ? -30 : -31)); >> 1493: __ jmpb(temp); > > Need to pass r10 as parameter. Also temp label could be given a better name. Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1502: > >> 1500: >> 1501: __ cmpq(hsPtrRet, last); >> 1502: __ cmovq(Assembler::aboveEqual, hsPtrRet, last); > > cmovq is expensive, better sequence would be: > > __ cmpq(hsPtrRet, last); > __ jb_b(temp); > __ movq(hsPtrRet, last); Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1510: > >> 1508: compare_big_haystack_to_needle(sizeKnown, size, NUMBER_OF_NEEDLE_BYTES_TO_COMPARE, loop_top, hsPtrRet, hsLength, >> 1509: needleLen, isU, DO_EARLY_BAILOUT, eq_mask, temp2, r10, _masm); >> 1510: > > At this point hsLength is not the remaining length from hsPtrRet, would that cause a problem? If not, all the special paths in compare_big_haystack_to_needle need not be generated on this call. Not sure what you mean here. I *think* you mean that hsLength is not the length of the remaining bytes in the haystack, but the actual length. There may be an issue if that is correct, right? I'll investigate. 
> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1589: > >> 1587: case 3: >> 1588: case 4: >> 1589: __ movl(needleVal, Address(needle, offsetOfFirstByteToCompare)); > > If the size of the needle is 7 and it is an LL case with NUMBER_OF_NEEDLE_BYTES_TO_COMPARE set as 3: > bytesLeftToCompare = 4 (i.e. 7-3); > offsetOfFirstByteToCompare = 2 (i.e. 3-1); > the movl will be loading bytes 2,3,4,5 > So we seem to be missing loading the last byte of the needle. Is that correct? Bytes 0, 1, and 6 have already compared equal before getting to this code, so it is correct functionally. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1735: > >> 1733: // generated with 32 - (n - k + 1) bits set that ensures matches past the end of the original >> 1734: // haystack do not get considered during compares. >> 1735: // > > Mask is generated below with (n-k+1) bits set and not 32- (n-k+1) bits set. Also it will be helpful if we specify what is n and k. Thanks. Fixed. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1838: > >> 1836: __ shrq(rax, 1); >> 1837: } >> 1838: > > We need to be consistent either use tzcntl, shrl, testl or tzcntq, shrq, testq. I'll search through the code making them all consistent. > src/hotspot/share/opto/library_call.cpp line 1263: > >> 1261: if (result != nullptr) { >> 1262: // The result is index relative to from_index if substring was found, -1 otherwise. >> 1263: // Generate code which will fold into cmove. > > Any reason to remove this comment? No reason - cut/paste error. Thanks. 
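The mask discussed above keeps only candidate start positions at which the whole needle still fits inside the original haystack. The following is a small Java sketch of that arithmetic, together with the tzcnt/shr step that turns a byte offset into a char index for UTF-16. It assumes, as the reviewer asked to make explicit, that n (haystack length) and k (needle length) are both measured in bytes, and that the equality mask has one bit per byte lane of a 32-byte vector; the class and method names are hypothetical, not taken from the stub.

```java
// Hypothetical helpers modelling two small-haystack details from the review:
// the valid-candidate mask and the byte-offset to char-index conversion.
// n = haystack length and k = needle length, both in BYTES (assumption).
final class CandidateMaskSketch {
    // Low (n - k + 1) bits set: lane i is valid only if a needle starting at
    // byte i still ends inside the haystack (i <= n - k).
    static int validCandidateMask(int n, int k) {
        int candidates = n - k + 1;
        if (candidates <= 0) return 0;   // needle longer than haystack
        if (candidates >= 32) return -1; // all 32 lanes valid; Java takes shift counts mod 32, so 1 << 32 == 1
        return (1 << candidates) - 1;
    }

    // eqMask: one bit per byte lane, as produced by vpmovmskb.
    static int firstMatchIndex(int eqMask, int n, int k, boolean isU) {
        int m = eqMask & validCandidateMask(n, k);
        if (m == 0) return -1;
        int byteIdx = Integer.numberOfTrailingZeros(m); // models tzcnt
        return isU ? byteIdx >>> 1 : byteIdx;            // models shr by 1: bytes -> UTF-16 chars
    }
}
```

For a 10-byte Latin-1 haystack and a 3-byte needle, only bits 0 through 7 survive, so an equality bit at lane 9 (a match that would run past the end) is discarded. Doing the shift in a single width (all 32-bit or all 64-bit operations) is what the reviewer's consistency comment asks for; the Java model uses 32-bit `int` throughout.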
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603736399 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603740677 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603743601 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603752052 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603752276 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603752936 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603780784 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603780997 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603816022 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603833467 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603846748 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603855986 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603864665 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603865621 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603866807 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603868917 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603869305 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603884368 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603889410 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603895505 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603896809 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603897475 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603897759 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603903738 PR Review Comment: 
https://git.openjdk.org/jdk/pull/16753#discussion_r1603906289 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603914822 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603917518 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603922652 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603924998 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603939571 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603949004 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603951974 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603966864 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603974757 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603969211 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603985006 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603989357 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603990826 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603999938 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1604012121 From sgibbons at openjdk.org Thu May 16 20:57:25 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 16 May 2024 20:57:25 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v7] In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 12:09:11 GMT, Jatin Bhateja wrote: >> Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: >> >> - Merge branch 'openjdk:master' into indexof >> - Merge branch 'openjdk:master' into indexof >> - Addressing review comments. 
>> - Fix for JDK-8321599 >> - Support UU IndexOf >> - Only use optimization when EnableX86ECoreOpts is true >> - Fix whitespace >> - Merge branch 'openjdk:master' into indexof >> - Comments; added exhaustive-ish test >> - Subtracting 0x10 twice. >> - ... and 12 more: https://git.openjdk.org/jdk/compare/8e12053e...3e58d0c2 > > src/hotspot/share/opto/library_call.cpp line 1229: > >> 1227: } else { >> 1228: result = make_indexOf_node(src_start, src_count, tgt_start, tgt_count, >> 1229: result_rgn, result_phi, ae); > > The existing routines emit IR to handle the following special cases: > > tgt_cnt > src_cnt return -1 > tgt_cnt == 0 return 0. > > Should we not preserve those checks before calling the stub? > > As of now these checks are part of the stub, and doing them in JIT code will save call overhead. Working on this. Trying to develop my IR chops. However, this optimizes a very small percentage of calls, so the effect on overall performance will be unnoticeable. There will only be savings for calls that have needle length == 0 (probably zero calls do this) or haystack length < needle length (maybe, but highly unlikely). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1604010493 From pminborg at openjdk.org Fri May 17 06:27:11 2024 From: pminborg at openjdk.org (Per Minborg) Date: Fri, 17 May 2024 06:27:11 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v16] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: <79iymLHD3Dl6-yQhDTt9LTHMz0AwZWPy43HZLGk2KA0=.34a3901e-0db4-4421-9c8b-090dafa18abd@github.com> On Thu, 16 May 2024 12:48:24 GMT, Per Minborg wrote: >> # Stable Values & Collections (Internal) >> >> ## Summary >> This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_.
Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. >> >> ## Goals >> * Provide an easy and intuitive API to describe value holders that can change at most once. >> * Decouple declaration from initialization without significant footprint or performance penalties. >> * Reduce the amount of static initializer and/or field initialization code. >> * Uphold integrity and consistency, even in a multi-threaded environment. >> >> For more details, see the draft JEP: https://openjdk.org/jeps/8312611 >> >> ## Performance >> Performance compared to instance variables using two `AtomicReference` and two protected by double-checked locking under concurrent access by all threads: >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.atomic thrpt 10 259.478 ± 36.809 ops/us >> StableBenchmark.dcl thrpt 10 225.710 ± 26.638 ops/us >> StableBenchmark.stable thrpt 10 4382.478 ± 1151.472 ops/us <- StableValue significantly faster >> >> >> Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (all threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableStaticBenchmark.atomic thrpt 10 6487.835 ± 385.639 ops/us >> StableStaticBenchmark.dcl thrpt 10 6605.239 ± 210.610 ops/us >> StableStaticBenchmark.stable thrpt 10 14338.239 ± 1426.874 ops/us >> StableStaticBenchmark.staticCHI thrpt 10 13780.341 ± 1839.651 ops/us >> >> >> Performance for stable lists (thread safe) in both instance and static contexts whereby we access a single value compared to `ArrayList` instances (which are not thread-safe) (all threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableListElementBenchmark.instanceArrayList thrpt 10 5812.992 ± 1169.730 ops/us >> StableListElementBenchmark.instanceList thrpt 10 4818.643 ± 704.893 ops/us >> StableListElementBenchmark...
> > Per Minborg has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyringht issues > _Mailing list message from [Olexandr Rotan](mailto:rotanolexandr842 at gmail.com) on [compiler-dev](mailto:compiler-dev at mail.openjdk.org):_ > > Is it possible to make stable values and collections Serializable? I see various applications for this feature in entity classes as a way to preserve immutability of entity fields and at the same time not break current JPA specifications. It is a *very* common task in commercial development. Current workarounds lack either thread safety or performance, and this looks like a really good solution for both of those problems. However, unless StableValue is serializable, it is really unlikely that JPA will adopt them (we have precedent with Optional). > > On Thu, May 16, 2024 at 5:07 PM Per Minborg wrote: > > `Serializable` is on the list to explore. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18794#issuecomment-2116840097 From mli at openjdk.org Fri May 17 06:37:06 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 17 May 2024 06:37:06 GMT Subject: RFR: 8332394: Add friendly output when @IR rule missing value [v2] In-Reply-To: References: Message-ID: On Thu, 16 May 2024 14:36:49 GMT, Christian Hagedorn wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> add more information in output > > Good catch! Only a small improvement suggestion, otherwise, looks good. > > Just noticed that we are actually missing tests that trigger a format violation in `TestBadFormat` for `applyIfCPUFeature*` and `applyIfPlatform*`. We should probably add some at some point, analogously to the ones already there for `applyIf*` for flags. But that could be done separately. Thanks @chhagedorn for your reviewing!
------------- PR Comment: https://git.openjdk.org/jdk/pull/19270#issuecomment-2116849535 From mli at openjdk.org Fri May 17 06:37:06 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 17 May 2024 06:37:06 GMT Subject: Integrated: 8332394: Add friendly output when @IR rule missing value In-Reply-To: References: Message-ID: <_zvy2Yk8yiNRkka4PfJP_ZRV-n7KMdH4rdrDN4eQL88=.95b47cb8-e0d6-4550-b8bb-3713a1e2177d@github.com> On Thu, 16 May 2024 13:59:16 GMT, Hamlin Li wrote: > Hi, > Could you help review this simple patch? > Currently, when an @IR rule attribute like "applyIfPlatform" or "applyIfCPUFeature" is missing a value, the test framework just throws an ArrayIndexOutOfBoundsException with no other information. This is confusing unless you dig into the test framework code. > It would be more helpful to output a meaningful message. > Thanks This pull request has now been integrated. Changeset: 6422efa3 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/6422efa3c7917525a879e80657ca4dcfb6d67514 Stats: 14 lines in 1 file changed: 4 ins; 0 del; 10 mod 8332394: Add friendly output when @IR rule missing value Reviewed-by: chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/19270 From pminborg at openjdk.org Fri May 17 06:41:37 2024 From: pminborg at openjdk.org (Per Minborg) Date: Fri, 17 May 2024 06:41:37 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v17] In-Reply-To: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: > # Stable Values & Collections (Internal) > > ## Summary > This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization.
> > ## Goals > * Provide an easy and intuitive API to describe value holders that can change at most once. > * Decouple declaration from initialization without significant footprint or performance penalties. > * Reduce the amount of static initializer and/or field initialization code. > * Uphold integrity and consistency, even in a multi-threaded environment. > > For more details, see the draft JEP: https://openjdk.org/jeps/8312611 > > ## Performance > Performance compared to instance variables using two `AtomicReference` and two protected by double-checked locking under concurrent access by all threads: > > > Benchmark Mode Cnt Score Error Units > StableBenchmark.atomic thrpt 10 259.478 ± 36.809 ops/us > StableBenchmark.dcl thrpt 10 225.710 ± 26.638 ops/us > StableBenchmark.stable thrpt 10 4382.478 ± 1151.472 ops/us <- StableValue significantly faster > > > Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (all threads): > > > Benchmark Mode Cnt Score Error Units > StableStaticBenchmark.atomic thrpt 10 6487.835 ± 385.639 ops/us > StableStaticBenchmark.dcl thrpt 10 6605.239 ± 210.610 ops/us > StableStaticBenchmark.stable thrpt 10 14338.239 ± 1426.874 ops/us > StableStaticBenchmark.staticCHI thrpt 10 13780.341 ± 1839.651 ops/us > > > Performance for stable lists (thread safe) in both instance and static contexts whereby we access a single value compared to `ArrayList` instances (which are not thread-safe) (all threads): > > > Benchmark Mode Cnt Score Error Units > StableListElementBenchmark.instanceArrayList thrpt 10 5812.992 ± 1169.730 ops/us > StableListElementBenchmark.instanceList thrpt 10 4818.643 ± 704.893 ops/us > StableListElementBenchmark.staticArrayList thrpt 10 7614.741 ± 564.777 ops/us > StableListElementBe...
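The "dcl" rows in the benchmarks quoted above refer to double-checked locking, one of the lazy-initialization idioms the proposed API is measured against. Below is a minimal sketch of that baseline, assuming the value is never null and the supplier should run at most once. `DclHolder` is a hypothetical name; this is not the `StableValue` API from the PR, whose signatures are not shown here.

```java
import java.util.function.Supplier;

// Hypothetical at-most-once holder using double-checked locking (the "dcl" baseline).
// Not the StableValue API proposed in the PR.
final class DclHolder<T> {
    private final Supplier<? extends T> supplier;
    // volatile makes the unsynchronized fast path safe under the Java memory model
    private volatile T value;

    DclHolder(Supplier<? extends T> supplier) {
        this.supplier = supplier;
    }

    T get() {
        T v = value;                  // first check, no lock
        if (v == null) {
            synchronized (this) {
                v = value;            // second check, under the lock
                if (v == null) {
                    v = supplier.get();
                    if (v == null) {
                        throw new NullPointerException("supplier returned null");
                    }
                    value = v;        // published exactly once
                }
            }
        }
        return v;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        DclHolder<String> h = new DclHolder<>(() -> { calls[0]++; return "init"; });
        System.out.println(h.get() + " " + h.get() + " calls=" + calls[0]); // init init calls=1
    }
}
```

Every read pays for a volatile load on the fast path, and the holder itself is an ordinary object whose field the JIT cannot treat as a constant. That is plausibly part of why the quoted numbers show the stable variant well ahead of "dcl", though the benchmarks themselves are the authority there.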
Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Declare field final ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18794/files - new: https://git.openjdk.org/jdk/pull/18794/files/ec7c92cd..35c9252d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=15-16 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18794.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18794/head:pull/18794 PR: https://git.openjdk.org/jdk/pull/18794 From pminborg at openjdk.org Fri May 17 07:31:39 2024 From: pminborg at openjdk.org (Per Minborg) Date: Fri, 17 May 2024 07:31:39 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v18] In-Reply-To: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: > # Stable Values & Collections (Internal) > > ## Summary > This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. > > ## Goals > * Provide an easy and intuitive API to describe value holders that can change at most once. > * Decouple declaration from initialization without significant footprint or performance penalties. > * Reduce the amount of static initializer and/or field initialization code. > * Uphold integrity and consistency, even in a multi-threaded environment. 
> > For more details, see the draft JEP: https://openjdk.org/jeps/8312611 > > ## Performance > Performance compared to instance variables using two `AtomicReference` and two protected by double-checked locking under concurrent access by all threads: > > > Benchmark Mode Cnt Score Error Units > StableBenchmark.atomic thrpt 10 259.478 ± 36.809 ops/us > StableBenchmark.dcl thrpt 10 225.710 ± 26.638 ops/us > StableBenchmark.stable thrpt 10 4382.478 ± 1151.472 ops/us <- StableValue significantly faster > > > Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (all threads): > > > Benchmark Mode Cnt Score Error Units > StableStaticBenchmark.atomic thrpt 10 6487.835 ± 385.639 ops/us > StableStaticBenchmark.dcl thrpt 10 6605.239 ± 210.610 ops/us > StableStaticBenchmark.stable thrpt 10 14338.239 ± 1426.874 ops/us > StableStaticBenchmark.staticCHI thrpt 10 13780.341 ± 1839.651 ops/us > > > Performance for stable lists (thread safe) in both instance and static contexts whereby we access a single value compared to `ArrayList` instances (which are not thread-safe) (all threads): > > > Benchmark Mode Cnt Score Error Units > StableListElementBenchmark.instanceArrayList thrpt 10 5812.992 ± 1169.730 ops/us > StableListElementBenchmark.instanceList thrpt 10 4818.643 ± 704.893 ops/us > StableListElementBenchmark.staticArrayList thrpt 10 7614.741 ± 564.777 ops/us > StableListElementBe...
Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Simplify StableList ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18794/files - new: https://git.openjdk.org/jdk/pull/18794/files/35c9252d..0f798a70 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=16-17 Stats: 8 lines in 1 file changed: 0 ins; 4 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18794.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18794/head:pull/18794 PR: https://git.openjdk.org/jdk/pull/18794 From chagedorn at openjdk.org Fri May 17 07:36:06 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 17 May 2024 07:36:06 GMT Subject: RFR: 8332394: Add friendly output when @IR rule missing value [v2] In-Reply-To: References: Message-ID: On Thu, 16 May 2024 16:04:59 GMT, Hamlin Li wrote: >> test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java line 293: >> >>> 291: String platform = andRules[i].trim(); >>> 292: i++; >>> 293: TestFormat.check(i < andRules.length, "Missing value for platform " + platform + failAt()); >> >> I suggest to also add the `ruleType` as in `hasAllRequiredFlags()`, for example. Then it is even more precise. For even more readability you could add some `""`: >> Current: >> >> Missing value for platform xyz in @IR rule 1 at foo() >> >> vs. >> Improved: >> >> Missing value for platform "xyz" in @IR rule 1 in "applyIfPlatform" at foo() > > Thanks, it makes sense. > > I also created https://bugs.openjdk.org/browse/JDK-8332402 to track adding tests in `TestBadFormat` Thanks! 
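Christian's suggested message format can be illustrated with a small stand-alone parser for the flat key/value arrays that `applyIfPlatform` and `applyIfCPUFeature` take. This is a hedged sketch: `IrRulePairs`, `parsePairs`, and its parameters are hypothetical names. The real framework reports the error through `TestFormat.check` in `IREncodingPrinter`, not by throwing `IllegalArgumentException`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not the IR framework's IREncodingPrinter code.
final class IrRulePairs {
    /**
     * Parses a flat {key1, value1, key2, value2, ...} array. If the last key
     * has no value, it fails with a descriptive message instead of letting an
     * ArrayIndexOutOfBoundsException escape.
     */
    static Map<String, String> parsePairs(String[] rules, String what, String ruleType,
                                          int ruleIndex, String method) {
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < rules.length; i += 2) {
            String key = rules[i].trim();
            if (i + 1 >= rules.length) {
                // Same shape as the "Improved" message proposed in the review.
                throw new IllegalArgumentException(String.format(
                        "Missing value for %s \"%s\" in @IR rule %d in \"%s\" at %s()",
                        what, key, ruleIndex, ruleType, method));
            }
            result.put(key, rules[i + 1].trim());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parsePairs(new String[] {"linux", "true"},
                "platform", "applyIfPlatform", 1, "foo"));
        try {
            parsePairs(new String[] {"xyz"}, "platform", "applyIfPlatform", 1, "foo");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The second call prints `Missing value for platform "xyz" in @IR rule 1 in "applyIfPlatform" at foo()`. Passing the rule type in as a parameter lets one check serve every `applyIf*` attribute.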
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19270#discussion_r1604478783 From bkilambi at openjdk.org Fri May 17 07:45:10 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 17 May 2024 07:45:10 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v9] In-Reply-To: References: Message-ID: On Thu, 16 May 2024 10:08:55 GMT, Bhavana Kilambi wrote: >> Floating-point addition is non-associative, that is adding floating-point elements in arbitrary order may get different value. Specially, Vector API does not define the order of reduction intentionally, which allows platforms to generate more efficient codes [1]. So that needs a node to represent non strictly-ordered add-reduction for floating-point type in C2. >> >> To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` have a fixed value. >> >> With this patch, Vector API would always generate non strictly-ordered `AddReductionVF/D' on SVE machines with vector length <= 16B as it is more beneficial to generate non-strictly ordered instructions on such machines compared to strictly ordered ones. >> >> [AArch64] >> On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2]. >> >> This patch adds matching rules for non strictly-ordered `AddReductionVF/D`. >> >> No effects on other platforms. >> >> [Performance] >> FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point type. With this patch, it improves ~3x on my SVE machine (128-bit). 
>> >> ADDLanes >> >> Benchmark Before After Unit >> FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms >> >> >> Final code is as below: >> >> Before: >> ` fadda z17.s, p7/m, z17.s, z16.s >> ` >> After: >> >> faddp v17.4s, v21.4s, v21.4s >> faddp s18, v17.2s >> fadd s18, s18, s19 >> >> >> >> >> [Test] >> Full jtreg passed on AArch64 and x86. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529 >> [2] https://bugs.openjdk.org/browse/JDK-8275275 >> [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316 > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Add dump_spec and JTREG IR tests for Add/Mul Reduction Nodes test/hotspot/jtreg/compiler/c2/irTests/TestVectorFPReduction.java line 54: > 52: > 53: @Test > 54: @IR(applyIf = {"UseSVE", "0"}, failOn = {IRNode.ADD_REDUCTION_VF}) I realize that this might fail when extended on other architectures. I will modify the rules and add `applyIfCPUFeature` instead of testing the VM flags directly (which can be modified and result in incorrect results). Will update a PS soon. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1604488606 From roland at openjdk.org Fri May 17 07:54:09 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 17 May 2024 07:54:09 GMT Subject: RFR: 8332369: C2: assert(false) failed: graph should be schedulable after JDK-8324517 Message-ID: The issue occurs when a `Mod` node is processed during final_graph_reshaping: if a `Div` node is found with the same inputs, the `Mod` is replaced either by a `DivMod` node or by a subgraph that has the `Div` node as input. Finding the `Div` node is done by `find_similar()`, which ignores precedence edges.
What happens is that the `Div` node returned by `find_similar()` could have a precedence edge that pins it at a control that doesn't dominate the control of some of the uses of the `Mod` node. The fix I propose is to simply not perform the transformation if one of the nodes has precedence edges (which should be a rare corner case). ------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/19277/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19277&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332369 Stats: 29 lines in 4 files changed: 1 ins; 20 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/19277.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19277/head:pull/19277 PR: https://git.openjdk.org/jdk/pull/19277 From pminborg at openjdk.org Fri May 17 08:13:34 2024 From: pminborg at openjdk.org (Per Minborg) Date: Fri, 17 May 2024 08:13:34 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v19] In-Reply-To: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: > # Stable Values & Collections (Internal) > > ## Summary > This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. > > ## Goals > * Provide an easy and intuitive API to describe value holders that can change at most once. > * Decouple declaration from initialization without significant footprint or performance penalties. > * Reduce the amount of static initializer and/or field initialization code. > * Uphold integrity and consistency, even in a multi-threaded environment.
> > For more details, see the draft JEP: https://openjdk.org/jeps/8312611 > > ## Performance > Performance compared to instance variables using two `AtomicReference` and two protected by double-checked locking under concurrent access by all threads: > > > Benchmark Mode Cnt Score Error Units > StableBenchmark.atomic thrpt 10 259.478 ± 36.809 ops/us > StableBenchmark.dcl thrpt 10 225.710 ± 26.638 ops/us > StableBenchmark.stable thrpt 10 4382.478 ± 1151.472 ops/us <- StableValue significantly faster > > > Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (all threads): > > > Benchmark Mode Cnt Score Error Units > StableStaticBenchmark.atomic thrpt 10 6487.835 ± 385.639 ops/us > StableStaticBenchmark.dcl thrpt 10 6605.239 ± 210.610 ops/us > StableStaticBenchmark.stable thrpt 10 14338.239 ± 1426.874 ops/us > StableStaticBenchmark.staticCHI thrpt 10 13780.341 ± 1839.651 ops/us > > > Performance for stable lists (thread safe) in both instance and static contexts whereby we access a single value compared to `ArrayList` instances (which are not thread-safe) (all threads): > > > Benchmark Mode Cnt Score Error Units > StableListElementBenchmark.instanceArrayList thrpt 10 5812.992 ± 1169.730 ops/us > StableListElementBenchmark.instanceList thrpt 10 4818.643 ± 704.893 ops/us > StableListElementBenchmark.staticArrayList thrpt 10 7614.741 ± 564.777 ops/us > StableListElementBe...
Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Update ofList and ofMap docs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18794/files - new: https://git.openjdk.org/jdk/pull/18794/files/0f798a70..dd0ceaf0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=17-18 Stats: 12 lines in 1 file changed: 0 ins; 2 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/18794.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18794/head:pull/18794 PR: https://git.openjdk.org/jdk/pull/18794 From gcao at openjdk.org Fri May 17 08:56:09 2024 From: gcao at openjdk.org (Gui Cao) Date: Fri, 17 May 2024 08:56:09 GMT Subject: RFR: 8331281: RISC-V: C2: Support vector-scalar and vector-immediate bitwise logic instructions [v3] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 10:18:43 GMT, Feilong Jiang wrote: >> Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - Merge remote-tracking branch 'upstream/master' into JDK-8331281 >> - Use iRegIorL2I to replace iRegI in AndV/OrVXorV instruct >> - Polishing Code comment >> - Add vand/vor/vxor predicated Node >> - Polishing Code Comment >> - 8331281: RISC-V: C2: Support vector-scalar and vector-immediate bitwise logic instructions > > Looks good, thanks! @feilongjiang @RealFYang : Thanks for the review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18999#issuecomment-2117070219 From pminborg at openjdk.org Fri May 17 09:31:33 2024 From: pminborg at openjdk.org (Per Minborg) Date: Fri, 17 May 2024 09:31:33 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v20] In-Reply-To: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: > # Stable Values & Collections (Internal) > > ## Summary > This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. > > ## Goals > * Provide an easy and intuitive API to describe value holders that can change at most once. > * Decouple declaration from initialization without significant footprint or performance penalties. > * Reduce the amount of static initializer and/or field initialization code. > * Uphold integrity and consistency, even in a multi-threaded environment. > > For more details, see the draft JEP: https://openjdk.org/jeps/8312611 > > ## Performance > Performance compared to instance variables using two `AtomicReference` and two protected by double-checked locking under concurrent access by all threads: > > > Benchmark Mode Cnt Score Error Units > StableBenchmark.atomic thrpt 10 259.478 ± 36.809 ops/us > StableBenchmark.dcl thrpt 10 225.710 ± 26.638 ops/us > StableBenchmark.stable thrpt 10 4382.478 ± 1151.472 ops/us <- StableValue significantly faster > > > Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (all threads): > > > Benchmark Mode Cnt Score Error Units > StableStaticBenchmark.atomic thrpt 10 6487.835 ± 385.639 ops/us > StableStaticBenchmark.dcl thrpt 10 6605.239 ± 210.610 ops/us > StableStaticBenchmark.stable thrpt 10 14338.239 ± 1426.874 ops/us > StableStaticBenchmark.staticCHI thrpt 10 13780.341 ± 1839.651 ops/us > > > Performance for stable lists (thread safe) in both instance and static contexts whereby we access a single value compared to `ArrayList` instances (which are not thread-safe) (all threads): > > > Benchmark Mode Cnt Score Error Units > StableListElementBenchmark.instanceArrayList thrpt 10 5812.992 ± 1169.730 ops/us > StableListElementBenchmark.instanceList thrpt 10 4818.643 ± 704.893 ops/us > StableListElementBenchmark.staticArrayList thrpt 10 7614.741 ± 564.777 ops/us > StableListElementBe...
Per Minborg has updated the pull request incrementally with two additional commits since the last revision: - Add benchmarks for memoized IntFunction and Function - Add benchmark for memoized supplier ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18794/files - new: https://git.openjdk.org/jdk/pull/18794/files/dd0ceaf0..7beab36a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=18-19 Stats: 356 lines in 3 files changed: 356 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18794.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18794/head:pull/18794 PR: https://git.openjdk.org/jdk/pull/18794 From pminborg at openjdk.org Fri May 17 09:37:09 2024 From: pminborg at openjdk.org (Per Minborg) Date: Fri, 17 May 2024 09:37:09 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v20] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Fri, 17 May 2024 09:31:33 GMT, Per Minborg wrote: >> # Stable Values & Collections (Internal) >> >> ## Summary >> This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. >> >> ## Goals >> * Provide an easy and intuitive API to describe value holders that can change at most once. >> * Decouple declaration from initialization without significant footprint or performance penalties. >> * Reduce the amount of static initializer and/or field initialization code. >> * Uphold integrity and consistency, even in a multi-threaded environment. 
>> >> For more details, see the draft JEP: https://openjdk.org/jeps/8312611 >> >> ## Performance >> Performance compared to instance variables using two `AtomicReference` and two protected by double-checked locking under concurrent access by all threads: >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.atomic thrpt 10 259.478 ± 36.809 ops/us >> StableBenchmark.dcl thrpt 10 225.710 ± 26.638 ops/us >> StableBenchmark.stable thrpt 10 4382.478 ± 1151.472 ops/us <- StableValue significantly faster >> >> >> Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (all threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableStaticBenchmark.atomic thrpt 10 6487.835 ± 385.639 ops/us >> StableStaticBenchmark.dcl thrpt 10 6605.239 ± 210.610 ops/us >> StableStaticBenchmark.stable thrpt 10 14338.239 ± 1426.874 ops/us >> StableStaticBenchmark.staticCHI thrpt 10 13780.341 ± 1839.651 ops/us >> >> >> Performance for stable lists (thread safe) in both instance and static contexts whereby we access a single value compared to `ArrayList` instances (which are not thread-safe) (all threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableListElementBenchmark.instanceArrayList thrpt 10 5812.992 ± 1169.730 ops/us >> StableListElementBenchmark.instanceList thrpt 10 4818.643 ± 704.893 ops/us >> StableListElementBenchmark...
> > Per Minborg has updated the pull request incrementally with two additional commits since the last revision: > > - Add benchmarks for memoized IntFunction and Function > - Add benchmark for memoized supplier Here are some results of a recently added benchmark that uses a memoized function (with 0 and 1 as input values): ![image](https://github.com/openjdk/jdk/assets/7457876/f2fd5b5a-ac89-483b-acb5-bc5de215417a) See [test/micro/org/openjdk/bench/java/lang/stable/MemoizedFunctionBenchmark.java](https://github.com/minborg/jdk/blob/stable-value/test/micro/org/openjdk/bench/java/lang/stable/MemoizedFunctionBenchmark.java) for details. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18794#issuecomment-2117143526 From vlivanov at openjdk.org Fri May 17 12:30:27 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 17 May 2024 12:30:27 GMT Subject: RFR: 8331885: C2: meet between unloaded and speculative types is not symmetric [v2] In-Reply-To: References: Message-ID: > `TypeInstPtr::xmeet_unloaded` computes the MEET of two InstPtrs when at least one is unloaded, but doesn't preserve speculative part if one is present. It causes the corresponding assert to fail. > > Proposed fix unconditionally keeps speculative part.
> > Testing: hs-tier1 - hs-tier4 Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: AlwaysIncrementalInline is a debugflag ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19249/files - new: https://git.openjdk.org/jdk/pull/19249/files/fa5479f2..0b873474 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19249&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19249&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19249.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19249/head:pull/19249 PR: https://git.openjdk.org/jdk/pull/19249 From gcao at openjdk.org Fri May 17 13:37:06 2024 From: gcao at openjdk.org (Gui Cao) Date: Fri, 17 May 2024 13:37:06 GMT Subject: RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v7] In-Reply-To: References: Message-ID: On Thu, 16 May 2024 13:03:24 GMT, ArsenyBochkarev wrote: >> Hello everyone! Please review this ~non-vectorized~ implementation of `_updateBytesAdler32` intrinsic. Reference implementation for AArch64 can be found [here](https://github.com/openjdk/jdk9/blob/master/hotspot/src/cpu/aarch64/vm/stubGenerator_aarch64.cpp#L3281). >> >> ### Correctness checks >> >> Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok. All tier1 also passed. 
>> >> ### Performance results on T-Head board >> >> Enabled intrinsic: >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- | >> | Adler32.TestAdler32.testAdler32Update | 64 | thrpt | 25 | 5522.693 | 23.387 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 128 | thrpt | 25 | 3430.761 | 9.210 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 256 | thrpt | 25 | 1962.888 | 5.323 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 512 | thrpt | 25 | 1050.938 | 0.144 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 1024 | thrpt | 25 | 549.227 | 0.375 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 2048 | thrpt | 25 | 280.829 | 0.170 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 5012 | thrpt | 25 | 116.333 | 0.057 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 8192 | thrpt | 25 | 71.392 | 0.060 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 16384 | thrpt | 25 | 35.784 | 0.019 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 32768 | thrpt | 25 | 17.924 | 0.010 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 65536 | thrpt | 25 | 8.940 | 0.003 | ops/ms | >> >> Disabled intrinsic: >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- | >> |Adler32.TestAdler32.testAdler32Update|64|thrpt|25|655.633|5.845|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|128|thrpt|25|587.418|10.062|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|256|thrpt|25|546.675|11.598|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|512|thrpt|25|432.328|11.517|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|1024|thrpt|25|311.771|4.238|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|2048|thrpt|25|202.648|2.486|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|5012|thrpt|... 
> > ArsenyBochkarev has updated the pull request incrementally with eight additional commits since the last revision: > > - Prettify L_nmax loop > - Add comments in functions > - Add explanation comment for L_nmax_loop > - Fix L_nmax_loop for big lengths > - Fix L_by16 loop step > - Prettify intrinsic > - Use LMUL=4 for most of the calculations > - Use LMUL to load multiple data in one step

Hi, I ran the JMH test on the Banana Pi BPI-F3 board (which has RVV 1.0):

Apply this PR and disable UseRVV:

Benchmark                      (count)   Mode  Cnt     Score     Error   Units
TestAdler32.testAdler32Update       64  thrpt   25  1845.347 ±  17.961  ops/ms
TestAdler32.testAdler32Update      128  thrpt   25  1622.564 ±  18.082  ops/ms
TestAdler32.testAdler32Update      256  thrpt   25  1337.308 ±  12.022  ops/ms
TestAdler32.testAdler32Update      512  thrpt   25   971.847 ±  12.653  ops/ms
TestAdler32.testAdler32Update     1024  thrpt   25   637.476 ±   1.802  ops/ms
TestAdler32.testAdler32Update     2048  thrpt   25   377.564 ±   2.189  ops/ms
TestAdler32.testAdler32Update     5012  thrpt   25   172.410 ±   0.295  ops/ms
TestAdler32.testAdler32Update     8192  thrpt   25   109.077 ±   0.213  ops/ms
TestAdler32.testAdler32Update    16384  thrpt   25    55.915 ±   0.062  ops/ms
TestAdler32.testAdler32Update    32768  thrpt   25    26.653 ±   0.131  ops/ms
TestAdler32.testAdler32Update    65536  thrpt   25    13.421 ±   0.015  ops/ms
Finished running test 'micro:java.util.TestAdler32'

Apply this PR and enable UseRVV:

Benchmark                      (count)   Mode  Cnt     Score     Error   Units
TestAdler32.testAdler32Update       64  thrpt   25  7822.238 ± 175.797  ops/ms
TestAdler32.testAdler32Update      128  thrpt   25  5054.415 ±   0.133  ops/ms
TestAdler32.testAdler32Update      256  thrpt   25  2859.404 ±  83.301  ops/ms
TestAdler32.testAdler32Update      512  thrpt   25  1546.183 ±  47.910  ops/ms
TestAdler32.testAdler32Update     1024  thrpt   25   808.569 ±  25.122  ops/ms
TestAdler32.testAdler32Update     2048  thrpt   25   413.848 ±  12.909  ops/ms
TestAdler32.testAdler32Update     5012  thrpt   25   168.005 ±   5.176  ops/ms
TestAdler32.testAdler32Update     8192  thrpt   25   159.197 ±   3.353  ops/ms
TestAdler32.testAdler32Update    16384  thrpt   25    78.056 ±   1.514  ops/ms
TestAdler32.testAdler32Update    32768  thrpt   25    45.334 ±   0.756  ops/ms
TestAdler32.testAdler32Update    65536  thrpt   25    24.339 ±   0.342  ops/ms
Finished running test 'micro:java.util.TestAdler32'

------------- PR Comment: https://git.openjdk.org/jdk/pull/18382#issuecomment-2117621767 From gcao at openjdk.org Fri May 17 13:47:04 2024 From: gcao at openjdk.org (Gui Cao) Date: Fri, 17 May 2024 13:47:04 GMT Subject: RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v7] In-Reply-To: References: Message-ID: On Thu, 16 May 2024 13:03:24 GMT, ArsenyBochkarev wrote: >> Hello everyone! Please review this ~non-vectorized~ implementation of `_updateBytesAdler32` intrinsic. Reference implementation for AArch64 can be found [here](https://github.com/openjdk/jdk9/blob/master/hotspot/src/cpu/aarch64/vm/stubGenerator_aarch64.cpp#L3281). >> >> ### Correctness checks >> >> Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok. All tier1 also passed.
>> >> ### Performance results on T-Head board >> >> Enabled intrinsic: >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- | >> | Adler32.TestAdler32.testAdler32Update | 64 | thrpt | 25 | 5522.693 | 23.387 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 128 | thrpt | 25 | 3430.761 | 9.210 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 256 | thrpt | 25 | 1962.888 | 5.323 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 512 | thrpt | 25 | 1050.938 | 0.144 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 1024 | thrpt | 25 | 549.227 | 0.375 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 2048 | thrpt | 25 | 280.829 | 0.170 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 5012 | thrpt | 25 | 116.333 | 0.057 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 8192 | thrpt | 25 | 71.392 | 0.060 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 16384 | thrpt | 25 | 35.784 | 0.019 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 32768 | thrpt | 25 | 17.924 | 0.010 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 65536 | thrpt | 25 | 8.940 | 0.003 | ops/ms | >> >> Disabled intrinsic: >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- | >> |Adler32.TestAdler32.testAdler32Update|64|thrpt|25|655.633|5.845|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|128|thrpt|25|587.418|10.062|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|256|thrpt|25|546.675|11.598|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|512|thrpt|25|432.328|11.517|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|1024|thrpt|25|311.771|4.238|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|2048|thrpt|25|202.648|2.486|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|5012|thrpt|... 
> > ArsenyBochkarev has updated the pull request incrementally with eight additional commits since the last revision: > > - Prettify L_nmax loop > - Add comments in functions > - Add explanation comment for L_nmax_loop > - Fix L_nmax_loop for big lengths > - Fix L_by16 loop step > - Prettify intrinsic > - Use LMUL=4 for most of the calculations > - Use LMUL to load multiple data in one step

I also ran the correctness test on the Banana Pi BPI-F3 board (which has RVV 1.0):

- Before this patch, UseRVV disabled: `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` passes
- Before this patch, UseRVV enabled: `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` passes
- With this patch, UseRVV disabled: `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` passes
- With this patch, UseRVV enabled: `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` FAILS

The TestAdler32.jtr from the failing run: [TestAdler32.jtr.log](https://github.com/openjdk/jdk/files/15350178/TestAdler32.jtr.log)

------------- PR Comment: https://git.openjdk.org/jdk/pull/18382#issuecomment-2117643156 From amitkumar at openjdk.org Fri May 17 13:51:04 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 17 May 2024 13:51:04 GMT Subject: RFR: 8331934: [s390x] Add support for primitive array C1 clone intrinsic In-Reply-To: References: Message-ID: On Tue, 14 May 2024 08:32:16 GMT, Amit Kumar wrote: > @RealLucy @TheRealMDoerr Would you please review this one. :-) Pinging you again: if you have the bandwidth, please review it.
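For reference when reading the JDK-8317720 RISC-V Adler-32 results above, here is a minimal scalar sketch of the checksum the intrinsic replaces. It uses the common zlib-style trick of deferring the modulo for up to NMAX = 5552 bytes. That this is the idea behind the `L_nmax` loop named in the commit list is an assumption. `Adler32Sketch` and `adler32` are hypothetical names; `java.util.zip.Adler32` is used only as a cross-check.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

// Hypothetical scalar reference; not the RISC-V stub from the PR.
final class Adler32Sketch {
    private static final int BASE = 65521; // largest prime smaller than 2^16
    // Largest n with 255*n*(n+1)/2 + (n+1)*(BASE-1) <= 2^32 - 1: the number of
    // bytes that can be summed before the 32-bit sum b could overflow.
    private static final int NMAX = 5552;

    /** Updates an Adler-32 value; start from 1 for a fresh checksum. */
    static long adler32(long adler, byte[] buf, int off, int len) {
        int a = (int) (adler & 0xffff);
        int b = (int) ((adler >>> 16) & 0xffff);
        while (len > 0) {
            int n = Math.min(len, NMAX);
            len -= n;
            for (int i = 0; i < n; i++) {
                a += buf[off++] & 0xff; // int addition wraps exactly like unsigned 32-bit
                b += a;
            }
            // One modulo per chunk instead of one per byte.
            a = Integer.remainderUnsigned(a, BASE);
            b = Integer.remainderUnsigned(b, BASE);
        }
        return ((long) b << 16) | a;
    }

    public static void main(String[] args) {
        byte[] data = "Wikipedia".getBytes(StandardCharsets.US_ASCII);
        Adler32 ref = new Adler32();
        ref.update(data);
        // prints sketch=0x11E60398 jdk=0x11E60398
        System.out.printf("sketch=0x%08X jdk=0x%08X%n", adler32(1, data, 0, data.length), ref.getValue());
    }
}
```

The chunk bound matters most for inputs longer than NMAX. Comparing against `java.util.zip.Adler32` on such buffers, for example all-0xFF data, may be a useful extra check alongside lengths like 65536 from the benchmark.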
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19220#issuecomment-2117651723

From gcao at openjdk.org  Fri May 17 13:51:09 2024
From: gcao at openjdk.org (Gui Cao)
Date: Fri, 17 May 2024 13:51:09 GMT
Subject: Integrated: 8331281: RISC-V: C2: Support vector-scalar and vector-immediate bitwise logic instructions
In-Reply-To: 
References: 
Message-ID: 

On Mon, 29 Apr 2024 12:17:58 GMT, Gui Cao wrote:

> Hi, we want to support vector-scalar and vector-immediate bitwise logic instructions. They were implemented by referring to RVV v1.0 [1]. Please take a look and review. Thanks a lot.
> We can use Int256VectorTests.java [2] to print the compilation log, and to verify and observe the generated nodes.
>
> For example, the following command prints the compilation log of a jtreg test case:
>
>
> /home/zifeihan/jdk-tools/jtreg/bin/jtreg \
> -v:default \
> -concurrency:16 -timeout:50 \
> -javaoption:-XX:+UnlockExperimentalVMOptions \
> -javaoption:-XX:+UseRVV \
> -javaoption:-XX:+PrintOptoAssembly \
> -javaoption:-XX:LogFile=/home/zifeihan/jdk/Int256VectorTests_PrintOptoAssembly.log \
> -jdk:/home/zifeihan/jdk/build/linux-riscv64-server-fastdebug/jdk \
> /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/Int256VectorTests.java
>
>
>
> In the resulting compilation log `Int256VectorTests_PrintOptoAssembly.log`, we can observe the vector-scalar and vector-immediate bitwise logic nodes implemented by this PR.
>
> vand_immI Node
>
>
> 0b4 vloadcon V3 # generate iota indices
> 0bc vmla V2, V2, V3, V1
> 0c4 vand_immI V2, V2, #7
> 0cc addi R7, R30, #16 # ptr, #@addP_reg_imm
> 0d0 storeV [R7], V2 # vector (rvv)
>
>
> vor_regI Node
>
>
> 180 vor_regI V1, V1, R30
> 188 add R31, R14, R31 # ptr, #@addP_reg_reg
> 18a addi R31, R31, #16 # ptr, #@addP_reg_imm
> 18c storeV [R31], V1 # vector (rvv)
> 194 addiw R11, R11, #8 #@addI_reg_imm
> 196 blt R11, R13, B17 #@cmpI_loop P=0.500000 C=30564.000000
>
>
> vxor_regI Node
>
> 198 vxor_regI V1, V1, R30
> 1a0 add R14, R16, R14 # ptr, #@addP_reg_reg
> 1a2 addi R14, R14, #16 # ptr, #@addP_reg_imm
> 1a4 storeV [R14], V1 # vector (rvv)
> 1ac addiw R11, R11, #8 #@addI_reg_imm
> 1ae blt R11, R13, B21 #@cmpI_loop P=0.500000 C=30564.000000
>
>
> vand_regI_masked Node
>
> 234 B31: # out( B40 B32 ) <- in( B30 ) Freq: 78.5481
> 234 loadV V2, [R15] # vector (rvv)
> 23c vand_regI_masked V2, V2, R11
> 244 storeV [R9], V2 # vector (rvv)
> 24c mv R10, #8 # int, #@loadConI
> 24e ble R7, R10, B40 #@cmpI_branch P=0.000001 C=-1.000000
>
>
> vor_regI_masked Node
>
> 1ee B32: # out( B38 B33 ) <- in( B31 ) Freq: 75.8475
> 1ee loadV V1, [R11] # vector (rvv)
> 1f6 vor_regI_masked V1, V1, R31
> 1fe addi R11, R13, #32 # ptr, #@addP_reg_imm
> 202 bgeu R29, R10, B38 #@cmpU_branch P=0.000001 C=-1.000000
>
> vxor_regI_masked Node
>
> 1ee B32: # out( B38 B33 ) <- in( B31 ) Freq: 75.8475
> 1ee loadV V1, [R11]...

This pull request has now been integrated.
Changeset: e6111517
Author:    Gui Cao
Committer: Ludovic Henry
URL:       https://git.openjdk.org/jdk/commit/e611151796d71c40a9395cb6fbe734f36d4c1b55
Stats:     471 lines in 2 files changed: 469 ins; 0 del; 2 mod

8331281: RISC-V: C2: Support vector-scalar and vector-immediate bitwise logic instructions

Reviewed-by: fjiang, fyang

-------------

PR: https://git.openjdk.org/jdk/pull/18999

From mbaesken at openjdk.org  Fri May 17 13:54:08 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Fri, 17 May 2024 13:54:08 GMT
Subject: RFR: 8332462: ubsan: c1_ValueStack.hpp:229:49: runtime error: load of value 171, which is not a valid value for type 'bool'
Message-ID: 

With ubsan enabled, this code

bool force_reexecute() const { return _force_reexecute; }

gives us the following warning on Linux x86_64 fastdebug:

/jdk/src/hotspot/share/c1/c1_ValueStack.hpp:229:49: runtime error: load of value 171, which is not a valid value for type 'bool'
#0 0x14b3999f2921 in ValueStack::force_reexecute() const /jdk/src/hotspot/share/c1/c1_ValueStack.hpp:229
#1 0x14b3999f2921 in LIRGenerator::do_ArrayCopy(Intrinsic*) /jdk/src/hotspot/cpu/x86/c1_LIRGenerator_x86.cpp:1008
#2 0x14b39aa1c077 in LIRGenerator::do_root(Instruction*) /jdk/src/hotspot/share/c1/c1_LIRGenerator.cpp:379
#3 0x14b39aa2df94 in non-virtual thunk to LIRGenerator::block_do(BlockBegin*) (/net/usr.work/d040975/open_jdk/jdk_6/build_clx209_fastdebug/jdk/lib/server/libjvm.so+0x5ad1f94)
#4 0x14b39a971ff6 in BlockList::iterate_forward(BlockClosure*) /jdk/src/hotspot/share/c1/c1_Instruction.cpp:891
#5 0x14b39a878114 in Compilation::emit_lir() /jdk/src/hotspot/share/c1/c1_Compilation.cpp:264
#6 0x14b39a882076 in Compilation::compile_java_method() /jdk/src/hotspot/share/c1/c1_Compilation.cpp:407
#7 0x14b39a884c48 in Compilation::compile_method() /jdk/src/hotspot/share/c1/c1_Compilation.cpp:479
#8 0x14b39a88681a in Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, bool, DirectiveSet*) /jdk/src/hotspot/share/c1/c1_Compilation.cpp:609
#9 0x14b39a88bd63 in Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) /jdk/src/hotspot/share/c1/c1_Compiler.cpp:260
#10 0x14b39b153241 in CompileBroker::invoke_compiler_on_method(CompileTask*) /jdk/src/hotspot/share/compiler/compileBroker.cpp:2303
#11 0x14b39b154d3e in CompileBroker::compiler_thread_loop() /jdk/src/hotspot/share/compiler/compileBroker.cpp:1961
#12 0x14b39bdb17bc in JavaThread::thread_main_inner() /jdk/src/hotspot/share/runtime/javaThread.cpp:759
#13 0x14b39d8a828f in Thread::call_run() /jdk/src/hotspot/share/runtime/thread.cpp:225
... (rest of output omitted)

It seems we are missing initializations of the field _force_reexecute, which can lead to arbitrary values at the address in memory where _force_reexecute is stored.

-------------

Commit messages:
 - JDK-8332462

Changes: https://git.openjdk.org/jdk/pull/19284/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19284&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8332462
  Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/19284.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19284/head:pull/19284

PR: https://git.openjdk.org/jdk/pull/19284

From chagedorn at openjdk.org  Fri May 17 14:07:07 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 17 May 2024 14:07:07 GMT
Subject: RFR: 8332462: ubsan: c1_ValueStack.hpp:229:49: runtime error: load of value 171, which is not a valid value for type 'bool'
In-Reply-To: 
References: 
Message-ID: 

On Fri, 17 May 2024 13:48:57 GMT, Matthias Baesken wrote:

> This coding, with ubsan enabled
> bool force_reexecute() const { return _force_reexecute; }
>
> gives us on Linux x86_64 fastdebug the following warning :
>
> /jdk/src/hotspot/share/c1/c1_ValueStack.hpp:229:49: runtime error: load of value 171, which is not a valid value for type 'bool'
> #0 0x14b3999f2921 in ValueStack::force_reexecute() const
/jdk/src/hotspot/share/c1/c1_ValueStack.hpp:229 > #1 0x14b3999f2921 in LIRGenerator::do_ArrayCopy(Intrinsic*) /jdk/src/hotspot/cpu/x86/c1_LIRGenerator_x86.cpp:1008 > #2 0x14b39aa1c077 in LIRGenerator::do_root(Instruction*) /jdk/src/hotspot/share/c1/c1_LIRGenerator.cpp:379 > #3 0x14b39aa2df94 in non-virtual thunk to LIRGenerator::block_do(BlockBegin*) (/net/usr.work/d040975/open_jdk/jdk_6/build_clx209_fastdebug/jdk/lib/server/libjvm.so+0x5ad1f94) > #4 0x14b39a971ff6 in BlockList::iterate_forward(BlockClosure*) /jdk/src/hotspot/share/c1/c1_Instruction.cpp:891 > #5 0x14b39a878114 in Compilation::emit_lir() /jdk/src/hotspot/share/c1/c1_Compilation.cpp:264 > #6 0x14b39a882076 in Compilation::compile_java_method() /jdk/src/hotspot/share/c1/c1_Compilation.cpp:407 > #7 0x14b39a884c48 in Compilation::compile_method() /jdk/src/hotspot/share/c1/c1_Compilation.cpp:479 > #8 0x14b39a88681a in Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, bool, DirectiveSet*) /jdk/src/hotspot/share/c1/c1_Compilation.cpp:609 > #9 0x14b39a88bd63 in Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) /jdk/src/hotspot/share/c1/c1_Compiler.cpp:260 > #10 0x14b39b153241 in CompileBroker::invoke_compiler_on_method(CompileTask*) /jdk/src/hotspot/share/compiler/compileBroker.cpp:2303 > #11 0x14b39b154d3e in CompileBroker::compiler_thread_loop() /jdk/src/hotspot/share/compiler/compileBroker.cpp:1961 > #12 0x14b39bdb17bc in JavaThread::thread_main_inner() /jdk/src/hotspot/share/runtime/javaThread.cpp:759 > #13 0x14b39d8a828f in Thread::call_run() /jdk/src/hotspot/share/runtime/thread.cpp:225 > ... (rest of output omitted) > > Seems we miss initializations of the variable _force_reexecute , and this can lead to arbitrary values at the address in memory where _force_reexecute is stored. Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19284#pullrequestreview-2063531599 From mdoerr at openjdk.org Fri May 17 14:16:01 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 17 May 2024 14:16:01 GMT Subject: RFR: 8332462: ubsan: c1_ValueStack.hpp:229:49: runtime error: load of value 171, which is not a valid value for type 'bool' In-Reply-To: References: Message-ID: On Fri, 17 May 2024 13:48:57 GMT, Matthias Baesken wrote: > This coding, with ubsan enabled > bool force_reexecute() const { return _force_reexecute; } > > gives us on Linux x86_64 fastdebug the following warning : > > /jdk/src/hotspot/share/c1/c1_ValueStack.hpp:229:49: runtime error: load of value 171, which is not a valid value for type 'bool' > #0 0x14b3999f2921 in ValueStack::force_reexecute() const /jdk/src/hotspot/share/c1/c1_ValueStack.hpp:229 > #1 0x14b3999f2921 in LIRGenerator::do_ArrayCopy(Intrinsic*) /jdk/src/hotspot/cpu/x86/c1_LIRGenerator_x86.cpp:1008 > #2 0x14b39aa1c077 in LIRGenerator::do_root(Instruction*) /jdk/src/hotspot/share/c1/c1_LIRGenerator.cpp:379 > #3 0x14b39aa2df94 in non-virtual thunk to LIRGenerator::block_do(BlockBegin*) (/net/usr.work/d040975/open_jdk/jdk_6/build_clx209_fastdebug/jdk/lib/server/libjvm.so+0x5ad1f94) > #4 0x14b39a971ff6 in BlockList::iterate_forward(BlockClosure*) /jdk/src/hotspot/share/c1/c1_Instruction.cpp:891 > #5 0x14b39a878114 in Compilation::emit_lir() /jdk/src/hotspot/share/c1/c1_Compilation.cpp:264 > #6 0x14b39a882076 in Compilation::compile_java_method() /jdk/src/hotspot/share/c1/c1_Compilation.cpp:407 > #7 0x14b39a884c48 in Compilation::compile_method() /jdk/src/hotspot/share/c1/c1_Compilation.cpp:479 > #8 0x14b39a88681a in Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, bool, DirectiveSet*) /jdk/src/hotspot/share/c1/c1_Compilation.cpp:609 > #9 0x14b39a88bd63 in Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) /jdk/src/hotspot/share/c1/c1_Compiler.cpp:260 > #10 0x14b39b153241 in 
CompileBroker::invoke_compiler_on_method(CompileTask*) /jdk/src/hotspot/share/compiler/compileBroker.cpp:2303
> #11 0x14b39b154d3e in CompileBroker::compiler_thread_loop() /jdk/src/hotspot/share/compiler/compileBroker.cpp:1961
> #12 0x14b39bdb17bc in JavaThread::thread_main_inner() /jdk/src/hotspot/share/runtime/javaThread.cpp:759
> #13 0x14b39d8a828f in Thread::call_run() /jdk/src/hotspot/share/runtime/thread.cpp:225
> ... (rest of output omitted)
>
> Seems we miss initializations of the variable _force_reexecute , and this can lead to arbitrary values at the address in memory where _force_reexecute is stored.

Marked as reviewed by mdoerr (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/19284#pullrequestreview-2063554280

From chagedorn at openjdk.org  Fri May 17 14:20:02 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 17 May 2024 14:20:02 GMT
Subject: RFR: 8332369: C2: assert(false) failed: graph should be schedulable after JDK-8324517
In-Reply-To: 
References: 
Message-ID: 

On Fri, 17 May 2024 07:50:08 GMT, Roland Westrelin wrote:

> The issue occurs when a `Mod` node is processed during
> final_graph_reshaping: if a `Div` node is found with the same inputs,
> the `Mod` is replaced either by a `DivMod` node or a subgraph that has
> the `Div` node as input. Finding the `Div` node is done by
> `find_similar()`, which ignores the precedence edges. What happens is
> that the `Div` node returned by `find_similar()` could have a
> precedence edge that pins it at a control that doesn't dominate the
> control of some of the uses of the `Mod` node.
>
> The fix I propose is to simply not perform the transformation if one of
> the nodes has precedence edges (which should be a rare corner case).

That looks reasonable. I've submitted some testing.

-------------

Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19277#pullrequestreview-2063565761

From chagedorn at openjdk.org  Fri May 17 14:28:02 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 17 May 2024 14:28:02 GMT
Subject: RFR: 8332369: C2: assert(false) failed: graph should be schedulable after JDK-8324517
In-Reply-To: 
References: 
Message-ID: <6kAsv9goCi0tS2wZvFfikgdSnHh1cMLjswtav5-zNJE=.3705d0f9-39a7-4977-90ae-28619431d841@github.com>

On Fri, 17 May 2024 07:50:08 GMT, Roland Westrelin wrote:

> The issue occurs when a `Mod` node is processed during
> final_graph_reshaping: if a `Div` node is found with the same inputs,
> the `Mod` is replaced either by a `DivMod` node or a subgraph that has
> the `Div` node as input. Finding the `Div` node is done by
> `find_similar()`, which ignores the precedence edges. What happens is
> that the `Div` node returned by `find_similar()` could have a
> precedence edge that pins it at a control that doesn't dominate the
> control of some of the uses of the `Mod` node.
>
> The fix I propose is to simply not perform the transformation if one of
> the nodes has precedence edges (which should be a rare corner case).

src/hotspot/share/opto/compile.cpp line 3626:

> 3624: // Check if a%b and a/b both exist
> 3625: Node* d = n->find_similar(Op_DivI);
> 3626: if (d && !d->has_prec_edges()) {

Could be replaced with

Suggestion:

if (d != nullptr && !d->has_prec_edges()) {

Same at other places below.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19277#discussion_r1605101465

From mdoerr at openjdk.org  Fri May 17 15:14:04 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Fri, 17 May 2024 15:14:04 GMT
Subject: RFR: 8331935: Add support for primitive array C1 clone intrinsic in PPC
In-Reply-To: 
References: 
Message-ID: 

On Wed, 15 May 2024 13:50:27 GMT, Varada M wrote:

> https://bugs.openjdk.org/browse/JDK-8302850 port for PPC64
>
> JMH Benchmark Results
>
>
> Before:
>
> Benchmark                      (size)  Mode  Cnt    Score   Error  Units
> ArrayClone.byteArraycopy            0  avgt   15  114.107 ± 1.337  ns/op
> ArrayClone.byteArraycopy           10  avgt   15  130.492 ± 0.991  ns/op
> ArrayClone.byteArraycopy          100  avgt   15  139.103 ± 1.913  ns/op
> ArrayClone.byteArraycopy         1000  avgt   15  321.688 ± 6.033  ns/op
> ArrayClone.byteClone                0  avgt   15  227.602 ± 3.393  ns/op
> ArrayClone.byteClone               10  avgt   15  237.624 ± 2.996  ns/op
> ArrayClone.byteClone              100  avgt   15  239.219 ± 2.835  ns/op
>
> ArrayClone.byteClone             1000  avgt   15  355.571 ± 2.946  ns/op
> ArrayClone.intArraycopy             0  avgt   15  113.275 ± 1.099  ns/op
> ArrayClone.intArraycopy            10  avgt   15  129.763 ± 1.458  ns/op
> ArrayClone.intArraycopy           100  avgt   15  213.327 ± 2.524  ns/op
> ArrayClone.intArraycopy          1000  avgt   15  449.650 ± 7.338  ns/op
> ArrayClone.intClone                 0  avgt   15  225.682 ± 3.048  ns/op
> ArrayClone.intClone                10  avgt   15  234.532 ± 2.817  ns/op
> ArrayClone.intClone               100  avgt   15  295.934 ± 4.925  ns/op
> ArrayClone.intClone              1000  avgt   15  573.368 ± 5.739  ns/op
> Finished running test 'micro:java.lang.ArrayClone'
> Test report is stored in build/aix-ppc64-server-release/test-results/micro_java_lang_ArrayClone
>
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR
>    micro:java.lang.ArrayClone                            1     1     0     0
> ==============================
> TEST SUCCESS
>
> Finished building target 'test' in configuration 'aix-ppc64-server-release'
>
>
> After:
>
> Benchmark                      (size)  Mode  Cnt    Score    Error  Units
> ArrayClone.byteArraycopy            0  avgt   15  113.894 ±  0.993  ns/op
> ArrayClone.byteArraycopy           10  avgt   15  131.455 ±  0.956  ns/op
> ArrayClone.byteArraycopy          100  avgt   15  139.145 ±  3.002  ns/op
> ArrayClone.byteArraycopy         1000  avgt   15  315.957 ± 14.591  ns/op
> ArrayClone.byteClone                0  avgt   15   43.753 ±  3.669  ns/op
> ArrayClone.byteClone               10  avgt   15   52.329 ±  1.041  ns/op
> ArrayClone.byteClone              100  avgt   15  127.711 ±  3.938  ns/op
>
> ArrayClone.byteClone             1000  avgt   15  225.937 ±  1.987  ns/op
> ArrayClone.intArraycopy             0  avgt   15  113.788 ±  0.770  ns/op
> ArrayClone.intArraycopy            10  avgt   1...

I also have a minor cleanup proposal for `LIR_Assembler::emit_arraycopy`:

diff --git a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp
index dba662a2212..2424d820177 100644
--- a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp
+++ b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp
@@ -1827,18 +1827,17 @@ void LIR_Assembler::emit_arraycopy(LIR_OpArrayCopy* op) {
   int flags = op->flags();
   ciArrayKlass* default_type = op->expected_type();
-  BasicType basic_type = default_type != nullptr ? default_type->element_type()->basic_type() : T_ILLEGAL;
+  BasicType basic_type = (default_type != nullptr) ? default_type->element_type()->basic_type() : T_ILLEGAL;
   if (basic_type == T_ARRAY) basic_type = T_OBJECT;
   // Set up the arraycopy stub information.
   ArrayCopyStub* stub = op->stub();
-  const int frame_resize = frame::native_abi_reg_args_size - sizeof(frame::java_abi); // C calls need larger frame.
   // Always do stub if no type information is available. It's ok if
   // the known type isn't loaded since the code sanity checks
   // in debug mode and the type isn't required when we know the exact type
   // also check that the type is an array type.
-  if (op->expected_type() == nullptr) {
+  if (default_type == nullptr) {
     assert(src->is_nonvolatile() && src_pos->is_nonvolatile() && dst->is_nonvolatile() && dst_pos->is_nonvolatile() && length->is_nonvolatile(), "must preserve");
     address copyfunc_addr = StubRoutines::generic_arraycopy();
@@ -1873,7 +1872,7 @@ void LIR_Assembler::emit_arraycopy(LIR_OpArrayCopy* op) {
     return;
   }
-  assert(default_type != nullptr && default_type->is_array_klass(), "must be true at this point");
+  assert(default_type != nullptr && default_type->is_array_klass() && default_type->is_loaded(), "must be true at this point");
   Label cont, slow, copyfunc;
   bool simple_check_flag_set = flags & (LIR_OpArrayCopy::src_null_check |

Would be nice to have.

src/hotspot/cpu/ppc/c1_LIRGenerator_ppc.cpp line 903:

> 901: info = state_for(x, x->state());
> 902: }
> 903:

This code seems to be integrated at the wrong place. Other platforms have it in `LIRGenerator::do_NewTypeArray`.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19250#issuecomment-2117816097
PR Review Comment: https://git.openjdk.org/jdk/pull/19250#discussion_r1605160938

From cslucas at openjdk.org  Fri May 17 16:36:06 2024
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Fri, 17 May 2024 16:36:06 GMT
Subject: RFR: JDK-8330565 : C2: Multiple crashes with CTW after JDK-8316991 [v2]
In-Reply-To: 
References: 
Message-ID: 

On Wed, 15 May 2024 17:30:54 GMT, Tobias Hartmann wrote:

>> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Refactor split_castpp_load_through_phi
>
> Looks good to me too. I submitted testing.

@TobiHartmann, did the tests pass? Thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19147#issuecomment-2117966776

From jbhateja at openjdk.org  Fri May 17 16:54:09 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 17 May 2024 16:54:09 GMT
Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers
In-Reply-To: 
References: <2ix8fZdbyXTav2FBERlzl7U6JkI3i9hPFGSNKbrDlpo=.a219b3de-7035-44d0-9bdc-3ea599800eb3@github.com>
Message-ID: 

On Sat, 11 May 2024 21:21:47 GMT, Jatin Bhateja wrote:

> > A) With recent change register only flavors of cvtsi2ss / cvtsi2sd / cvttsd2si/ cvttss2si which are all legacy map 1 instruction and are encoded using REX prefixes at UseAVX=0 will now be promoted to EEVEX which is a fixed 4 byte prefix, we should use REX2 instead. [cvtsi2ss_MAP1_with_EEVEX.txt](https://github.com/openjdk/jdk/files/15284294/cvtsi2ss_MAP1_with_EEVEX.txt)
> SDE ERROR: Illegal instruction at address = 7f3aafa329a0: f3 d5 10 **0f** 2a c1

I debugged the cause of the above failure with UseAVX=0 and found two issues:

1) The secondary map ID (0x0f) is being emitted after the REX2 prefix. This should not happen, since REX2.M0 encodes this information.
2) REX2.M0 is not being set, because we are not passing the correct value for the map1 argument:
https://github.com/openjdk/jdk/blob/47885cbe8eb215323e9ca8f4a36d422d17521e57/src/hotspot/cpu/x86/assembler_x86.cpp#L11734

If we move the 0x0f out to the top-level assembler routines and remove the following newly added code, we can handle APX even at AVX level 0:

https://github.com/openjdk/jdk/blob/47885cbe8eb215323e9ca8f4a36d422d17521e57/src/hotspot/cpu/x86/assembler_x86.cpp#L13158
https://github.com/openjdk/jdk/blob/47885cbe8eb215323e9ca8f4a36d422d17521e57/src/hotspot/cpu/x86/assembler_x86.cpp#L13187
https://github.com/openjdk/jdk/blob/47885cbe8eb215323e9ca8f4a36d422d17521e57/src/hotspot/cpu/x86/assembler_x86.cpp#L13347

That would also let us lift the AVX512 constraint you introduced, which matters because SPECjbb2015 results still [explicitly pass UseAVX=0 during measurements.](https://spec.org/jbb2015/results/res2024q1/jbb2015-20240110-01212.html#:~:text=Xms29g%20%2DXmx29g%20%2DXmn27g-,%2DXX%3AUseAVX%3D0,-%2DXX%3AParallelGCThreads%3D32)

I am OK with addressing these limitations in subsequent patches.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2117999627

From jbhateja at openjdk.org  Fri May 17 16:54:10 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 17 May 2024 16:54:10 GMT
Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v22]
In-Reply-To: 
References: 
Message-ID: 

On Thu, 16 May 2024 19:03:18 GMT, Steve Dohrmann wrote:

>> Add instruction encoding support for Intel APX extended general-purpose registers:
>>
>> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html.
>>
>> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs.
For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > simplify test in new asserts to just assert UseAPX src/hotspot/cpu/x86/vm_version_x86.cpp line 1005: > 1003: } > 1004: > 1005: if (UseAPX && (UseAVX < 3)) { A comment here will be helpful stating the need to disable APX functionality for non AVX512 targets, please note UseAVX is set to level 3 based on existence of CPUID (EAX=07, EBX[16] = AVX512F) bit, and future AVX10 targets may support APX. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1604657235 From aph at openjdk.org Fri May 17 17:43:21 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 17 May 2024 17:43:21 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v18] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 14:54:17 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens >> no optimizations is performed for the method being compiled). Hoisting >> a `get()` call out of loop for a loop invariant `scopedValue` should >> also be legal in most cases. >> >> `ScopedValue.get()` is implemented in java code as a 2 step process. A >> cache is attached to the current thread object. If the `ScopedValue` >> object is in the cache then the result from `get()` is read from >> there. 
Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache. On a >> cache probe, both indexes are checked. As a consequence, the process >> of probing the cache is a multi step process (check if the cache is >> present, check first index, check second index if first index >> failed). If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, profile reports the slow path >> as never taken and only the read from the cache is compiled. >> >> To perform the optimizations, I added 3 new node types to C2: >> >> - the pair >> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for >> the cache probe >> >> - a cfg node ScopedValueGetResultNode to help locate the result of the >> `get()` call in the IR graph. >> >> In pseudo code, once the nodes are inserted, the code of a `get()` is: >> >> >> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) >> if (hits_in_the_cache) { >> res = ScopedValueGetLoadFromCache(hits_in_the_cache); >> } else { >> res = ..; //slow call possibly inlined. Subgraph can be arbitray complex >> } >> res = ScopedValueGetResult(res) >> >> >> In the snippet: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped... 
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > whitespaces I've spoken to some senior JDK developers and their feeling is that this patch is too specific to the current scoped value implementation and too complex to go into HotSpot, especially for a feature like scoped values that is not yet out of preview. This PR is good. It does everything needed to generate better, smaller code for scoped values. It also means that scoped values have a smaller memory footprint, a real win-win. However, I think we're going to have to park this PR for now. I think we should revisit it when scoped values is out of preview. Thanks, and I'm sorry that you put so much work into this patch for it not to be committed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2118101328 From thartmann at openjdk.org Fri May 17 17:45:12 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 17 May 2024 17:45:12 GMT Subject: RFR: JDK-8330565 : C2: Multiple crashes with CTW after JDK-8316991 [v2] In-Reply-To: References: Message-ID: <9UNM2K_DcKzIWdLWJXL2uKZiWZg-rsAk3gRQ5XQkXGo=.410345ee-43e2-4f53-86f7-32ea56698f96@github.com> On Wed, 15 May 2024 13:22:34 GMT, Vladimir Kozlov wrote: >> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactor split_castpp_load_through_phi > > Looks good. This have to be tested. Yes, all tests passed. Sorry for the delay. @vnkozlov needs to press approve as well. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19147#issuecomment-2118104042 From duke at openjdk.org Fri May 17 17:52:31 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Fri, 17 May 2024 17:52:31 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v23] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: added comment about UseAPX and UseAVX > 2 correspondence ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/47885cbe..49b117ef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=21-22 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From duke at openjdk.org Fri May 17 17:52:32 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Fri, 17 May 2024 17:52:32 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v22] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 09:47:22 GMT, Jatin Bhateja wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> simplify test in new asserts to just assert UseAPX > > src/hotspot/cpu/x86/vm_version_x86.cpp line 1005: > >> 1003: } >> 1004: >> 1005: if (UseAPX && (UseAVX < 3)) { > > A comment here will be helpful stating the need to disable APX functionality for non AVX512 targets, please note UseAVX is set to level 3 based on existence of CPUID (EAX=07, EBX[16] = AVX512F) bit, and future AVX10 targets may support APX. Thanks. I added a comment. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1605370071

From kvn at openjdk.org  Fri May 17 18:51:02 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 17 May 2024 18:51:02 GMT
Subject: RFR: JDK-8330565 : C2: Multiple crashes with CTW after JDK-8316991 [v3]
In-Reply-To: 
References: 
Message-ID: 

On Wed, 15 May 2024 18:15:16 GMT, Cesar Soares Lucas wrote:

>> The `# assert(false) failed: Bad graph detected in build_loop_late` failure was caused because a string concatenation optimization using [this method](https://github.com/openjdk/jdk/blob/819f3d6fc70ff6fe54ac5f9033c17c3dd4326aa5/src/hotspot/share/opto/graphKit.cpp#L4115) adds AddP and LoadN nodes to the IR graph as NotNull _and_ because RAM was not "nullifying" Phis merging nullable pointers. I was only able to reproduce this problem using a classfile/jar compiled with an "old" JDK version, because newer versions use InvokeDynamic to do string concatenation.
>>
>> Tested with JTREG tier1-4 on Linux x86_64 & ARM64.
>
> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision:
>
>   Update test/hotspot/jtreg/compiler/c2/TestReduceAllocationAndNullableLoads.java
>
>   -Xcomp implies -Xbatch
>
>   Co-authored-by: Tobias Hartmann

Good

-------------

Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19147#pullrequestreview-2064167491

From cslucas at openjdk.org  Fri May 17 19:00:03 2024
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Fri, 17 May 2024 19:00:03 GMT
Subject: RFR: JDK-8330565 : C2: Multiple crashes with CTW after JDK-8316991 [v3]
In-Reply-To: 
References: 
Message-ID: 

On Wed, 15 May 2024 18:15:16 GMT, Cesar Soares Lucas wrote:

>> The `# assert(false) failed: Bad graph detected in build_loop_late` failure was caused because a string concatenation optimization using [this method](https://github.com/openjdk/jdk/blob/819f3d6fc70ff6fe54ac5f9033c17c3dd4326aa5/src/hotspot/share/opto/graphKit.cpp#L4115) adds AddP and LoadN nodes to the IR graph as NotNull _and_ because RAM was not "nullifying" Phis merging nullable pointers. I was only able to reproduce this problem using a classfile/jar compiled with an "old" JDK version, because newer versions use InvokeDynamic to do string concatenation.
>>
>> Tested with JTREG tier1-4 on Linux x86_64 & ARM64.
>
> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision:
>
>   Update test/hotspot/jtreg/compiler/c2/TestReduceAllocationAndNullableLoads.java
>
>   -Xcomp implies -Xbatch
>
>   Co-authored-by: Tobias Hartmann

Thank you all for reviewing!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19147#issuecomment-2118208239

From mdoerr at openjdk.org  Fri May 17 20:13:04 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Fri, 17 May 2024 20:13:04 GMT
Subject: RFR: 8331934: [s390x] Add support for primitive array C1 clone intrinsic [v3]
In-Reply-To: 
References: 
Message-ID: <_nyjp0G26h2YkjCRoOydWryudmho_vymj5HT4Q0Y6WI=.ff25a225-aec1-4935-bd7e-dedde72f0916@github.com>

On Wed, 15 May 2024 09:25:32 GMT, Amit Kumar wrote:

>> Adds JDK-8302850 Port for s390x.
>>
>> Testing:
>>
>> make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
>>
>> ==============================
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PASS  FAIL ERROR
>>    jtreg:test/hotspot/jtreg:hotspot_compiler          1166  1166     0     0
>> ==============================
>> TEST SUCCESS
>>
>> * Tier1 Test with Fast debug build.
>>
>> Benchmarking:
>>
>>
>> Without Patch:
>>
>> Benchmark                      (size)  Mode  Cnt     Score    Error  Units
>> ArrayClone.byteArraycopy            0  avgt   15    10.838 ±  0.461  ns/op
>> ArrayClone.byteArraycopy           10  avgt   15    28.919 ±  1.695  ns/op
>> ArrayClone.byteArraycopy          100  avgt   15    48.815 ±  0.901  ns/op
>> ArrayClone.byteArraycopy         1000  avgt   15   256.357 ±  7.901  ns/op
>> ArrayClone.byteClone                0  avgt   15    90.398 ±  3.119  ns/op
>> ArrayClone.byteClone               10  avgt   15   103.774 ±  4.468  ns/op
>> ArrayClone.byteClone              100  avgt   15   126.628 ±  6.952  ns/op
>> ArrayClone.byteClone             1000  avgt   15   326.409 ± 31.635  ns/op
>> ArrayClone.intArraycopy             0  avgt   15    10.450 ±  0.509  ns/op
>> ArrayClone.intArraycopy            10  avgt   15    36.903 ±  0.753  ns/op
>> ArrayClone.intArraycopy           100  avgt   15    85.964 ±  1.806  ns/op
>> ArrayClone.intArraycopy          1000  avgt   15   841.512 ± 40.335  ns/op
>> ArrayClone.intClone                 0  avgt   15    89.332 ±  3.695  ns/op
>> ArrayClone.intClone                10  avgt   15   110.639 ±  2.476  ns/op
>> ArrayClone.intClone               100  avgt   15   195.781 ±  8.622  ns/op
>> ArrayClone.intClone              1000  avgt   15  1058.479 ± 92.468  ns/op
>> Finished running test 'micro:java.lang.ArrayClone'
>>
>>
>> with patch:
>>
>> Benchmark                      (size)  Mode  Cnt     Score    Error  Units
>> ArrayClone.byteArraycopy            0  avgt   15    10.526...
>
> Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits:
>
>  - Merge master
>  - s390x Port
>  - Update src/hotspot/share/c1/c1_GraphBuilder.cpp
>
>    Co-authored-by: Dean Long <17332032+dean-long at users.noreply.github.com>
>  - Fix assert to only have a single !
> - Assert type is not interface > - Remove whitespace > - Expanded testing in TestNullArrayClone > > * Added byte[] and long[] tests. > * Verified that the cloned array has the same contents. > * Increase number of iterations reach tier 3 threshold. > - Update src/hotspot/share/c1/c1_GraphBuilder.cpp > > Co-authored-by: Boris <42576543+bulasevich at users.noreply.github.com> > - Added test summary > - Use vmIntrinsics instead of vmIntrinsicID > - ... and 16 more: https://git.openjdk.org/jdk/compare/2f10a316...865de5ba LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19220#pullrequestreview-2064315672 From sviswanathan at openjdk.org Fri May 17 22:40:15 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 17 May 2024 22:40:15 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: <8Y-nIHc8vfB1X_hp3tpqqqgpCzu6dAt6BBIP_zc4Q70=.c9a48c68-8c14-4af9-8357-ab50e62a5fd3@github.com> Message-ID: On Thu, 16 May 2024 20:22:40 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1510: >> >>> 1508: compare_big_haystack_to_needle(sizeKnown, size, NUMBER_OF_NEEDLE_BYTES_TO_COMPARE, loop_top, hsPtrRet, hsLength, >>> 1509: needleLen, isU, DO_EARLY_BAILOUT, eq_mask, temp2, r10, _masm); >>> 1510: >> >> At this point hsLength is not the remaining length from hsPtrRet, would that cause a problem? If not, all the special paths in compare_big_haystack_to_needle need not be generated on this call. > > Not sure what you mean here. I *think* you mean that hsLength is not the length of the remaining bytes in the haystack, but the actual length. There may be an issue if that is correct, right? I'll investigate. Yes, that is what I meant. Thanks for investigating. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1605594796 From sviswanathan at openjdk.org Fri May 17 22:43:05 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 17 May 2024 22:43:05 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: <8Y-nIHc8vfB1X_hp3tpqqqgpCzu6dAt6BBIP_zc4Q70=.c9a48c68-8c14-4af9-8357-ab50e62a5fd3@github.com> Message-ID: <5DbhciTOeJf2n_vsG_R2r35-vFFp3QH3mmOX9hrqC3g=.9117cc86-a514-4e9b-a5d4-7108e72170ae@github.com> On Thu, 16 May 2024 17:08:21 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 238: >> >>> 236: const Register needle = rdx; >>> 237: const Register needle_len = rcx; >>> 238: >> >> This is the calling convention on Linux. How is the Windows platform handled? > > The entry code switches from the Windows calling convention to the Linux calling convention by moving/saving registers, which are properly restored on function exit. This makes register tracking easier. I don't see the place where the switch happens before this initial piece of code. You also have Windows tests failing in GHA. Could you please double-check? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1605596148 From cslucas at openjdk.org Fri May 17 23:42:09 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 17 May 2024 23:42:09 GMT Subject: Integrated: JDK-8330565 : C2: Multiple crashes with CTW after JDK-8316991 In-Reply-To: References: Message-ID: On Wed, 8 May 2024 23:44:23 GMT, Cesar Soares Lucas wrote: > The `# assert(false) failed: Bad graph detected in build_loop_late` failure occurred because a string concatenation optimization using [this method](https://github.com/openjdk/jdk/blob/819f3d6fc70ff6fe54ac5f9033c17c3dd4326aa5/src/hotspot/share/opto/graphKit.cpp#L4115) adds AddP and LoadN nodes to the IR graph as NotNull _and_ because RAM was not "nullifying" phis that merge nullable pointers. 
I was only able to reproduce this problem with a classfile/jar compiled by an "old" JDK version, because newer versions use InvokeDynamic for string concatenation. > > Tested with JTREG tier1-4 on Linux x86_64 & ARM64. This pull request has now been integrated. Changeset: 8acdd2d7 Author: Cesar Soares Lucas Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/8acdd2d7c8de17515b87815d54ce556237039406 Stats: 91 lines in 2 files changed: 81 ins; 0 del; 10 mod 8330565: C2: Multiple crashes with CTW after JDK-8316991 Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/19147 From sgibbons at openjdk.org Fri May 17 23:47:45 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 17 May 2024 23:47:45 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v20] In-Reply-To: References: Message-ID: > Rewrite the IndexOf code to use only AVX2 instructions, without the pcmpestri instruction. This change accelerates String.indexOf by 1.3x on average with AVX2. 
The benchmark numbers:
>
> Benchmark                                    Score     Latest  Latest/Score
> StringIndexOf.advancedWithMediumSub        343.573    317.934  0.925375393x
> StringIndexOf.advancedWithShortSub1       1039.081    1053.96  1.014319384x
> StringIndexOf.advancedWithShortSub2         55.828    110.541  1.980027943x
> StringIndexOf.constantPattern                9.361     11.906  1.271872663x
> StringIndexOf.searchCharLongSuccess          4.216      4.218  1.000474383x
> StringIndexOf.searchCharMediumSuccess        3.133      3.216   1.02649218x
> StringIndexOf.searchCharShortSuccess          3.76      3.761  1.000265957x
> StringIndexOf.success                        9.186      9.713  1.057369911x
> StringIndexOf.successBig                    14.341     46.343  3.231504079x
> StringIndexOfChar.latin1_AVX2_String      6220.918   12154.52  1.953814533x
> StringIndexOfChar.latin1_AVX2_char        5503.556   5540.044  1.006629895x
> StringIndexOfChar.latin1_SSE4_String      6978.854   6818.689  0.977049957x
> StringIndexOfChar.latin1_SSE4_char        5657.499   5474.624  0.967675646x
> StringIndexOfChar.latin1_Short_String     7132.541   6863.359  0.962260014x
> StringIndexOfChar.latin1_Short_char      16013.389  16162.437  1.009307711x
> StringIndexOfChar.latin1_mixed_String     7386.123  14771.622  1.999915517x
> StringIndexOfChar.latin1_mixed_char       9901.671   9782.245  0.987938803x

Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Addressing lots of comments. Interim commit. 
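Whatever the AVX2 stub does internally, it must return exactly what the scalar `String.indexOf(String, int)` contract specifies. A minimal scalar reference like the one below can serve as an oracle for exhaustive or randomized cross-checking. It is a hypothetical sketch, not code from the PR, and the class name `IndexOfReference` is made up. Note that every candidate position is bounded by the *remaining* haystack length rather than the total length, so a comparison never reads past the end of the haystack.

```java
// Hypothetical scalar reference for String.indexOf(String, int) semantics.
public class IndexOfReference {
    static int indexOf(String haystack, String needle, int fromIndex) {
        int hsLen = haystack.length();
        int nLen = needle.length();
        if (fromIndex >= hsLen) {
            // Past the end: only an empty needle still matches, at hsLen.
            return nLen == 0 ? hsLen : -1;
        }
        if (fromIndex < 0) {
            fromIndex = 0; // a negative start behaves like 0
        }
        if (nLen == 0) {
            return fromIndex;
        }
        // Try only positions where the remaining haystack can hold the needle.
        for (int i = fromIndex; hsLen - i >= nLen; i++) {
            int j = 0;
            while (j < nLen && haystack.charAt(i + j) == needle.charAt(j)) {
                j++;
            }
            if (j == nLen) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("hello world", "world", 0)); // 6
        System.out.println(indexOf("hello world", "o", 5));     // 7
    }
}
```

Because the reference is deliberately naive, any mismatch against an intrinsified `String.indexOf` points at the stub rather than at the oracle.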
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/fb4da92a..9a861979 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=18-19 Stats: 1639 lines in 9 files changed: 429 ins; 683 del; 527 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Fri May 17 23:56:07 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 17 May 2024 23:56:07 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: <5DbhciTOeJf2n_vsG_R2r35-vFFp3QH3mmOX9hrqC3g=.9117cc86-a514-4e9b-a5d4-7108e72170ae@github.com> References: <8Y-nIHc8vfB1X_hp3tpqqqgpCzu6dAt6BBIP_zc4Q70=.c9a48c68-8c14-4af9-8357-ab50e62a5fd3@github.com> <5DbhciTOeJf2n_vsG_R2r35-vFFp3QH3mmOX9hrqC3g=.9117cc86-a514-4e9b-a5d4-7108e72170ae@github.com> Message-ID: On Fri, 17 May 2024 22:40:50 GMT, Sandhya Viswanathan wrote: >> The entry code switches Windows calling convention into Linux calling convention by moving/saving registers, which are properly restored on function exit. This makes register tracking easier. > > I don't see the place where the switch is happening before this initial piece of code. You also have windows tests failing in the GHA. Could you please double check? 
Fixed to use c_rargX ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1605618391 From sgibbons at openjdk.org Fri May 17 23:56:08 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 17 May 2024 23:56:08 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: <8-W2sMyDMG71FBi7q_BLwiRoUj5Drr_J2IHiJPAtXd8=.a92b0aa8-402b-4d3e-9eb5-60e5d125920a@github.com> On Tue, 14 May 2024 18:38:38 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Rearrange; add lambdas for clarity > > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1784: > >> 1782: __ subq(tmp, haystack_len); >> 1783: } >> 1784: __ leaq(haystack, Address(rsp, tmp, Address::times_1)); > > This whole code is repeated in two places. Could be made into a function and used at both places. This is the only place now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1605617739 From sgibbons at openjdk.org Sat May 18 00:02:17 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Sat, 18 May 2024 00:02:17 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Tue, 14 May 2024 00:38:30 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Rearrange; add lambdas for clarity > > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1178: > >> 1176: __ andq(eq_mask, lastMask); >> 1177: if (needToSaveRCX) { >> 1178: __ movdq(rcx, saveRCX); > > movdq is an expensive instruction (about 3 cycle). If we have another gpr temporary available here for shiftVal, then we dont need to do save/restore rcx. No longer need to use rcx. Refactored. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1605619614 From sgibbons at openjdk.org Sat May 18 00:02:17 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Sat, 18 May 2024 00:02:17 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 19:18:02 GMT, Volodymyr Paprotski wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Rearrange; add lambdas for clarity > > test/jdk/java/lang/StringBuffer/IndexOf.java line 40: > >> 38: private static boolean failure = false; >> 39: public static void main(String[] args) throws Exception { >> 40: String testName = "IndexOf"; > > intentation Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1605619940 From amitkumar at openjdk.org Sat May 18 14:08:08 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 18 May 2024 14:08:08 GMT Subject: RFR: 8331934: [s390x] Add support for primitive array C1 clone intrinsic [v3] In-Reply-To: References: Message-ID: <2qTcJfbhEmK8YhwBTdpeIB9C1g6RPDhUDntb5S9Cp0E=.cbc73833-b93f-4ab7-8e67-630852cc73cc@github.com> On Wed, 15 May 2024 09:25:32 GMT, Amit Kumar wrote: >> Adds JDK-8302850 Port for s390x. >> >> Testing: >> >> make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:hotspot_compiler 1166 1166 0 0 >> ============================== >> TEST SUCCESS >> >> * Tier1 Test with Fast debug build. >> >> BenchMarking: >> >> >> Without Patch: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 10.838 ? 0.461 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 28.919 ? 1.695 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 48.815 ? 
0.901 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 256.357 ± 7.901 ns/op >> ArrayClone.byteClone 0 avgt 15 90.398 ± 3.119 ns/op >> ArrayClone.byteClone 10 avgt 15 103.774 ± 4.468 ns/op >> ArrayClone.byteClone 100 avgt 15 126.628 ± 6.952 ns/op >> ArrayClone.byteClone 1000 avgt 15 326.409 ± 31.635 ns/op >> ArrayClone.intArraycopy 0 avgt 15 10.450 ± 0.509 ns/op >> ArrayClone.intArraycopy 10 avgt 15 36.903 ± 0.753 ns/op >> ArrayClone.intArraycopy 100 avgt 15 85.964 ± 1.806 ns/op >> ArrayClone.intArraycopy 1000 avgt 15 841.512 ± 40.335 ns/op >> ArrayClone.intClone 0 avgt 15 89.332 ± 3.695 ns/op >> ArrayClone.intClone 10 avgt 15 110.639 ± 2.476 ns/op >> ArrayClone.intClone 100 avgt 15 195.781 ± 8.622 ns/op >> ArrayClone.intClone 1000 avgt 15 1058.479 ± 92.468 ns/op >> Finished running test 'micro:java.lang.ArrayClone' >> >> >> with patch: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 10.526... > > Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: > > - Merge master > - s390x Port > - Update src/hotspot/share/c1/c1_GraphBuilder.cpp > > Co-authored-by: Dean Long <17332032+dean-long at users.noreply.github.com> > - Fix assert to only have a single ! > - Assert type is not interface > - Remove whitespace > - Expanded testing in TestNullArrayClone > > * Added byte[] and long[] tests. > * Verified that the cloned array has the same contents. > * Increase number of iterations reach tier 3 threshold. > - Update src/hotspot/share/c1/c1_GraphBuilder.cpp > > Co-authored-by: Boris <42576543+bulasevich at users.noreply.github.com> > - Added test summary > - Use vmIntrinsics instead of vmIntrinsicID > - ... and 16 more: https://git.openjdk.org/jdk/compare/2f10a316...865de5ba @galderz, if possible, could you review this? 
This may make the review easier: testing for `{tier1} X {fastdebug, slowdebug, release}` and `{tier1 -XX:TieredStopAtLevel=1} X {fastdebug, slowdebug, release}` was clean, and the benchmark results also look good. You can check them [here](https://github.com/openjdk/jdk/pull/19220#issuecomment-2115237059) :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19220#issuecomment-2118835505 From jbhateja at openjdk.org Sat May 18 23:06:09 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 18 May 2024 23:06:09 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v23] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 17:52:31 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > added comment about UseAPX and UseAVX > 2 correspondence Marked as reviewed by jbhateja (Reviewer). 
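The s390x port discussed above (JDK-8331934) adds the C1 intrinsic for `clone()` on primitive arrays. The `ArrayClone` microbenchmark reports each `clone()` case next to an explicit `System.arraycopy` case. The sketch below uses hypothetical methods named after the benchmark entries; it is not the benchmark source, which is not shown in this thread. It checks the property the expanded `TestNullArrayClone` is described as verifying: the clone is a distinct array whose contents match the source.

```java
import java.util.Arrays;

// Hypothetical sketch: the two copy shapes compared by the ArrayClone entries
// (byteClone vs byteArraycopy, intClone vs intArraycopy).
public class ArrayCopyShapes {
    static byte[] byteClone(byte[] src) {
        return src.clone(); // the form the C1 clone intrinsic targets
    }

    static byte[] byteArraycopy(byte[] src) {
        byte[] dst = new byte[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    static int[] intClone(int[] src) {
        return src.clone();
    }

    static int[] intArraycopy(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    public static void main(String[] args) {
        for (int size : new int[] {0, 10, 100, 1000}) {
            byte[] bytes = new byte[size];
            int[] ints = new int[size];
            for (int k = 0; k < size; k++) {
                bytes[k] = (byte) k;
                ints[k] = k * 31;
            }
            // Each copy must be a new array with identical contents.
            byte[] bc = byteClone(bytes);
            if (bc == bytes || !Arrays.equals(bc, byteArraycopy(bytes))) {
                throw new AssertionError("byte size " + size);
            }
            int[] ic = intClone(ints);
            if (ic == ints || !Arrays.equals(ic, intArraycopy(ints))) {
                throw new AssertionError("int size " + size);
            }
        }
        System.out.println("ok");
    }
}
```

Running a loop like this long enough to trigger C1 compilation (for example with `-XX:TieredStopAtLevel=1`, as in the testing above) exercises the intrinsic rather than the interpreter.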
------------- PR Review: https://git.openjdk.org/jdk/pull/18476#pullrequestreview-2064958673 From amitkumar at openjdk.org Sun May 19 15:28:23 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sun, 19 May 2024 15:28:23 GMT Subject: RFR: 8332498: [aarch64, x86] improving OpToAssembly output for partialSubtypeCheckConstSuper Instruct Message-ID: format method generated in `ad_aarch64_format.cpp` previously: void partialSubtypeCheckConstSuperNode::format(PhaseRegAlloc *ra, outputStream *st) const { // Start at oper_input_base() and count operands unsigned idx0 = 1; unsigned idx1 = 1; // sub unsigned idx2 = idx1 + opnd_array(1)->num_edges(); // super_reg unsigned idx3 = idx2 + opnd_array(2)->num_edges(); // super_con unsigned idx4 = idx3 + opnd_array(3)->num_edges(); // vtemp unsigned idx5 = idx4 + opnd_array(4)->num_edges(); // tempR1 unsigned idx6 = idx5 + opnd_array(5)->num_edges(); // tempR2 unsigned idx7 = idx6 + opnd_array(6)->num_edges(); // tempR3 st->print_raw("partialSubtypeCheck "); opnd_array(0)->int_format(ra, this, st); // result st->print_raw(", "); opnd_array(1)->ext_format(ra, this,idx1, st); // sub st->print_raw(", super"); } format method generated in `ad_aarch64_format.cpp` with this change: void partialSubtypeCheckConstSuperNode::format(PhaseRegAlloc *ra, outputStream *st) const { // Start at oper_input_base() and count operands unsigned idx0 = 1; unsigned idx1 = 1; // sub unsigned idx2 = idx1 + opnd_array(1)->num_edges(); // super_reg unsigned idx3 = idx2 + opnd_array(2)->num_edges(); // super_con unsigned idx4 = idx3 + opnd_array(3)->num_edges(); // vtemp unsigned idx5 = idx4 + opnd_array(4)->num_edges(); // tempR1 unsigned idx6 = idx5 + opnd_array(5)->num_edges(); // tempR2 unsigned idx7 = idx6 + opnd_array(6)->num_edges(); // tempR3 st->print_raw("partialSubtypeCheck "); opnd_array(0)->int_format(ra, this, st); // result st->print_raw(", "); opnd_array(1)->ext_format(ra, this,idx1, st); // sub st->print_raw(", "); 
opnd_array(2)->ext_format(ra, this,idx2, st); // super_reg st->print_raw(", "); opnd_array(3)->ext_format(ra, this,idx3, st); // super_con } ------------- Commit messages: - updates aarch64.ad & x86_64.ad Changes: https://git.openjdk.org/jdk/pull/19294/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19294&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332498 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19294.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19294/head:pull/19294 PR: https://git.openjdk.org/jdk/pull/19294 From amitkumar at openjdk.org Sun May 19 15:34:24 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sun, 19 May 2024 15:34:24 GMT Subject: RFR: 8332498: [aarch64, x86] improving OpToAssembly output for partialSubtypeCheckConstSuper Instruct [v2] In-Reply-To: References: Message-ID: > format method generated in `ad_aarch64_format.cpp` previously: > > void partialSubtypeCheckConstSuperNode::format(PhaseRegAlloc *ra, outputStream *st) const { > // Start at oper_input_base() and count operands > unsigned idx0 = 1; > unsigned idx1 = 1; // sub > unsigned idx2 = idx1 + opnd_array(1)->num_edges(); // super_reg > unsigned idx3 = idx2 + opnd_array(2)->num_edges(); // super_con > unsigned idx4 = idx3 + opnd_array(3)->num_edges(); // vtemp > unsigned idx5 = idx4 + opnd_array(4)->num_edges(); // tempR1 > unsigned idx6 = idx5 + opnd_array(5)->num_edges(); // tempR2 > unsigned idx7 = idx6 + opnd_array(6)->num_edges(); // tempR3 > st->print_raw("partialSubtypeCheck "); > opnd_array(0)->int_format(ra, this, st); // result > st->print_raw(", "); > opnd_array(1)->ext_format(ra, this,idx1, st); // sub > st->print_raw(", super"); > } > > > format method generated in `ad_aarch64_format.cpp` with this change: > > void partialSubtypeCheckConstSuperNode::format(PhaseRegAlloc *ra, outputStream *st) const { > // Start at oper_input_base() and count operands > unsigned idx0 = 1; > unsigned idx1 = 1; 
// sub > unsigned idx2 = idx1 + opnd_array(1)->num_edges(); // super_reg > unsigned idx3 = idx2 + opnd_array(2)->num_edges(); // super_con > unsigned idx4 = idx3 + opnd_array(3)->num_edges(); // vtemp > unsigned idx5 = idx4 + opnd_array(4)->num_edges(); // tempR1 > unsigned idx6 = idx5 + opnd_array(5)->num_edges(); // tempR2 > unsigned idx7 = idx6 + opnd_array(6)->num_edges(); // tempR3 > st->print_raw("partialSubtypeCheck "); > opnd_array(0)->int_format(ra, this, st); // result > st->print_raw(", "); > opnd_array(1)->ext_format(ra, this,idx1, st); // sub > st->print_raw(", "); > opnd_array(2)->ext_format(ra, this,idx2, st); // super_reg > st->print_raw(", "); > opnd_array(3)->ext_format(ra, this,idx3, st); // super_con > } Amit Kumar has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19294/files - new: https://git.openjdk.org/jdk/pull/19294/files/54ec227e..f1ce9b0e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19294&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19294&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19294.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19294/head:pull/19294 PR: https://git.openjdk.org/jdk/pull/19294 From amitkumar at openjdk.org Sun May 19 15:34:24 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sun, 19 May 2024 15:34:24 GMT Subject: Withdrawn: 8332498: [aarch64, x86] improving OpToAssembly output for partialSubtypeCheckConstSuper Instruct In-Reply-To: References: Message-ID: On Sun, 19 May 2024 15:23:56 GMT, Amit Kumar wrote: > format method generated in `ad_aarch64_format.cpp` previously: > > void partialSubtypeCheckConstSuperNode::format(PhaseRegAlloc *ra, outputStream *st) const { > // Start at oper_input_base() and count operands > unsigned idx0 = 1; > unsigned idx1 = 1; // sub > unsigned idx2 
= idx1 + opnd_array(1)->num_edges(); // super_reg > unsigned idx3 = idx2 + opnd_array(2)->num_edges(); // super_con > unsigned idx4 = idx3 + opnd_array(3)->num_edges(); // vtemp > unsigned idx5 = idx4 + opnd_array(4)->num_edges(); // tempR1 > unsigned idx6 = idx5 + opnd_array(5)->num_edges(); // tempR2 > unsigned idx7 = idx6 + opnd_array(6)->num_edges(); // tempR3 > st->print_raw("partialSubtypeCheck "); > opnd_array(0)->int_format(ra, this, st); // result > st->print_raw(", "); > opnd_array(1)->ext_format(ra, this,idx1, st); // sub > st->print_raw(", super"); > } > > > format method generated in `ad_aarch64_format.cpp` with this change: > > void partialSubtypeCheckConstSuperNode::format(PhaseRegAlloc *ra, outputStream *st) const { > // Start at oper_input_base() and count operands > unsigned idx0 = 1; > unsigned idx1 = 1; // sub > unsigned idx2 = idx1 + opnd_array(1)->num_edges(); // super_reg > unsigned idx3 = idx2 + opnd_array(2)->num_edges(); // super_con > unsigned idx4 = idx3 + opnd_array(3)->num_edges(); // vtemp > unsigned idx5 = idx4 + opnd_array(4)->num_edges(); // tempR1 > unsigned idx6 = idx5 + opnd_array(5)->num_edges(); // tempR2 > unsigned idx7 = idx6 + opnd_array(6)->num_edges(); // tempR3 > st->print_raw("partialSubtypeCheck "); > opnd_array(0)->int_format(ra, this, st); // result > st->print_raw(", "); > opnd_array(1)->ext_format(ra, this,idx1, st); // sub > st->print_raw(", "); > opnd_array(2)->ext_format(ra, this,idx2, st); // super_reg > st->print_raw(", "); > opnd_array(3)->ext_format(ra, this,idx3, st); // super_con > } This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/19294 From amitkumar at openjdk.org Sun May 19 15:41:19 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sun, 19 May 2024 15:41:19 GMT Subject: RFR: 8332498: [aarch64, x86] improving OpToAssembly output for partialSubtypeCheckConstSuper Instruct Message-ID: format method generated previously: void partialSubtypeCheckConstSuperNode::format(PhaseRegAlloc *ra, outputStream *st) const { // Start at oper_input_base() and count operands unsigned idx0 = 1; unsigned idx1 = 1; // sub unsigned idx2 = idx1 + opnd_array(1)->num_edges(); // super_reg unsigned idx3 = idx2 + opnd_array(2)->num_edges(); // super_con unsigned idx4 = idx3 + opnd_array(3)->num_edges(); // vtemp unsigned idx5 = idx4 + opnd_array(4)->num_edges(); // tempR1 unsigned idx6 = idx5 + opnd_array(5)->num_edges(); // tempR2 unsigned idx7 = idx6 + opnd_array(6)->num_edges(); // tempR3 st->print_raw("partialSubtypeCheck "); opnd_array(0)->int_format(ra, this, st); // result st->print_raw(", "); opnd_array(1)->ext_format(ra, this,idx1, st); // sub st->print_raw(", super"); } format method generated with this change: void partialSubtypeCheckConstSuperNode::format(PhaseRegAlloc *ra, outputStream *st) const { // Start at oper_input_base() and count operands unsigned idx0 = 1; unsigned idx1 = 1; // sub unsigned idx2 = idx1 + opnd_array(1)->num_edges(); // super_reg unsigned idx3 = idx2 + opnd_array(2)->num_edges(); // super_con unsigned idx4 = idx3 + opnd_array(3)->num_edges(); // vtemp unsigned idx5 = idx4 + opnd_array(4)->num_edges(); // tempR1 unsigned idx6 = idx5 + opnd_array(5)->num_edges(); // tempR2 unsigned idx7 = idx6 + opnd_array(6)->num_edges(); // tempR3 st->print_raw("partialSubtypeCheck "); opnd_array(0)->int_format(ra, this, st); // result st->print_raw(", "); opnd_array(1)->ext_format(ra, this,idx1, st); // sub st->print_raw(", "); opnd_array(2)->ext_format(ra, this,idx2, st); // super_reg st->print_raw(", "); opnd_array(3)->ext_format(ra, 
this,idx3, st); // super_con } ------------- Commit messages: - updates aarch64.ad & x86_64.ad Changes: https://git.openjdk.org/jdk/pull/19295/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19295&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332498 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19295.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19295/head:pull/19295 PR: https://git.openjdk.org/jdk/pull/19295 From varadam at openjdk.org Mon May 20 06:36:28 2024 From: varadam at openjdk.org (Varada M) Date: Mon, 20 May 2024 06:36:28 GMT Subject: RFR: 8331935: Add support for primitive array C1 clone intrinsic in PPC [v2] In-Reply-To: References: Message-ID: > https://bugs.openjdk.org/browse/JDK-8302850 port for PPC64 > > JMH Benchmark Results > > > Before : > > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 114.107 ? 1.337 ns/op > ArrayClone.byteArraycopy 10 avgt 15 130.492 ? 0.991 ns/op > ArrayClone.byteArraycopy 100 avgt 15 139.103 ? 1.913 ns/op > ArrayClone.byteArraycopy 1000 avgt 15 321.688 ? 6.033 ns/op > ArrayClone.byteClone 0 avgt 15 227.602 ? 3.393 ns/op > ArrayClone.byteClone 10 avgt 15 237.624 ? 2.996 ns/op > ArrayClone.byteClone 100 avgt 15 239.219 ? 2.835 ns/op > > ArrayClone.byteClone 1000 avgt 15 355.571 ? 2.946 ns/op > ArrayClone.intArraycopy 0 avgt 15 113.275 ? 1.099 ns/op > ArrayClone.intArraycopy 10 avgt 15 129.763 ? 1.458 ns/op > ArrayClone.intArraycopy 100 avgt 15 213.327 ? 2.524 ns/op > ArrayClone.intArraycopy 1000 avgt 15 449.650 ? 7.338 ns/op > ArrayClone.intClone 0 avgt 15 225.682 ? 3.048 ns/op > ArrayClone.intClone 10 avgt 15 234.532 ? 2.817 ns/op > ArrayClone.intClone 100 avgt 15 295.934 ? 4.925 ns/op > ArrayClone.intClone 1000 avgt 15 573.368 ? 
5.739 ns/op > Finished running test 'micro:java.lang.ArrayClone' > Test report is stored in build/aix-ppc64-server-release/test-results/micro_java_lang_ArrayClone > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > micro:java.lang.ArrayClone 1 1 0 0 > ============================== > TEST SUCCESS > > Finished building target 'test' in configuration 'aix-ppc64-server-release' > > > > > After: > > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 113.894 ? 0.993 ns/op > ArrayClone.byteArraycopy 10 avgt 15 131.455 ? 0.956 ns/op > ArrayClone.byteArraycopy 100 avgt 15 139.145 ? 3.002 ns/op > ArrayClone.byteArraycopy 1000 avgt 15 315.957 ? 14.591 ns/op > ArrayClone.byteClone 0 avgt 15 43.753 ? 3.669 ns/op > ArrayClone.byteClone 10 avgt 15 52.329 ? 1.041 ns/op > ArrayClone.byteClone 100 avgt 15 127.711 ? 3.938 ns/op > > ArrayClone.byteClone 1000 avgt 15 225.937 ? 1.987 ns/op > ArrayClone.intArraycopy 0 avgt 15 113.788 ? 0.770 ns/op > ArrayClone.intArraycopy 10 avgt 1... 
Varada M has updated the pull request incrementally with one additional commit since the last revision: Add support for primitive array C1 clone intrinsic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19250/files - new: https://git.openjdk.org/jdk/pull/19250/files/f484bbee..0aa5b21c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19250&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19250&range=00-01 Stats: 19 lines in 2 files changed: 6 ins; 8 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/19250.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19250/head:pull/19250 PR: https://git.openjdk.org/jdk/pull/19250 From fyang at openjdk.org Mon May 20 06:43:06 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 20 May 2024 06:43:06 GMT Subject: RFR: 8332153: RISC-V: enable tests and add comment for vector shift instruct (shared by vectorization and Vector API) [v2] In-Reply-To: References: Message-ID: On Thu, 16 May 2024 14:45:16 GMT, Hamlin Li wrote: >> Hi, >> Can you help review this patch? >> For the vector shift instructs, some corresponding tests were not enabled; this patch enables them. >> It is also unclear how the vector shift instructs work, especially because both auto-vectorization (SLP in the JDK) and the Vector API share the same instructs in riscv_v.ad, so this patch also adds comments to clarify that. >> >> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > fix words src/hotspot/cpu/riscv/riscv_v.ad line 1796: > 1794: // check https://docs.oracle.com/javase/specs/jls/se21/html/jls-15.html#jls-15.19 for details. 
> 1795: // > 1796: // Shift behaviour in Vector APi is defined as: Nit: s/APi/API/ test/hotspot/jtreg/compiler/c2/aarch64/TestVectorShiftShorts.java line 31: > 29: * > 30: * @requires vm.compiler2.enabled > 31: * @requires os.arch == "aarch64" | os.arch == "riscv64" Seems a bit weird to enable tests under `test/hotspot/jtreg/compiler/c2/aarch64/` for other CPUs? test/hotspot/jtreg/compiler/vectorization/runner/ArrayShiftOpTest.java line 103: > 101: counts = {IRNode.RSHIFT_VI, ">0"}) > 102: @IR(applyIfPlatform = {"riscv64", "true"}, > 103: applyIfCPUFeature = {" v ", "true"}, Are the spaces around `v` necessary? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19265#discussion_r1606309014 PR Review Comment: https://git.openjdk.org/jdk/pull/19265#discussion_r1606318534 PR Review Comment: https://git.openjdk.org/jdk/pull/19265#discussion_r1606318107 From amitkumar at openjdk.org Mon May 20 07:32:03 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 20 May 2024 07:32:03 GMT Subject: RFR: 8331935: Add support for primitive array C1 clone intrinsic in PPC [v2] In-Reply-To: References: Message-ID: On Mon, 20 May 2024 06:36:28 GMT, Varada M wrote: >> https://bugs.openjdk.org/browse/JDK-8302850 port for PPC64 >> >> JMH Benchmark Results >> >> >> Before : >> >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 114.107 ± 1.337 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 130.492 ± 0.991 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 139.103 ± 1.913 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 321.688 ± 6.033 ns/op >> ArrayClone.byteClone 0 avgt 15 227.602 ± 3.393 ns/op >> ArrayClone.byteClone 10 avgt 15 237.624 ± 2.996 ns/op >> ArrayClone.byteClone 100 avgt 15 239.219 ± 2.835 ns/op >> >> ArrayClone.byteClone 1000 avgt 15 355.571 ± 2.946 ns/op >> ArrayClone.intArraycopy 0 avgt 15 113.275 ± 1.099 ns/op >> ArrayClone.intArraycopy 10 avgt 15 129.763 ± 1.458 ns/op >> ArrayClone.intArraycopy 100 avgt 15 213.327 ± 
2.524 ns/op >> ArrayClone.intArraycopy 1000 avgt 15 449.650 ± 7.338 ns/op >> ArrayClone.intClone 0 avgt 15 225.682 ± 3.048 ns/op >> ArrayClone.intClone 10 avgt 15 234.532 ± 2.817 ns/op >> ArrayClone.intClone 100 avgt 15 295.934 ± 4.925 ns/op >> ArrayClone.intClone 1000 avgt 15 573.368 ± 5.739 ns/op >> Finished running test 'micro:java.lang.ArrayClone' >> Test report is stored in build/aix-ppc64-server-release/test-results/micro_java_lang_ArrayClone >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> micro:java.lang.ArrayClone 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> Finished building target 'test' in configuration 'aix-ppc64-server-release' >> >> >> >> >> After: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 113.894 ± 0.993 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 131.455 ± 0.956 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 139.145 ± 3.002 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 315.957 ± 14.591 ns/op >> ArrayClone.byteClone 0 avgt 15 43.753 ± 3.669 ns/op >> ArrayClone.byteClone 10 avgt 15 52.329 ± 1.041 ns/op >> ArrayClone.byteClone 100 avgt 15 127.711 ± 3.938 ns/op >> >> ArrayClone.byteClone 1000 avgt 15 225.937 ± 1.987 ns/op >> Arr... > > Varada M has updated the pull request incrementally with one additional commit since the last revision: > > Add support for primitive array C1 clone intrinsic I guess you can update this as well: diff --git a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp index 2424d820177..0c1e23c6353 100644 --- a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp +++ b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp @@ -2107,7 +2107,7 @@ void LIR_Assembler::emit_arraycopy(LIR_OpArrayCopy* op) { // subtype which we can't check or src is the same array as dst // but not necessarily exactly of type default_type.
Label known_ok, halt; - metadata2reg(op->expected_type()->constant_encoding(), tmp); + metadata2reg(default_type->constant_encoding(), tmp); if (UseCompressedClassPointers) { // Tmp holds the default type. It currently comes uncompressed after the // load of a constant, so encode it. ------------- Changes requested by amitkumar (Committer). PR Review: https://git.openjdk.org/jdk/pull/19250#pullrequestreview-2065550268 From mli at openjdk.org Mon May 20 08:31:13 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 20 May 2024 08:31:13 GMT Subject: RFR: 8332394: Add friendly output when @IR rule missing value [v2] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 07:33:43 GMT, Christian Hagedorn wrote: >> Thanks, it makes sense. >> >> I also created https://bugs.openjdk.org/browse/JDK-8332402 to track adding tests in `TestBadFormat` > > Thanks! New PR adding tests is here: https://github.com/openjdk/jdk/pull/19302 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19270#discussion_r1606436611 From fgao at openjdk.org Mon May 20 09:00:09 2024 From: fgao at openjdk.org (Fei Gao) Date: Mon, 20 May 2024 09:00:09 GMT Subject: RFR: 8320622: [TEST] Improve coverage of compiler/loopopts/superword/TestMulAddS2I.java on different platforms Message-ID: It would be worthwhile to improve the test coverage on all platforms by applying another common VM flag.
------------- Commit messages: - 8320622: [TEST] Improve coverage of compiler/loopopts/superword/TestMulAddS2I.java on different platforms Changes: https://git.openjdk.org/jdk/pull/19305/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19305&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320622 Stats: 7 lines in 1 file changed: 0 ins; 4 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19305.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19305/head:pull/19305 PR: https://git.openjdk.org/jdk/pull/19305 From varadam at openjdk.org Mon May 20 09:07:14 2024 From: varadam at openjdk.org (Varada M) Date: Mon, 20 May 2024 09:07:14 GMT Subject: RFR: 8331935: Add support for primitive array C1 clone intrinsic in PPC [v3] In-Reply-To: References: Message-ID: > https://bugs.openjdk.org/browse/JDK-8302850 port for PPC64 > > JMH Benchmark Results > > > Before : > > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 114.107 ± 1.337 ns/op > ArrayClone.byteArraycopy 10 avgt 15 130.492 ± 0.991 ns/op > ArrayClone.byteArraycopy 100 avgt 15 139.103 ± 1.913 ns/op > ArrayClone.byteArraycopy 1000 avgt 15 321.688 ± 6.033 ns/op > ArrayClone.byteClone 0 avgt 15 227.602 ± 3.393 ns/op > ArrayClone.byteClone 10 avgt 15 237.624 ± 2.996 ns/op > ArrayClone.byteClone 100 avgt 15 239.219 ± 2.835 ns/op > > ArrayClone.byteClone 1000 avgt 15 355.571 ± 2.946 ns/op > ArrayClone.intArraycopy 0 avgt 15 113.275 ± 1.099 ns/op > ArrayClone.intArraycopy 10 avgt 15 129.763 ± 1.458 ns/op > ArrayClone.intArraycopy 100 avgt 15 213.327 ± 2.524 ns/op > ArrayClone.intArraycopy 1000 avgt 15 449.650 ± 7.338 ns/op > ArrayClone.intClone 0 avgt 15 225.682 ± 3.048 ns/op > ArrayClone.intClone 10 avgt 15 234.532 ± 2.817 ns/op > ArrayClone.intClone 100 avgt 15 295.934 ± 4.925 ns/op > ArrayClone.intClone 1000 avgt 15 573.368 ±
5.739 ns/op > Finished running test 'micro:java.lang.ArrayClone' > Test report is stored in build/aix-ppc64-server-release/test-results/micro_java_lang_ArrayClone > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > micro:java.lang.ArrayClone 1 1 0 0 > ============================== > TEST SUCCESS > > Finished building target 'test' in configuration 'aix-ppc64-server-release' > > > > > After: > > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 113.894 ± 0.993 ns/op > ArrayClone.byteArraycopy 10 avgt 15 131.455 ± 0.956 ns/op > ArrayClone.byteArraycopy 100 avgt 15 139.145 ± 3.002 ns/op > ArrayClone.byteArraycopy 1000 avgt 15 315.957 ± 14.591 ns/op > ArrayClone.byteClone 0 avgt 15 43.753 ± 3.669 ns/op > ArrayClone.byteClone 10 avgt 15 52.329 ± 1.041 ns/op > ArrayClone.byteClone 100 avgt 15 127.711 ± 3.938 ns/op > > ArrayClone.byteClone 1000 avgt 15 225.937 ± 1.987 ns/op > ArrayClone.intArraycopy 0 avgt 15 113.788 ± 0.770 ns/op > ArrayClone.intArraycopy 10 avgt 1...
Varada M has updated the pull request incrementally with one additional commit since the last revision: Add support for primitive array C1 clone intrinsic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19250/files - new: https://git.openjdk.org/jdk/pull/19250/files/0aa5b21c..28013450 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19250&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19250&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19250.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19250/head:pull/19250 PR: https://git.openjdk.org/jdk/pull/19250 From duke at openjdk.org Mon May 20 09:53:08 2024 From: duke at openjdk.org (Jens Lidestrom) Date: Mon, 20 May 2024 09:53:08 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v20] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Fri, 17 May 2024 09:31:33 GMT, Per Minborg wrote: >> # Stable Values & Collections (Internal) >> >> >> >> ## Summary >> This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. >> >> ## Goals >> * Provide an easy and intuitive API to describe value holders that can change at most once. >> * Decouple declaration from initialization without significant footprint or performance penalties. >> * Reduce the amount of static initializer and/or field initialization code. >> * Uphold integrity and consistency, even in a multi-threaded environment. 
>> >> For more details, see the draft JEP: https://openjdk.org/jeps/8312611 >> >> ## Performance >> Performance compared to instance variables using two `AtomicReference` and two protected by double-checked locking under concurrent access by all threads: >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.atomic thrpt 10 259.478 ± 36.809 ops/us >> StableBenchmark.dcl thrpt 10 225.710 ± 26.638 ops/us >> StableBenchmark.stable thrpt 10 4382.478 ± 1151.472 ops/us <- StableValue significantly faster >> >> >> Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (all threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableStaticBenchmark.atomic thrpt 10 6487.835 ± 385.639 ops/us >> StableStaticBenchmark.dcl thrpt 10 6605.239 ± 210.610 ops/us >> StableStaticBenchmark.stable thrpt 10 14338.239 ± 1426.874 ops/us >> StableStaticBenchmark.staticCHI thrpt 10 13780.341 ± 1839.651 ops/us >> >> >> Performance for stable lists (thread safe) in both instance and static contexts whereby we access a single value compared to `ArrayList` instances (which are not thread-safe) (all threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableListElementBenchmark.instanceArrayList thrpt 10 5812.992 ± 1169.730 ops... > > Per Minborg has updated the pull request incrementally with two additional commits since the last revision: > > - Add benchmarks for memoized IntFunction and Function > - Add benchmark for memoized supplier src/java.base/share/classes/jdk/internal/lang/StableArray.java line 66: > 64: * @throws IllegalArgumentException if the provided {@code length} is {@code < 0} > 65: */ > 66: static StableArray of(int length) { I interpret the method name `of` as a method that creates an object that contains the argument as some kind of member, in the way that `List.of` and friends work.
My intuitive interpretation of `StableArray.of(10)` is that it returns an array with the single element 10. I think a method like this should be named `empty`, or `emptyOfLength` or something like that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1606529054 From mli at openjdk.org Mon May 20 10:22:26 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 20 May 2024 10:22:26 GMT Subject: RFR: 8332153: RISC-V: enable tests and add comment for vector shift instruct (shared by vectorization and Vector API) [v3] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > For vector shift instruct, some corresponding tests are not enabled, this is to enable them. > And the way how vector shift instruct works is not clear, especially both vectorization (SLP in jdk) and Vector API share the same instruct's in riscv_v.ad, so also added some comment to clarify it. > > Thanks Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: fix misc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19265/files - new: https://git.openjdk.org/jdk/pull/19265/files/809f92e9..7216d886 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19265&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19265&range=01-02 Stats: 43 lines in 9 files changed: 0 ins; 0 del; 43 mod Patch: https://git.openjdk.org/jdk/pull/19265.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19265/head:pull/19265 PR: https://git.openjdk.org/jdk/pull/19265 From mli at openjdk.org Mon May 20 10:22:27 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 20 May 2024 10:22:27 GMT Subject: RFR: 8332153: RISC-V: enable tests and add comment for vector shift instruct (shared by vectorization and Vector API) [v2] In-Reply-To: References: Message-ID: On Mon, 20 May 2024 06:28:18 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the 
last revision: >> >> fix words > > src/hotspot/cpu/riscv/riscv_v.ad line 1796: > >> 1794: // check https://docs.oracle.com/javase/specs/jls/se21/html/jls-15.html#jls-15.19 for details. >> 1795: // >> 1796: // Shift behaviour in Vector APi is defined as: > > Nit: s/APi/API/ fixed. > test/hotspot/jtreg/compiler/c2/aarch64/TestVectorShiftShorts.java line 31: > >> 29: * >> 30: * @requires vm.compiler2.enabled >> 31: * @requires os.arch == "aarch64" | os.arch == "riscv64" > > Seems a bit weird to enable tests under `test/hotspot/jtreg/compiler/c2/aarch64/` for other CPUs? Yeah, it's a bit weird; removed ` | os.arch == "riscv64"`. I can also modify the summary and enable the test for riscv if needed. > test/hotspot/jtreg/compiler/vectorization/runner/ArrayShiftOpTest.java line 103: > >> 101: counts = {IRNode.RSHIFT_VI, ">0"}) >> 102: @IR(applyIfPlatform = {"riscv64", "true"}, >> 103: applyIfCPUFeature = {" v ", "true"}, > > Are the spaces around `v` necessary? Thanks for catching, fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19265#discussion_r1606558591 PR Review Comment: https://git.openjdk.org/jdk/pull/19265#discussion_r1606560390 PR Review Comment: https://git.openjdk.org/jdk/pull/19265#discussion_r1606558286 From epeter at openjdk.org Mon May 20 10:33:03 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 20 May 2024 10:33:03 GMT Subject: RFR: 8320622: [TEST] Improve coverage of compiler/loopopts/superword/TestMulAddS2I.java on different platforms In-Reply-To: References: Message-ID: On Mon, 20 May 2024 08:56:35 GMT, Fei Gao wrote: > It would be worthwhile to improve the test coverage on all platforms by applying another common VM flag. Welcome back, long time no see! Looks good to me! ------------- Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19305#pullrequestreview-2065898576 From bkilambi at openjdk.org Mon May 20 10:44:05 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 20 May 2024 10:44:05 GMT Subject: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v3] In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 12:37:01 GMT, Yudi Zheng wrote: >> `multiply_to_len` seems to be used by `generate_squareToLen` as well for aarch64 and riscv but `zlen` is still passed in a register. >> >> https://github.com/openjdk/jdk/blob/870a6127cf54264c691f7322d775b202705c3bfa/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4710 >> https://github.com/openjdk/jdk/blob/870a6127cf54264c691f7322d775b202705c3bfa/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L2881 >> >> I think it might work anyway but it might be better to adapt them if only for completeness. > > @dafedafe @dean-long please take a look and let me know if there are further issues, thanks! Hi @mur47x111, do you happen to have any performance results with this patch? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18226#issuecomment-2120179701 From fgao at openjdk.org Mon May 20 11:48:01 2024 From: fgao at openjdk.org (Fei Gao) Date: Mon, 20 May 2024 11:48:01 GMT Subject: RFR: 8320622: [TEST] Improve coverage of compiler/loopopts/superword/TestMulAddS2I.java on different platforms In-Reply-To: References: Message-ID: <8pbPEbiwKJRQyaOm8Bo7jFb78fcAIUQbalVgDDgnGhU=.1e8ddf4b-0939-46b0-8712-1cebb9f35841@github.com> On Mon, 20 May 2024 10:30:54 GMT, Emanuel Peter wrote: > Welcome back, long time no see! > > Looks good to me! Thanks @eme64! Happy to see you again!
------------- PR Comment: https://git.openjdk.org/jdk/pull/19305#issuecomment-2120294855 From liach at openjdk.org Mon May 20 12:24:09 2024 From: liach at openjdk.org (Chen Liang) Date: Mon, 20 May 2024 12:24:09 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v20] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Mon, 20 May 2024 09:50:17 GMT, Jens Lidestrom wrote: >> Per Minborg has updated the pull request incrementally with two additional commits since the last revision: >> >> - Add benchmarks for memoized IntFunction and Function >> - Add benchmark for memoized supplier > > src/java.base/share/classes/jdk/internal/lang/StableArray.java line 66: > >> 64: * @throws IllegalArgumentException if the provided {@code length} is {@code < 0} >> 65: */ >> 66: static StableArray of(int length) { > > I interpret the method name `of` as a method that creates an object that contains the argument as some kind of member, in the way that `List.of` and friends work. > > My intuitive interpretation of `StableArray.of(10)` is that it returns an array with the single element 10. > > I think a method like this should be named `empty`, or `emptyOfLength` or something like that. Stable arrays aren't supposed to be initialized with values, so I think your point is moot. 
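The naming question comes down to what a factory that takes only a length returns. Under the semantics described in the thread, it returns an array whose elements all start unset, and each element can then be set at most once. The sketch below is hypothetical and exists only to make those semantics concrete. It is built on `java.util.concurrent.atomic.AtomicReferenceArray` and is not the internal `jdk.internal.lang.StableArray` API. It borrows the `of(int length)` name solely to mirror the signature under discussion. It provides the set-once guarantee, but none of the final-field-like JIT benefits that the PR targets.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReferenceArray;

/**
 * Hypothetical set-at-most-once array, for illustration only;
 * not the jdk.internal.lang.StableArray API from the PR.
 */
public final class SetOnceArray<T> {
    // null marks an element that has not been set yet.
    private final AtomicReferenceArray<T> elements;

    private SetOnceArray(int length) {
        this.elements = new AtomicReferenceArray<>(length);
    }

    /** Creates an array of the given length whose elements all start unset. */
    public static <T> SetOnceArray<T> of(int length) {
        if (length < 0) {
            throw new IllegalArgumentException("length < 0: " + length);
        }
        return new SetOnceArray<>(length);
    }

    /** Sets element i if it is still unset; returns true only for the caller that wins. */
    public boolean trySet(int i, T value) {
        Objects.requireNonNull(value, "value"); // null is reserved as the "unset" marker
        return elements.compareAndSet(i, null, value);
    }

    /** Returns element i, or null if it has not been set. */
    public T getOrNull(int i) {
        return elements.get(i);
    }

    public int length() {
        return elements.length();
    }

    public static void main(String[] args) {
        SetOnceArray<String> a = SetOnceArray.of(2);
        System.out.println(a.trySet(0, "first"));  // true
        System.out.println(a.trySet(0, "second")); // false: already set
        System.out.println(a.getOrNull(0) + " " + a.getOrNull(1)); // first null
    }
}
```

Because `null` serves as the "unset" sentinel, null values are rejected. A real implementation would need some other marker if it wanted to allow them.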
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18794#discussion_r1606708916 From syan at openjdk.org Mon May 20 12:30:08 2024 From: syan at openjdk.org (SendaoYan) Date: Mon, 20 May 2024 12:30:08 GMT Subject: RFR: 8332499: Gtest codestrings.validate_vm fail on linux x64 Message-ID: Hi all, ------------- Commit messages: - 8332499: Gtest codestrings.validate_vm fail on linux x64 Changes: https://git.openjdk.org/jdk/pull/19309/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19309&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332499 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19309.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19309/head:pull/19309 PR: https://git.openjdk.org/jdk/pull/19309 From syan at openjdk.org Mon May 20 12:40:14 2024 From: syan at openjdk.org (SendaoYan) Date: Mon, 20 May 2024 12:40:14 GMT Subject: RFR: 8332499: Gtest codestrings.validate_vm fail on linux x64 [v2] In-Reply-To: References: Message-ID: > Hi all, SendaoYan has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: 8332499: Gtest codestrings.validate_vm fail on linux x64 Signed-off-by: sendaoYan ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19309/files - new: https://git.openjdk.org/jdk/pull/19309/files/bb26a776..bd894e12 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19309&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19309&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19309.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19309/head:pull/19309 PR: https://git.openjdk.org/jdk/pull/19309 From syan at openjdk.org Mon May 20 12:51:26 2024 From: syan at openjdk.org (SendaoYan) Date: Mon, 20 May 2024 12:51:26 GMT Subject: RFR: 8332499: Gtest codestrings.validate_vm fail on linux x64 [v3] In-Reply-To: References: Message-ID: > Hi all, > There is some arch-specific code to trim trailing entries, as described in [JDK-8332499](https://bugs.openjdk.org/browse/JDK-8332499). Only the gtest test case is changed, so the risk is low.
SendaoYan has updated the pull request incrementally with one additional commit since the last revision: 8332499: Gtest codestrings.validate_vm fail on linux x64 Signed-off-by: sendaoYan ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19309/files - new: https://git.openjdk.org/jdk/pull/19309/files/bd894e12..01b9e688 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19309&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19309&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19309.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19309/head:pull/19309 PR: https://git.openjdk.org/jdk/pull/19309 From amitkumar at openjdk.org Mon May 20 14:14:00 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 20 May 2024 14:14:00 GMT Subject: RFR: 8332498: [aarch64, x86] improving OpToAssembly output for partialSubtypeCheckConstSuper Instruct In-Reply-To: References: Message-ID: On Sun, 19 May 2024 15:36:15 GMT, Amit Kumar wrote: > format method generated previously: > > void partialSubtypeCheckConstSuperNode::format(PhaseRegAlloc *ra, outputStream *st) const { > // Start at oper_input_base() and count operands > unsigned idx0 = 1; > unsigned idx1 = 1; // sub > unsigned idx2 = idx1 + opnd_array(1)->num_edges(); // super_reg > unsigned idx3 = idx2 + opnd_array(2)->num_edges(); // super_con > unsigned idx4 = idx3 + opnd_array(3)->num_edges(); // vtemp > unsigned idx5 = idx4 + opnd_array(4)->num_edges(); // tempR1 > unsigned idx6 = idx5 + opnd_array(5)->num_edges(); // tempR2 > unsigned idx7 = idx6 + opnd_array(6)->num_edges(); // tempR3 > st->print_raw("partialSubtypeCheck "); > opnd_array(0)->int_format(ra, this, st); // result > st->print_raw(", "); > opnd_array(1)->ext_format(ra, this,idx1, st); // sub > st->print_raw(", super"); > } > > > format method generated with this change: > > void partialSubtypeCheckConstSuperNode::format(PhaseRegAlloc *ra, outputStream *st) const { > 
// Start at oper_input_base() and count operands > unsigned idx0 = 1; > unsigned idx1 = 1; // sub > unsigned idx2 = idx1 + opnd_array(1)->num_edges(); // super_reg > unsigned idx3 = idx2 + opnd_array(2)->num_edges(); // super_con > unsigned idx4 = idx3 + opnd_array(3)->num_edges(); // vtemp > unsigned idx5 = idx4 + opnd_array(4)->num_edges(); // tempR1 > unsigned idx6 = idx5 + opnd_array(5)->num_edges(); // tempR2 > unsigned idx7 = idx6 + opnd_array(6)->num_edges(); // tempR3 > st->print_raw("partialSubtypeCheck "); > opnd_array(0)->int_format(ra, this, st); // result > st->print_raw(", "); > opnd_array(1)->ext_format(ra, this,idx1, st); // sub > st->print_raw(", "); > opnd_array(2)->ext_format(ra, this,idx2, st); // super_reg > st->print_raw(", "); > opnd_array(3)->ext_format(ra, this,idx3, st); // super_con > } @theRealAph would you review this trivial fix. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19295#issuecomment-2120545914 From duke at openjdk.org Mon May 20 15:00:09 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Mon, 20 May 2024 15:00:09 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v23] In-Reply-To: References: Message-ID: On Mon, 15 Apr 2024 17:43:45 GMT, Vladimir Kozlov wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> added comment about UseAPX and UseAVX > 2 correspondence > > I have few comments. Hi @vnkozlov, Is there anything else you would like to see for this PR. If not, I would like to check it in this week. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2120633389 From galder at openjdk.org Mon May 20 16:09:07 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Mon, 20 May 2024 16:09:07 GMT Subject: RFR: 8331934: [s390x] Add support for primitive array C1 clone intrinsic [v3] In-Reply-To: <2qTcJfbhEmK8YhwBTdpeIB9C1g6RPDhUDntb5S9Cp0E=.cbc73833-b93f-4ab7-8e67-630852cc73cc@github.com> References: <2qTcJfbhEmK8YhwBTdpeIB9C1g6RPDhUDntb5S9Cp0E=.cbc73833-b93f-4ab7-8e67-630852cc73cc@github.com> Message-ID: On Sat, 18 May 2024 14:05:52 GMT, Amit Kumar wrote: >> Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: >> >> - Merge master >> - s390x Port >> - Update src/hotspot/share/c1/c1_GraphBuilder.cpp >> >> Co-authored-by: Dean Long <17332032+dean-long at users.noreply.github.com> >> - Fix assert to only have a single ! >> - Assert type is not interface >> - Remove whitespace >> - Expanded testing in TestNullArrayClone >> >> * Added byte[] and long[] tests. >> * Verified that the cloned array has the same contents. >> * Increase number of iterations reach tier 3 threshold. >> - Update src/hotspot/share/c1/c1_GraphBuilder.cpp >> >> Co-authored-by: Boris <42576543+bulasevich at users.noreply.github.com> >> - Added test summary >> - Use vmIntrinsics instead of vmIntrinsicID >> - ... and 16 more: https://git.openjdk.org/jdk/compare/2f10a316...865de5ba > > @galderz if possible can you review this ? > > Maybe this could ease you a bit while review: Testing for `{tier1} X {fastdebug, slowdebug, release}` and `{tier1 -XX:TieredStopAtLevel=1} X {fastdebug, slowdebug, release}` was clean. Benchmarking also shows that we are good on this. You can check the result [here](https://github.com/openjdk/jdk/pull/19220#issuecomment-2115237059) :-) @offamitkumar I'm no s390 expert, so I can only do a light review on the code. 
The changes look good to me and the benchmark results show improvements. One thing I would suggest is maybe expanding the testing a bit, e.g. hotspot_compiler, hotspot_gc, hotspot_serviceability, hotspot_runtime, and tier1-3; see https://github.com/openjdk/jdk/pull/17667#issuecomment-2071741244 for further details. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19220#issuecomment-2120755016 From dcubed at openjdk.org Mon May 20 16:17:03 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Mon, 20 May 2024 16:17:03 GMT Subject: RFR: 8331885: C2: meet between unloaded and speculative types is not symmetric [v2] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 12:30:27 GMT, Vladimir Ivanov wrote: >> `TypeInstPtr::xmeet_unloaded` computes the MEET of two InstPtrs when at least one is unloaded, but doesn't preserve speculative part if one is present. It causes the corresponding assert to fail. >> >> Proposed fix unconditionally keeps speculative part. >> >> Testing: hs-tier1 - hs-tier4 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > AlwaysIncrementalInline is a debugflag @iwanowww - Will this fix be integrated soon? There are two solid failures in every Tier5 job set and have been for quite a while. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19249#issuecomment-2120769439 From vlivanov at openjdk.org Mon May 20 17:59:06 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 20 May 2024 17:59:06 GMT Subject: RFR: 8331885: C2: meet between unloaded and speculative types is not symmetric [v2] In-Reply-To: References: Message-ID: <2nH6oVcCtoaM6QUWJ-p_kma4k5lWH1pLlfMs3SVfmoE=.45704a0f-82f0-4b7c-87c6-6bbec1426dfe@github.com> On Fri, 17 May 2024 12:30:27 GMT, Vladimir Ivanov wrote: >> `TypeInstPtr::xmeet_unloaded` computes the MEET of two InstPtrs when at least one is unloaded, but doesn't preserve speculative part if one is present.
It causes the corresponding assert to fail. >> >> Proposed fix unconditionally keeps speculative part. >> >> Testing: hs-tier1 - hs-tier4 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > AlwaysIncrementalInline is a debugflag Thanks for the reviews, Roland and Tobias. hs-comp-stress and hs-precheckin-comp testing passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19249#issuecomment-2120930384 From vlivanov at openjdk.org Mon May 20 17:59:06 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 20 May 2024 17:59:06 GMT Subject: Integrated: 8331885: C2: meet between unloaded and speculative types is not symmetric In-Reply-To: References: Message-ID: <6FzwjmeFY-ykTUrx2p1ydcgp1jpiwmCyXN6kCT3qlwM=.3a36d3ee-7865-40f1-a3e6-06ba1c4f92af@github.com> On Wed, 15 May 2024 13:30:46 GMT, Vladimir Ivanov wrote: > `TypeInstPtr::xmeet_unloaded` computes the MEET of two InstPtrs when at least one is unloaded, but doesn't preserve speculative part if one is present. It causes the corresponding assert to fail. > > Proposed fix unconditionally keeps speculative part. > > Testing: hs-tier1 - hs-tier4 This pull request has now been integrated. 
Changeset: 7652f981 Author: Vladimir Ivanov URL: https://git.openjdk.org/jdk/commit/7652f9811bfddf08650b0c3277012074873deade Stats: 20 lines in 3 files changed: 12 ins; 0 del; 8 mod 8331885: C2: meet between unloaded and speculative types is not symmetric Reviewed-by: roland, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/19249 From kvn at openjdk.org Mon May 20 18:07:00 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 20 May 2024 18:07:00 GMT Subject: RFR: 8332498: [aarch64, x86] improving OpToAssembly output for partialSubtypeCheckConstSuper Instruct In-Reply-To: References: Message-ID: On Sun, 19 May 2024 15:36:15 GMT, Amit Kumar wrote: > format method generated previously: > > void partialSubtypeCheckConstSuperNode::format(PhaseRegAlloc *ra, outputStream *st) const { > // Start at oper_input_base() and count operands > unsigned idx0 = 1; > unsigned idx1 = 1; // sub > unsigned idx2 = idx1 + opnd_array(1)->num_edges(); // super_reg > unsigned idx3 = idx2 + opnd_array(2)->num_edges(); // super_con > unsigned idx4 = idx3 + opnd_array(3)->num_edges(); // vtemp > unsigned idx5 = idx4 + opnd_array(4)->num_edges(); // tempR1 > unsigned idx6 = idx5 + opnd_array(5)->num_edges(); // tempR2 > unsigned idx7 = idx6 + opnd_array(6)->num_edges(); // tempR3 > st->print_raw("partialSubtypeCheck "); > opnd_array(0)->int_format(ra, this, st); // result > st->print_raw(", "); > opnd_array(1)->ext_format(ra, this,idx1, st); // sub > st->print_raw(", super"); > } > > > format method generated with this change: > > void partialSubtypeCheckConstSuperNode::format(PhaseRegAlloc *ra, outputStream *st) const { > // Start at oper_input_base() and count operands > unsigned idx0 = 1; > unsigned idx1 = 1; // sub > unsigned idx2 = idx1 + opnd_array(1)->num_edges(); // super_reg > unsigned idx3 = idx2 + opnd_array(2)->num_edges(); // super_con > unsigned idx4 = idx3 + opnd_array(3)->num_edges(); // vtemp > unsigned idx5 = idx4 + opnd_array(4)->num_edges(); // tempR1 > 
unsigned idx6 = idx5 + opnd_array(5)->num_edges(); // tempR2 > unsigned idx7 = idx6 + opnd_array(6)->num_edges(); // tempR3 > st->print_raw("partialSubtypeCheck "); > opnd_array(0)->int_format(ra, this, st); // result > st->print_raw(", "); > opnd_array(1)->ext_format(ra, this,idx1, st); // sub > st->print_raw(", "); > opnd_array(2)->ext_format(ra, this,idx2, st); // super_reg > st->print_raw(", "); > opnd_array(3)->ext_format(ra, this,idx3, st); // super_con > } Good for x64 and aarch64. And you need Andrew's approval too. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19295#pullrequestreview-2066738041 From kvn at openjdk.org Mon May 20 18:40:08 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 20 May 2024 18:40:08 GMT Subject: RFR: 8332538: Switch off JIT memory limit check for TestAlignVectorFuzzer.java Message-ID: Add flag `-XX:CompileCommand=MemLimit,*.*,0` to TestAlignVectorFuzzer.java test until [JDK-8332537](https://bugs.openjdk.org/browse/JDK-8332537) is fixed. 
Tested: tier1 ------------- Commit messages: - 8332538: Switch off JIT memory limit check for TestAlignVectorFuzzer.java Changes: https://git.openjdk.org/jdk/pull/19316/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19316&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332538 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19316.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19316/head:pull/19316 PR: https://git.openjdk.org/jdk/pull/19316 From kvn at openjdk.org Mon May 20 18:52:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 20 May 2024 18:52:02 GMT Subject: RFR: 8320622: [TEST] Improve coverage of compiler/loopopts/superword/TestMulAddS2I.java on different platforms In-Reply-To: References: Message-ID: <9QQIEkbwsbP5SUsMPjW4-YVkqWApkqPTNulw9gdNHMk=.2d68912c-b266-4459-b15b-953cb299b0db@github.com> On Mon, 20 May 2024 08:56:35 GMT, Fei Gao wrote: > It would be worthwhile to improve the test coverage on all platforms by applying another common VM flag. Do all platforms support both (`true` and `false`) values of `AlignVector`? I see it is only adjusted for x86 and aarch64 in `vm_version_.cpp` files. ------------- PR Review: https://git.openjdk.org/jdk/pull/19305#pullrequestreview-2066812906 From vlivanov at openjdk.org Mon May 20 22:00:26 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 20 May 2024 22:00:26 GMT Subject: RFR: 8332547: Unloaded signature classes in DirectMethodHandles Message-ID: JVM routinely installs loader constraints for unloaded signature classes when method resolution takes place. MethodHandle resolution took a different route and eagerly resolves signature classes instead (see `java.lang.invoke.MemberName$Factory::resolve` and `sun.invoke.util.VerifyAccess::isTypeVisible` for details). There's a micro-optimization which bypasses eager resolution for `java.*` classes. The downside is that `java.*` signature classes can show up as unloaded.
It manifests as inlining failures during JIT-compilation and may cause severe performance issues. Proposed fix removes the aforementioned special case logic during `MethodHandle` resolution. In some cases it may slow down `MethodHandle` construction a bit (e.g., when repeatedly constructing `DirectMethodHandle`s with lots of arguments), but `MethodHandle` construction step is not performance critical. Testing: hs-tier1 - hs-tier4 ------------- Commit messages: - Fix - Test Changes: https://git.openjdk.org/jdk/pull/19319/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19319&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332547 Stats: 74 lines in 2 files changed: 68 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19319.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19319/head:pull/19319 PR: https://git.openjdk.org/jdk/pull/19319 From duke at openjdk.org Mon May 20 22:36:30 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Mon, 20 May 2024 22:36:30 GMT Subject: RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v8] In-Reply-To: References: Message-ID: > Hello everyone! Please review this ~non-vectorized~ implementation of `_updateBytesAdler32` intrinsic. Reference implementation for AArch64 can be found [here](https://github.com/openjdk/jdk9/blob/master/hotspot/src/cpu/aarch64/vm/stubGenerator_aarch64.cpp#L3281). > > ### Correctness checks > > Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok. All tier1 also passed. 
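For readers less familiar with the checksum, here is a scalar C++ sketch of what an `updateBytesAdler32`-style routine computes (Adler-32 as defined in RFC 1950). The function name `adler32_update` is hypothetical and this is not the RISC-V stub under review. It shows the deferred-modulo structure that loops such as `L_nmax_loop` presumably exploit, using zlib's bound of 5552 bytes between reductions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Largest prime below 2^16.
constexpr std::uint32_t kAdlerBase = 65521;
// zlib's NMAX: the largest n with 255*n*(n+1)/2 + (n+1)*(BASE-1) <= 2^32-1,
// so both sums can be reduced once per chunk instead of once per byte.
constexpr std::size_t kAdlerNmax = 5552;

// Hypothetical scalar reference: continues an Adler-32 checksum over buf[0..len).
std::uint32_t adler32_update(std::uint32_t adler, const std::uint8_t* buf, std::size_t len) {
  std::uint32_t a = adler & 0xFFFF;          // running sum of bytes
  std::uint32_t b = (adler >> 16) & 0xFFFF;  // running sum of the a values
  while (len > 0) {
    std::size_t chunk = len < kAdlerNmax ? len : kAdlerNmax;
    len -= chunk;
    for (std::size_t i = 0; i < chunk; ++i) {
      a += buf[i];
      b += a;
    }
    buf += chunk;
    a %= kAdlerBase;  // deferred reduction, one per chunk
    b %= kAdlerBase;
  }
  return (b << 16) | a;
}
```

A checksum starts from the initial value 1, and because the state is fully reduced after every chunk, a buffer can be processed in several calls with the same result as a single call.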
>
> ### Performance results on T-Head board
>
> Enabled intrinsic:
>
> | Benchmark | (count) | Mode | Cnt | Score | Error | Units |
> | --- | --- | --- | --- | --- | --- | --- |
> | Adler32.TestAdler32.testAdler32Update | 64 | thrpt | 25 | 5522.693 | 23.387 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 128 | thrpt | 25 | 3430.761 | 9.210 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 256 | thrpt | 25 | 1962.888 | 5.323 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 512 | thrpt | 25 | 1050.938 | 0.144 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 1024 | thrpt | 25 | 549.227 | 0.375 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 2048 | thrpt | 25 | 280.829 | 0.170 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 5012 | thrpt | 25 | 116.333 | 0.057 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 8192 | thrpt | 25 | 71.392 | 0.060 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 16384 | thrpt | 25 | 35.784 | 0.019 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 32768 | thrpt | 25 | 17.924 | 0.010 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 65536 | thrpt | 25 | 8.940 | 0.003 | ops/ms |
>
> Disabled intrinsic:
>
> | Benchmark | (count) | Mode | Cnt | Score | Error | Units |
> | --- | --- | --- | --- | --- | --- | --- |
> | Adler32.TestAdler32.testAdler32Update | 64 | thrpt | 25 | 655.633 | 5.845 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 128 | thrpt | 25 | 587.418 | 10.062 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 256 | thrpt | 25 | 546.675 | 11.598 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 512 | thrpt | 25 | 432.328 | 11.517 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 1024 | thrpt | 25 | 311.771 | 4.238 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 2048 | thrpt | 25 | 202.648 | 2.486 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 5012 | thrpt | 25 | 100.246 | 1.119 | ops/ms |
|Adler32.TestAdler32.testAdler32Update|8192|t... ArsenyBochkarev has updated the pull request incrementally with three additional commits since the last revision: - Partially unroll L_by16_loop - Fix by64 function for vlen > 128 - Fix by16 function for vlen > 128 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18382/files - new: https://git.openjdk.org/jdk/pull/18382/files/be7d2551..453c169b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18382&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18382&range=06-07 Stats: 77 lines in 1 file changed: 49 ins; 11 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/18382.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18382/head:pull/18382 PR: https://git.openjdk.org/jdk/pull/18382 From duke at openjdk.org Mon May 20 22:41:12 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Mon, 20 May 2024 22:41:12 GMT Subject: RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v7] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 13:44:18 GMT, Gui Cao wrote: >> ArsenyBochkarev has updated the pull request incrementally with eight additional commits since the last revision: >> >> - Prettify L_nmax loop >> - Add comments in functions >> - Add explanation comment for L_nmax_loop >> - Fix L_nmax_loop for big lengths >> - Fix L_by16 loop step >> - Prettify intrinsic >> - Use LMUL=4 for most of the calculations >> - Use LMUL to load multiple data in one step > > I also ran the correctness test on the Banana Pi BPI-F3 board (has RVV1.0): > > Before this patch and disable UseRVV: > Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok > Before this patch and enable UseRVV: > Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok > > Apply this patch and disable UseRVV: > Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok > Apply this patch and enable UseRVV: > Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is 
Failed > > The TestAdler32.jtr for the failing run is as follows: > [TestAdler32.jtr.log](https://github.com/openjdk/jdk/files/15350178/TestAdler32.jtr.log) Hello @zifeihan! Thanks for your efforts on improving this PR. I don't have access (yet) to a Banana Pi board, so I can't precisely debug the case you pointed out. However, I know that vlen for the Banana Pi is 256 bits, so I fixed the problems for this case and checked functional correctness on QEMU for both 128 and 256 bits, which is OK now. Could you please re-run the `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` test? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18382#issuecomment-2121332468 From liach at openjdk.org Tue May 21 00:22:01 2024 From: liach at openjdk.org (Chen Liang) Date: Tue, 21 May 2024 00:22:01 GMT Subject: RFR: 8332547: Unloaded signature classes in DirectMethodHandles In-Reply-To: References: Message-ID: On Mon, 20 May 2024 21:29:20 GMT, Vladimir Ivanov wrote: > JVM routinely installs loader constraints for unloaded signature classes when method resolution takes place. MethodHandle resolution took a different route and eagerly resolves signature classes instead (see `java.lang.invoke.MemberName$Factory::resolve` and `sun.invoke.util.VerifyAccess::isTypeVisible` for details). > > There's a micro-optimization which bypasses eager resolution for `java.*` classes. The downside is that `java.*` signature classes can show up as unloaded. It manifests as inlining failures during JIT-compilation and may cause severe performance issues. > > Proposed fix removes the aforementioned special case logic during `MethodHandle` resolution. > > In some cases it may slow down `MethodHandle` construction a bit (e.g., when repeatedly constructing `DirectMethodHandle`s with lots of arguments), but `MethodHandle` construction step is not performance critical.
> > Testing: hs-tier1 - hs-tier4 src/java.base/share/classes/sun/invoke/util/VerifyAccess.java line 291: > 289: // guarantees that classes with names beginning "java." cannot be aliased, > 290: // because class loaders cannot load them directly. However, it is beneficial > 291: // for JIT-compilers to ensure all signature classes are loaded. Since we expect this method to have side effects, can we rename all of these non-pure `isTypeVisible` methods to `seeType`/`accessType` to indicate this desired side effect? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19319#discussion_r1607426203 From gcao at openjdk.org Tue May 21 02:05:23 2024 From: gcao at openjdk.org (Gui Cao) Date: Tue, 21 May 2024 02:05:23 GMT Subject: RFR: 8332533: RISC-V: Enable vector variable shift instructions for machines with RVV Message-ID: Hi, I noticed the following warning in the Opto JIT Code for the Vector API in the `test/jdk/jdk/incubator/vector/Byte256VectorTests.java: ASHRByte256VectorTests` test: -------------------------------------------------------------------------------- ** Rejected vector op (RShiftVB,byte,32) because architecture does not support variable vector shifts ** not supported: arity=2 opc=405 vlen=32 etype=byte ismask=0 is_masked_op=0 ``` The reason is that `Matcher::supports_vector_variable_shifts` returns false. The RISC-V port of the Vector API now supports vector shifts, so this should return `UseRVV`. By the way, the `Matcher::supports_vector_variable_shifts` function was introduced with the Vector API, and I think we forgot to update it when implementing vector shifts. After the fix, the test passes normally and generates the Opto JIT Code such as: 1c2 loadV V1, [R7] # vector (rvv) 1ca lwu R28, [R28, #12] # loadN, compressed ptr, #@loadN !
Field: jdk/internal/vm/vector/VectorSupport$VectorPayload.payload (constant) 1ce decode_heap_oop R7, R28 #@decodeHeapOop 1d2 addi R7, R7, #16 # ptr, #@addP_reg_imm 1d4 loadV V2, [R7] # vector (rvv) 1dc vand_immI V1, V1, #7 1e4 spill [sp, #48] -> R7 # spill size = 32 1e6 # castII of R7, #@castII 1e6 vasrB V3, V2, V1 1fa spill [sp, #96] -> R29 # spill size = 32 1fc bgeu R7, R29, B101 #@cmpU_branch P=0.000001 C=-1.000000 ### Testing: qemu 8.1.50 with UseRVV: - [ ] Run tier1-3 tests (release) - [x] Run test/jdk/jdk/incubator/vector (fastdebug) ------------- Commit messages: - 8332533: RISC-V: Enable Matcher::supports_vector_variable_shifts with UseRVV Changes: https://git.openjdk.org/jdk/pull/19313/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19313&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332533 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19313.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19313/head:pull/19313 PR: https://git.openjdk.org/jdk/pull/19313 From fyang at openjdk.org Tue May 21 02:34:01 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 21 May 2024 02:34:01 GMT Subject: RFR: 8332153: RISC-V: enable tests and add comment for vector shift instruct (shared by vectorization and Vector API) [v3] In-Reply-To: References: Message-ID: On Mon, 20 May 2024 10:22:26 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> Some tests corresponding to the vector shift instructs are not enabled; this patch enables them. >> It is also not clear how the vector shift instructs work, especially since both vectorization (SLP in the JDK) and the Vector API share the same instructs in riscv_v.ad, so I also added some comments to clarify it.
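For reference, the `vand_immI V1, V1, #7` / `vasrB V3, V2, V1` pair in the Opto listing above matches the per-lane semantics of a variable arithmetic right shift on byte lanes: the Vector API masks each shift count to the lane width (`n & 7` for bytes) before shifting. The C++ sketch below models those semantics in scalar form. The 32-lane width matches Byte256, and the names `ByteVec` and `ashr_bytes` are hypothetical, not part of HotSpot or the RVV backend.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// 32 byte lanes, as in a 256-bit Byte vector (an assumption of this sketch).
constexpr std::size_t kLanes = 32;
using ByteVec = std::array<std::int8_t, kLanes>;

// Scalar model of a variable arithmetic right shift on byte lanes.
ByteVec ashr_bytes(const ByteVec& src, const ByteVec& counts) {
  ByteVec out{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    int shift = counts[i] & 7;  // mask the count to the 8-bit lane width
    // Arithmetic shift: the sign bit is replicated (guaranteed since C++20,
    // and what mainstream compilers did before that).
    out[i] = static_cast<std::int8_t>(static_cast<int>(src[i]) >> shift);
  }
  return out;
}
```

The masking step is why a count of 9 behaves like a count of 1 in a byte lane.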
>> >> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > fix misc src/hotspot/cpu/riscv/riscv_v.ad line 1786: > 1784: // vector shift > 1785: // > 1786: // Following shift instruct's are shared by vectorization (in SLP, superword.cpp) and vector API. s/vector API/Vector API/ test/hotspot/jtreg/compiler/c2/aarch64/TestVectorShiftShorts.java line 31: > 29: * > 30: * @requires vm.compiler2.enabled > 31: * @requires os.arch == "aarch64" Is this change still needed then? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19265#discussion_r1607490169 PR Review Comment: https://git.openjdk.org/jdk/pull/19265#discussion_r1607487356 From fyang at openjdk.org Tue May 21 02:37:00 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 21 May 2024 02:37:00 GMT Subject: RFR: 8332533: RISC-V: Enable vector variable shift instructions for machines with RVV In-Reply-To: References: Message-ID: On Mon, 20 May 2024 15:23:26 GMT, Gui Cao wrote: > Hi, I noticed the following warning in the Opto JIT Code for the Vector API in the `test/jdk/jdk/incubator/vector/Byte256VectorTests.java: ASHRByte256VectorTests` test: > > -------------------------------------------------------------------------------- > ** Rejected vector op (RShiftVB,byte,32) because architecture does not support variable vector shifts > ** not supported: arity=2 opc=405 vlen=32 etype=byte ismask=0 is_masked_op=0 > ``` > The reason is that `Matcher::supports_vector_variable_shifts` returns false. The RISC-V port of the Vector API now supports vector shifts, so this should return `UseRVV`. By the way, the `Matcher::supports_vector_variable_shifts` function was introduced with the Vector API, and I think we forgot to update it when implementing vector shifts.
> After the fix, the test passes normally and generates the Opto JIT Code such as: > > 1c2 loadV V1, [R7] # vector (rvv) > 1ca lwu R28, [R28, #12] # loadN, compressed ptr, #@loadN ! Field: jdk/internal/vm/vector/VectorSupport$VectorPayload.payload (constant) > 1ce decode_heap_oop R7, R28 #@decodeHeapOop > 1d2 addi R7, R7, #16 # ptr, #@addP_reg_imm > 1d4 loadV V2, [R7] # vector (rvv) > 1dc vand_immI V1, V1, #7 > 1e4 spill [sp, #48] -> R7 # spill size = 32 > 1e6 # castII of R7, #@castII > 1e6 vasrB V3, V2, V1 > 1fa spill [sp, #96] -> R29 # spill size = 32 > 1fc bgeu R7, R29, B101 #@cmpU_branch P=0.000001 C=-1.000000 > > > ### Testing: > qemu 8.1.50 with UseRVV: > - [ ] Run tier1-3 tests (release) > - [x] Run test/jdk/jdk/incubator/vector (fastdebug) Looks good. Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19313#pullrequestreview-2067367667 From rcastanedalo at openjdk.org Tue May 21 04:23:13 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 21 May 2024 04:23:13 GMT Subject: RFR: 8332527: ZGC: generalize object cloning logic Message-ID: This changeset generalizes the logic to produce a runtime call to clone a class instance so that it can be shared by other collectors adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). The changeset moves the logic from `ZBarrierSetC2` to the GC-shared `BarrierSetC2` class and adds support for 32-bit platforms. #### Testing - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). - tier4-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only).
- `compiler/arraycopy` tests (linux-x86-debug) with [an additional patch](https://github.com/openjdk/jdk/commit/ddcf777894e740b8e6ddbbf8821e82a173c23ef4) that implements cloning of large class instances with a runtime clone call rather than arraycopy when using G1 (to exercise the generalized logic on a 32-bit platform). ------------- Commit messages: - Generalize logic to produce a runtime call that clones a class instance Changes: https://git.openjdk.org/jdk/pull/19311/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19311&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332527 Stats: 97 lines in 3 files changed: 55 ins; 41 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19311.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19311/head:pull/19311 PR: https://git.openjdk.org/jdk/pull/19311 From stuefe at openjdk.org Tue May 21 05:23:01 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 21 May 2024 05:23:01 GMT Subject: RFR: 8332538: Switch off JIT memory limit check for TestAlignVectorFuzzer.java In-Reply-To: References: Message-ID: On Mon, 20 May 2024 18:36:08 GMT, Vladimir Kozlov wrote: > Add flag `-XX:CompileCommand=MemLimit,*.*,0` to TestAlignVectorFuzzer.java test until [JDK-8332537](https://bugs.openjdk.org/browse/JDK-8332537) is fixed. > > Tested: tier1 Good and trivial ------------- Marked as reviewed by stuefe (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19316#pullrequestreview-2067544570 From thartmann at openjdk.org Tue May 21 05:26:08 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 21 May 2024 05:26:08 GMT Subject: RFR: 8328181: C2: assert(MaxVectorSize >= 32) failed: vector length should be >= 32 [v2] In-Reply-To: References: Message-ID: <-QcJMgC_7JqcWcmeY2MaB9Mh7Yq7f13q5KhyHGOC4yc=.f49a8233-4046-4b12-92ca-0d402717c513@github.com> On Mon, 8 Apr 2024 02:35:33 GMT, Jatin Bhateja wrote: >> This bug fix patch tightens the predication check for the small constant-length clear array pattern and relaxes the associated feature checks. Modified a few comments for clarity. >> >> Kindly review and approve. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Cleanup predicates. This introduced a performance regression, see [JDK-8332487](https://bugs.openjdk.org/browse/JDK-8332487). @jatin-bhateja, could you please have a look? Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18464#issuecomment-2121764497 From epeter at openjdk.org Tue May 21 05:58:07 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 May 2024 05:58:07 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v23] In-Reply-To: References: Message-ID: <_Mzv-09FfizrCCeMYlVSCPjMmO5l9zeBbkuw7H_R2e0=.e56a7441-c624-4c76-8706-508a7da84895@github.com> On Mon, 20 May 2024 14:57:47 GMT, Steve Dohrmann wrote: >> I have a few comments. > > Hi @vnkozlov, > > Is there anything else you would like to see for this PR? If not, I would like to check it in this week. @steveatgh I just saw that we at Oracle have not run any tests for this yet. Can you please hold off for a day or two until we have completed the testing? I'm sure you did exhaustive testing, but I'd still like to make sure it runs fine on all the x64 machines we have.
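The encoding rule stated in the 8328998 PR description (encoding is unchanged when only the lower 16 GPRs are used, and a REX2 or extended EVEX prefix is needed when any operand is an extended GPR) reduces to a per-operand check. Here is a minimal C++ sketch that uses plain integer register encodings and a hypothetical `needs_extended_encoding` helper; it does not reproduce HotSpot's `Register` type or `needs_eevex()`.

```cpp
#include <cassert>
#include <initializer_list>

// With APX there are 32 GPRs; encodings 16..31 are the extended GPRs.
constexpr int kFirstExtendedGpr = 16;
constexpr int kNumGprsWithApx = 32;

bool is_extended_gpr(int enc) {
  return enc >= kFirstExtendedGpr && enc < kNumGprsWithApx;
}

// Hypothetical helper: true if any operand needs a REX2 / extended EVEX
// prefix. A negative value stands for "no register" and never qualifies.
bool needs_extended_encoding(std::initializer_list<int> encodings) {
  for (int enc : encodings) {
    if (is_extended_gpr(enc)) {
      return true;
    }
  }
  return false;
}
```

A single extended operand is enough to switch the whole instruction to the new prefix. Instructions that use only r0..r15 keep their legacy encoding.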
------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2121803983 From epeter at openjdk.org Tue May 21 06:04:05 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 May 2024 06:04:05 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v23] In-Reply-To: References: Message-ID: On Mon, 20 May 2024 14:57:47 GMT, Steve Dohrmann wrote: >> I have a few comments. > > Hi @vnkozlov, > > Is there anything else you would like to see for this PR? If not, I would like to check it in this week. @steveatgh I think it could make sense to add a simple "hello world" JTREG test that enables the `UseAPX` flag, just to check that it is handled correctly, even on platforms that do not have the feature enabled. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2121810100 From thartmann at openjdk.org Tue May 21 06:08:02 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 21 May 2024 06:08:02 GMT Subject: RFR: 8332538: Switch off JIT memory limit check for TestAlignVectorFuzzer.java In-Reply-To: References: Message-ID: <_u37R2_BCx4ptmwyHsXFmUevJLmygF-VOCJykU34F5I=.67bf65d6-21a7-4c40-b5c8-d991e8d32e57@github.com> On Mon, 20 May 2024 18:36:08 GMT, Vladimir Kozlov wrote: > Add flag `-XX:CompileCommand=MemLimit,*.*,0` to TestAlignVectorFuzzer.java test until [JDK-8332537](https://bugs.openjdk.org/browse/JDK-8332537) is fixed. > > Tested: tier1 Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19316#pullrequestreview-2067614285 From epeter at openjdk.org Tue May 21 06:11:06 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 May 2024 06:11:06 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v23] In-Reply-To: References: Message-ID: <9CD9vJMHAwob0CCW0697u1CAsCa3WZiJ2FEKDW0tc10=.fb45c38c-dd0f-43b1-a0a4-68963cca8391@github.com> On Fri, 17 May 2024 17:52:31 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > added comment about UseAPX and UseAVX > 2 correspondence Changes requested by epeter (Reviewer). src/hotspot/cpu/x86/assembler_x86.cpp line 1668: > 1666: void Assembler::andnl(Register dst, Register src1, Register src2) { > 1667: assert(VM_Version::supports_bmi1(), "bit manipulation instructions not supported"); > 1668: assert(!needs_eevex(dst, src1, src2) || UseAPX, "extended gpr use requires UseAPX and UseAVX > 2"); Technical detail: `UseAPX and UseAVX > 2` sounds wrong. Did you mean to say "or"? Because UseAPX is only enabled when `UseAVX >= 3`. 
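On the "and"/"or" wording question: the review describes the VM turning `UseAPX` off whenever `UseAVX < 3` (the `if (UseAPX && (UseAVX < 3))` check in `vm_version_x86.cpp` discussed in this thread). After flag processing, `UseAPX` therefore implies `UseAVX > 2`, so an assert only needs to test `UseAPX`. Below is a hypothetical C++ model of that adjustment. The `Flags` struct stands in for the real VM flags and `FLAG_IS_DEFAULT`, and the warning text follows the suggestion made in the review.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for the relevant VM flags.
struct Flags {
  int use_avx;
  bool use_apx;
  bool use_apx_is_default;  // models FLAG_IS_DEFAULT(UseAPX)
};

// Models the ergonomic adjustment: APX requires UseAVX > 2.
void adjust_apx(Flags& f) {
  if (f.use_apx && f.use_avx < 3) {
    if (!f.use_apx_is_default) {
      // Only warn when the user explicitly asked for UseAPX.
      std::fprintf(stderr, "warning: UseAPX is only available when UseAVX > 2. Disabling UseAPX.\n");
    }
    f.use_apx = false;
  }
}

// The invariant an assert can rely on after adjust_apx() has run.
bool apx_invariant_holds(const Flags& f) {
  return !f.use_apx || f.use_avx > 2;
}
```

Once the invariant holds, `assert(UseAPX, ...)` and `assert(UseAPX && UseAVX > 2, ...)` accept exactly the same states, which is why only the message wording is in question.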
src/hotspot/cpu/x86/assembler_x86.cpp line 2036: > 2034: InstructionMark im(this); > 2035: if (needs_eevex(crc, adr.base(), adr.index())) { > 2036: assert(UseAPX, "extended gpr use requires UseAPX and UseAVX > 2"); Maybe here the "and" makes sense, but not sure. src/hotspot/cpu/x86/vm_version_x86.cpp line 1008: > 1006: if (UseAPX && (UseAVX < 3)) { > 1007: if (!FLAG_IS_DEFAULT(UseAPX)) { > 1008: warning("UseAPX is only available when UseAVX > 2"); Suggestion: warning("UseAPX is only available when UseAVX > 2. Disabling UseAPX."); ------------- PR Review: https://git.openjdk.org/jdk/pull/18476#pullrequestreview-2067611924 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1607690139 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1607691284 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1607694066 From epeter at openjdk.org Tue May 21 06:11:07 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 May 2024 06:11:07 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v23] In-Reply-To: <9CD9vJMHAwob0CCW0697u1CAsCa3WZiJ2FEKDW0tc10=.fb45c38c-dd0f-43b1-a0a4-68963cca8391@github.com> References: <9CD9vJMHAwob0CCW0697u1CAsCa3WZiJ2FEKDW0tc10=.fb45c38c-dd0f-43b1-a0a4-68963cca8391@github.com> Message-ID: <6OhcNl6TmRZcvkUooVTggTv9X-6-nZI1UlF13mfD6q8=.a7a87c80-9479-491c-a71b-7157dc1a1cf6@github.com> On Tue, 21 May 2024 06:07:53 GMT, Emanuel Peter wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> added comment about UseAPX and UseAVX > 2 correspondence > > src/hotspot/cpu/x86/vm_version_x86.cpp line 1008: > >> 1006: if (UseAPX && (UseAVX < 3)) { >> 1007: if (!FLAG_IS_DEFAULT(UseAPX)) { >> 1008: warning("UseAPX is only available when UseAVX > 2"); > > Suggestion: > > warning("UseAPX is only available when UseAVX > 2. 
Disabling UseAPX."); This would tell the user what we are doing, just like with the UseAVX flag. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1607694409 From aboldtch at openjdk.org Tue May 21 06:14:05 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 21 May 2024 06:14:05 GMT Subject: RFR: 8332527: ZGC: generalize object cloning logic In-Reply-To: References: Message-ID: On Mon, 20 May 2024 14:31:26 GMT, Roberto Castañeda Lozano wrote: > This changeset generalizes the logic to produce a runtime call to clone a class instance so that it can be shared by other collectors adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). The changeset moves the logic from `ZBarrierSetC2` to the GC-shared `BarrierSetC2` class and adds support for 32-bit platforms. > > #### Testing > > - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). > - tier4-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only). > - `compiler/arraycopy` tests (linux-x86-debug) with [an additional patch](https://github.com/openjdk/jdk/commit/ddcf777894e740b8e6ddbbf8821e82a173c23ef4) that implements cloning of large class instances with a runtime clone call rather than arraycopy when using G1 (to exercise the generalized logic on a 32-bit platform). lgtm. Feel free to use, change or discard my suggestions. src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 840: > 838: void BarrierSetC2::clone_instance_in_runtime(PhaseMacroExpand* phase, ArrayCopyNode* ac, > 839: address clone_addr, const char* clone_name) const { > 840: assert(ac->is_clone_inst(), "this function is only defined for cloning class instances"); Saying `class instances` is confusing to me. This is used for all instance objects, not only instances of Class objects.
Maybe this is some terminology I am unfamiliar with, but in general HotSpot uses instance vs array to distinguish between the two classes of objects. E.g. `instanceOop vs arrayOop`, `InstanceKlass vs ArrayKlass`. Suggestion: assert(ac->is_clone_inst(), "this function is only defined for cloning instances"); src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 852: > 850: // The native clone we are calling here expects the instance size in words. > 851: // Add header/offset size to payload size to get instance size. > 852: Node* const base_offset = phase->MakeConX(arraycopy_payload_base_offset(ac->is_clone_array()) >> LogBytesPerLong); Why query for `is_clone_array()` when it is known false from the context we are in? (I know it is what the previous code did, but I am curious why it would be preferred.) Suggestion: Node* const base_offset = phase->MakeConX(arraycopy_payload_base_offset(false /* is_array */) >> LogBytesPerLong); ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19311#pullrequestreview-2067611931 PR Review Comment: https://git.openjdk.org/jdk/pull/19311#discussion_r1607690156 PR Review Comment: https://git.openjdk.org/jdk/pull/19311#discussion_r1607690251 From epeter at openjdk.org Tue May 21 06:15:06 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 May 2024 06:15:06 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v23] In-Reply-To: References: Message-ID: <6Ob1gJGLun-R2zHKWUvE6BJj1J01JYdC3VYn1Dt9bTw=.2bd102b8-7c18-4083-9be9-60bcce43c8a9@github.com> On Tue, 21 May 2024 06:00:55 GMT, Emanuel Peter wrote: >> Hi @vnkozlov, >> >> Is there anything else you would like to see for this PR? If not, I would like to check it in this week. > > @steveatgh I think it could make sense to add a simple "hello world" JTREG test that enables the `UseAPX` flag, just to check that it is handled correctly, even on platforms that do not have the feature enabled. > Thank you @eme64 for the comments. The functionality of the UseAPX flag is, as you point out, incomplete in this pull request. A subsequent PR (see JDK-8329030) will tie the logic of the flag in with a query of the hardware features. It was added in this PR because we thought it could be useful for testing or debugging the encoding functionality. Wait. Does this mean that if I enable the `UseAPX` flag on my `AVX512` machine with `UseAVX=3`, we will start encoding instructions using APX? Can that lead to wrong results?
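The size computation discussed at `barrierSetC2.cpp` line 852 is simple arithmetic. The sketch below assumes, as the quoted comment implies, that the arraycopy node's length is the payload size in 8-byte words: the payload base offset is converted from bytes to words with `>> LogBytesPerLong`, then added to give the full instance size. The helper name `instance_size_in_words` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// log2(8): the shift used to turn a byte offset into 8-byte words.
constexpr int kLogBytesPerLong = 3;

// Hypothetical helper: full instance size in words = header/offset words
// + payload words.
std::size_t instance_size_in_words(std::size_t payload_words,
                                   std::size_t payload_base_offset_bytes) {
  // This sketch assumes the payload starts on an 8-byte boundary.
  assert(payload_base_offset_bytes % (1u << kLogBytesPerLong) == 0);
  std::size_t base_offset_words = payload_base_offset_bytes >> kLogBytesPerLong;
  return base_offset_words + payload_words;
}
```

For example, a 16-byte payload base offset (2 words) plus 3 payload words gives a 5-word instance.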
> Thank you @eme64 for the comments. The functionality of the UseAPX flag is, as you point out, incomplete in this pull request. A subsequent PR (see JDK-8329030) will tie the logic of the flag in with a query of the hardware features. It was added in this PR thinking it could be useful for testing or debugging the encoding functionality. Wait. Does this mean that if I enable the `UseAPX` flag on my `AVX512` machine with `UseAVX=3`, that we will start encoding instructions using APX? Can that lead to wrong results? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2121824570 From epeter at openjdk.org Tue May 21 06:45:09 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 May 2024 06:45:09 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v4] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 21:05:55 GMT, Damon Fenacci wrote: > It looks as if load/stores that use from/intoMemorySegment with different types apparently don?t create LoadVector nodes. It seems that fromMemorySegment tries to inline the VectorSupport::load intrinsic, but fails as the type of the vector and the inferred type of the underlying memory segment differ: Ha, that seems to be a bit of an arbitrary (maybe just conservative) restriction. I hope we can lift that in the future. We do not have such restrictions for scalar array load/store. And we can also Auto-Vectorize those. Would be interesting to ask some VectorAPI folks about that. Looking at the tests now! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2121864184 From dfenacci at openjdk.org Tue May 21 06:49:23 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 21 May 2024 06:49:23 GMT Subject: RFR: 8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) Message-ID: # Issue The test `compiler/startup/StartupOutput.java` fails intermittently due to a crash after correctly printing the error `Initial size of CodeCache is too small` (the test limits the code cache using `-XX:InitialCodeCacheSize=1024K -XX:ReservedCodeCacheSize=1200k`). Whether the issue appears depends heavily on thread scheduling. The original failure was reported during C1 initialization, but C2 initialization is affected as well. # Causes There is one occurrence during C1 initialization and one during C2 initialization where a call to `RuntimeStub::new_runtime_stub` can fail fatally if there is not enough space left. For C1: `Compiler::init_c1_runtime` -> `Runtime1::initialize` -> `Runtime1::generate_blob_for` -> `Runtime1::generate_blob` -> `RuntimeStub::new_runtime_stub`. For C2: `C2Compiler::initialize` -> `OptoRuntime::generate` -> `OptoRuntime::generate_stub` -> `Compile::Compile` -> `Compile::Code_Gen` -> `PhaseOutput::install` -> `PhaseOutput::install_stub` -> `RuntimeStub::new_runtime_stub`. # Solution https://github.com/openjdk/jdk/pull/15970 introduced an optional argument to `RuntimeStub::new_runtime_stub` that determines whether an allocation failure is fatal. We can take advantage of it to avoid crashing and instead pass the information about the success or failure of the allocation up the (C1 and C2 initialization) call stack to the point where we can mark the compilations as failed.
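The fix pattern described in this PR (let stub allocation fail softly and pass the result back through each initialization layer) can be modeled as follows. This is a hypothetical C++ sketch: `CodeCache`, `new_stub`, `generate_runtime_stubs`, and `initialize_compiler` are illustrative stand-ins, not HotSpot's `RuntimeStub::new_runtime_stub` or its actual compiler initialization code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <optional>

struct CodeCache {
  std::size_t free_bytes;
};

struct Stub {
  std::size_t size;
};

// Models an allocation with an "is the failure fatal?" switch: with the
// switch off, running out of space yields an empty optional instead of a crash.
std::optional<Stub> new_stub(CodeCache& cache, std::size_t size, bool alloc_fail_is_fatal) {
  if (cache.free_bytes < size) {
    if (alloc_fail_is_fatal) {
      std::abort();  // previous behavior: the VM dies
    }
    return std::nullopt;
  }
  cache.free_bytes -= size;
  return Stub{size};
}

// Middle layer: stops at the first failed allocation and reports it upward.
bool generate_runtime_stubs(CodeCache& cache, const std::size_t* sizes, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    if (!new_stub(cache, sizes[i], /*alloc_fail_is_fatal=*/false)) {
      return false;
    }
  }
  return true;
}

enum class CompilerState { initialized, failed };

// Top layer: turns the propagated result into a compiler state.
CompilerState initialize_compiler(CodeCache& cache, const std::size_t* sizes, std::size_t n) {
  return generate_runtime_stubs(cache, sizes, n) ? CompilerState::initialized
                                                 : CompilerState::failed;
}
```

The key design point is that no layer below the compiler decides the outcome. Each layer only reports success or failure, and the top layer marks the compiler as failed instead of aborting the process.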
------------- Commit messages: - Merge branch 'master' into JDK-8326615 - JDK-8326615: update copyright year - JDK-8326615: compiler/startup/StartupOutput.java intermittently Internal Error (codeBlob.cpp:429) Initial size of CodeCache is too small Changes: https://git.openjdk.org/jdk/pull/19280/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19280&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8326615 Stats: 29 lines in 6 files changed: 8 ins; 3 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/19280.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19280/head:pull/19280 PR: https://git.openjdk.org/jdk/pull/19280 From epeter at openjdk.org Tue May 21 06:57:04 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 May 2024 06:57:04 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v11] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 20:38:56 GMT, Damon Fenacci wrote: >> # Issue >> When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. >> >> # Causes >> On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. >> The same is true for `StoreVector`s. >> When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store of the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`). 
Here we call >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 >> >> where we do an extra check for types if we deal with vectors but we don't make sure that neither masks nor offsets interfere. >> Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 >> >> but we don't make sure that there are no masks or offsets. >> A few lines below, we check if there are 2 stores for the same value in a row. We need to check for masks and offsets here too but in this case we can include these cases if the masks and offsets of the vector stores are equivalent. >> >> # Solution >> To avoid folding `Load`- and `StoreVector`s with masks and offsets we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns its own (to avoid ever being the same as a store node). In this way, the checks in `MemNode::can_see_stored_value` >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 >> >> and `StoreNode::Identity` >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 >> >> will fail if masks or offsets are used. >> For 2 stores of the same value we instead check for mask and offset equality. >> >> Regression tests for... > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8325520: update match condition Nice, the tests look good, thanks for adding them!
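A minimal sketch of the `store_Opcode` idea from the quoted Solution section, on a hypothetical, heavily simplified node hierarchy. All `*Model` classes, the `ModelOpcode` enum, and `can_take_stored_value` are invented for illustration; they only loosely mirror `LoadVectorGatherNode` and the opcode comparison in `MemNode::can_see_stored_value`, and only the gather case is modeled (the PR applies the same trick to the masked variants).

```cpp
#include <cassert>

// Hypothetical opcodes for a tiny model of C2 memory nodes.
enum ModelOpcode {
  Op_StoreVector,
  Op_StoreVectorScatter,
  Op_LoadVector,
  Op_LoadVectorGather
};

struct NodeModel {
  virtual ~NodeModel() = default;
  virtual int opcode() const = 0;
};

struct LoadModel : NodeModel {
  // Opcode of the store whose value this load may take directly.
  virtual int store_opcode() const = 0;
};

struct StoreVectorModel : NodeModel {
  int opcode() const override { return Op_StoreVector; }
};

struct StoreVectorScatterModel : NodeModel {
  int opcode() const override { return Op_StoreVectorScatter; }
};

struct LoadVectorModel : LoadModel {
  int opcode() const override { return Op_LoadVector; }
  int store_opcode() const override { return Op_StoreVector; }
};

// A gather load reports its own opcode, which no store ever has, so the
// opcode test below can never match it against a store.
struct LoadVectorGatherModel : LoadVectorModel {
  int opcode() const override { return Op_LoadVectorGather; }
  int store_opcode() const override { return Op_LoadVectorGather; }
};

// Simplified opcode test of a can_see_stored_value-style check; the real
// check also compares addresses and types.
bool can_take_stored_value(const LoadModel& load, const NodeModel& store) {
  return store.opcode() == load.store_opcode();
}
```

The appeal of this design is that no extra mask/offset test is needed in the forwarding code itself: the opcode comparison that already exists simply never succeeds for gather or masked loads.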
src/hotspot/share/opto/vectornode.hpp line 979: > 977: idx == MemNode::ValueIn || > 978: idx == MemNode::ValueIn + 1; } > 979: virtual Node* offsets() const { return in(Offsets); } Would be nice to add some `override` keywords here ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2121881444 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1607733531 From epeter at openjdk.org Tue May 21 06:57:05 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 May 2024 06:57:05 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v6] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 21:00:08 GMT, Damon Fenacci wrote: >> I agree that it is quite convoluted probably also because I've put `if (!is_StoreVector())` (which is redundant) at the beginning to get the most common case out of the way but still... >> At first I thought that multiple inheritance would be a good solution (masks and offsets could be inherited by the corresponding nodes) but the "HotSpot Coding Style" clearly says to avoid it... >> So, I think in the end your second suggestion is the cleanest. Changing it... > > I've updated it. The condition unfortunately doesn't look as clean as the one above as we need to check for `nullptr` (either both or neither are `nullptr`, and if both are present they must be `eqv_uncast`). I've tried to make it as concise as possible (we could have made `mask` and `offsets` return a _unique_ node instead, so as to avoid the `nullptr`, but I had the impression it would just make everything less clear). I think this is good, certainly much better than what we had before!
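The review thread above suggests adding `override` and describes the updated match condition for two consecutive stores: masks (and offsets) must either both be `nullptr`, or both be present and `eqv_uncast`. A sketch of that condition using hypothetical types (`InputModel`, `StoreModel`, `MaskedStoreModel`, and the helper functions are invented here); note that `eqv_uncast` is reduced to pointer identity, whereas HotSpot's `Node::eqv_uncast` compares nodes after stripping casts.

```cpp
#include <cassert>

struct InputModel {
  // Hypothetical stand-in for eqv_uncast(): plain pointer identity here.
  bool eqv_uncast(const InputModel* other) const { return this == other; }
};

// Optional inputs match when both are absent, or both present and equivalent.
bool same_optional_input(const InputModel* a, const InputModel* b) {
  if (a == nullptr || b == nullptr) {
    return a == b;
  }
  return a->eqv_uncast(b);
}

struct StoreModel {
  virtual ~StoreModel() = default;
  virtual const InputModel* mask() const { return nullptr; }
  virtual const InputModel* offsets() const { return nullptr; }
};

struct MaskedStoreModel : StoreModel {
  explicit MaskedStoreModel(const InputModel* mask) : _mask(mask) {}
  // `override` makes the compiler reject this if it stops matching the base.
  const InputModel* mask() const override { return _mask; }

 private:
  const InputModel* _mask;
};

// Two back-to-back stores of the same value to the same address may be
// folded only if their masks and offsets also agree.
bool masks_and_offsets_match(const StoreModel& a, const StoreModel& b) {
  return same_optional_input(a.mask(), b.mask()) &&
         same_optional_input(a.offsets(), b.offsets());
}
```

Handling the `nullptr` cases in one small helper keeps the call site down to a single conjunction, which is roughly the conciseness goal mentioned in the thread.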
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1607732682 From epeter at openjdk.org Tue May 21 06:57:06 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 May 2024 06:57:06 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v11] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 06:48:03 GMT, Emanuel Peter wrote: >> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8325520: update match condition > > src/hotspot/share/opto/vectornode.hpp line 979: > >> 977: idx == MemNode::ValueIn || >> 978: idx == MemNode::ValueIn + 1; } >> 979: virtual Node* offsets() const { return in(Offsets); } > > Would be nice to add some `override` keywords here ;) And also below. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1607733685 From thartmann at openjdk.org Tue May 21 07:29:02 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 21 May 2024 07:29:02 GMT Subject: RFR: 8332369: C2: assert(false) failed: graph should be schedulable after JDK-8324517 In-Reply-To: References: Message-ID: On Fri, 17 May 2024 07:50:08 GMT, Roland Westrelin wrote: > The issue occurs when a `Mod` node is processed during > final_graph_reshaping: if a `Div` node is found with the same inputs, > the `Mod` is replaced either by a `DivMod` node or a subgraph that has > the `Div` node as input. Finding the `Div` node is done by > `find_similar()` which ignores the precedence edges. What happens is > that the `Div` node returned by `find_similar()` could have a > precedence edge that pins it at a control that doesn't dominate the > control of some of the uses of the `Mod` node. > > The fix I propose is to simply not perform the transformation if one of > the nodes has precedence edges (which should be a rare corner case). Looks good to me too. Christian's testing passed.
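A sketch of the guard proposed for JDK-8332369, on a hypothetical node model (`NodeModel` and `find_div_for_mod` are invented for illustration; the real lookup is `find_similar()` during final graph reshaping). The match on opcode and required inputs ignores precedence edges, so the transformation is skipped whenever either node carries them.

```cpp
#include <cassert>
#include <vector>

// Hypothetical node model: required inputs plus precedence edges, which only
// constrain scheduling (e.g. pin a node below some control).
struct NodeModel {
  int opcode;
  std::vector<const NodeModel*> in;
  std::vector<const NodeModel*> prec;
  bool has_prec_edges() const { return !prec.empty(); }
};

// Look for an existing node with `div_opcode` and the same required inputs as
// `mod`. The match itself ignores precedence edges, so they are checked
// separately before the node is reused.
const NodeModel* find_div_for_mod(const NodeModel& mod, int div_opcode,
                                  const std::vector<const NodeModel*>& nodes) {
  for (const NodeModel* n : nodes) {
    if (n->opcode == div_opcode && n->in == mod.in) {
      if (mod.has_prec_edges() || n->has_prec_edges()) {
        return nullptr;  // rare corner case: skip the transformation
      }
      return n;
    }
  }
  return nullptr;
}
```

Returning `nullptr` here means "leave the `Mod` as it is", which is always safe; the cost is only a missed optimization in a rare case.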
------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19277#pullrequestreview-2067756361 From thartmann at openjdk.org Tue May 21 07:32:02 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 21 May 2024 07:32:02 GMT Subject: RFR: 8332462: ubsan: c1_ValueStack.hpp:229:49: runtime error: load of value 171, which is not a valid value for type 'bool' In-Reply-To: References: Message-ID: On Fri, 17 May 2024 13:48:57 GMT, Matthias Baesken wrote: > This code, with UBSan enabled, > bool force_reexecute() const { return _force_reexecute; } > > gives us the following warning on Linux x86_64 fastdebug: > > /jdk/src/hotspot/share/c1/c1_ValueStack.hpp:229:49: runtime error: load of value 171, which is not a valid value for type 'bool' > #0 0x14b3999f2921 in ValueStack::force_reexecute() const /jdk/src/hotspot/share/c1/c1_ValueStack.hpp:229 > #1 0x14b3999f2921 in LIRGenerator::do_ArrayCopy(Intrinsic*) /jdk/src/hotspot/cpu/x86/c1_LIRGenerator_x86.cpp:1008 > #2 0x14b39aa1c077 in LIRGenerator::do_root(Instruction*) /jdk/src/hotspot/share/c1/c1_LIRGenerator.cpp:379 > #3 0x14b39aa2df94 in non-virtual thunk to LIRGenerator::block_do(BlockBegin*) (/net/usr.work/d040975/open_jdk/jdk_6/build_clx209_fastdebug/jdk/lib/server/libjvm.so+0x5ad1f94) > #4 0x14b39a971ff6 in BlockList::iterate_forward(BlockClosure*) /jdk/src/hotspot/share/c1/c1_Instruction.cpp:891 > #5 0x14b39a878114 in Compilation::emit_lir() /jdk/src/hotspot/share/c1/c1_Compilation.cpp:264 > #6 0x14b39a882076 in Compilation::compile_java_method() /jdk/src/hotspot/share/c1/c1_Compilation.cpp:407 > #7 0x14b39a884c48 in Compilation::compile_method() /jdk/src/hotspot/share/c1/c1_Compilation.cpp:479 > #8 0x14b39a88681a in Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, bool, DirectiveSet*) /jdk/src/hotspot/share/c1/c1_Compilation.cpp:609 > #9 0x14b39a88bd63 in Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)
/jdk/src/hotspot/share/c1/c1_Compiler.cpp:260 > #10 0x14b39b153241 in CompileBroker::invoke_compiler_on_method(CompileTask*) /jdk/src/hotspot/share/compiler/compileBroker.cpp:2303 > #11 0x14b39b154d3e in CompileBroker::compiler_thread_loop() /jdk/src/hotspot/share/compiler/compileBroker.cpp:1961 > #12 0x14b39bdb17bc in JavaThread::thread_main_inner() /jdk/src/hotspot/share/runtime/javaThread.cpp:759 > #13 0x14b39d8a828f in Thread::call_run() /jdk/src/hotspot/share/runtime/thread.cpp:225 > ... (rest of output omitted) > > It seems we are missing initializations of the variable `_force_reexecute`, and this can lead to arbitrary values at the address in memory where `_force_reexecute` is stored. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19284#pullrequestreview-2067762983 From mbaesken at openjdk.org Tue May 21 07:38:08 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 21 May 2024 07:38:08 GMT Subject: Integrated: 8332462: ubsan: c1_ValueStack.hpp:229:49: runtime error: load of value 171, which is not a valid value for type 'bool' In-Reply-To: References: Message-ID: On Fri, 17 May 2024 13:48:57 GMT, Matthias Baesken wrote: > This code, with UBSan enabled, > bool force_reexecute() const { return _force_reexecute; } > > gives us the following warning on Linux x86_64 fastdebug: > > /jdk/src/hotspot/share/c1/c1_ValueStack.hpp:229:49: runtime error: load of value 171, which is not a valid value for type 'bool' > #0 0x14b3999f2921 in ValueStack::force_reexecute() const /jdk/src/hotspot/share/c1/c1_ValueStack.hpp:229 > #1 0x14b3999f2921 in LIRGenerator::do_ArrayCopy(Intrinsic*) /jdk/src/hotspot/cpu/x86/c1_LIRGenerator_x86.cpp:1008 > #2 0x14b39aa1c077 in LIRGenerator::do_root(Instruction*) /jdk/src/hotspot/share/c1/c1_LIRGenerator.cpp:379 > #3 0x14b39aa2df94 in non-virtual thunk to LIRGenerator::block_do(BlockBegin*)
(/net/usr.work/d040975/open_jdk/jdk_6/build_clx209_fastdebug/jdk/lib/server/libjvm.so+0x5ad1f94) > #4 0x14b39a971ff6 in BlockList::iterate_forward(BlockClosure*) /jdk/src/hotspot/share/c1/c1_Instruction.cpp:891 > #5 0x14b39a878114 in Compilation::emit_lir() /jdk/src/hotspot/share/c1/c1_Compilation.cpp:264 > #6 0x14b39a882076 in Compilation::compile_java_method() /jdk/src/hotspot/share/c1/c1_Compilation.cpp:407 > #7 0x14b39a884c48 in Compilation::compile_method() /jdk/src/hotspot/share/c1/c1_Compilation.cpp:479 > #8 0x14b39a88681a in Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, bool, DirectiveSet*) /jdk/src/hotspot/share/c1/c1_Compilation.cpp:609 > #9 0x14b39a88bd63 in Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) /jdk/src/hotspot/share/c1/c1_Compiler.cpp:260 > #10 0x14b39b153241 in CompileBroker::invoke_compiler_on_method(CompileTask*) /jdk/src/hotspot/share/compiler/compileBroker.cpp:2303 > #11 0x14b39b154d3e in CompileBroker::compiler_thread_loop() /jdk/src/hotspot/share/compiler/compileBroker.cpp:1961 > #12 0x14b39bdb17bc in JavaThread::thread_main_inner() /jdk/src/hotspot/share/runtime/javaThread.cpp:759 > #13 0x14b39d8a828f in Thread::call_run() /jdk/src/hotspot/share/runtime/thread.cpp:225 > ... (rest of output omitted) > > It seems we are missing initializations of the variable `_force_reexecute`, and this can lead to arbitrary values at the address in memory where `_force_reexecute` is stored. This pull request has now been integrated.
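The integrated changeset is not reproduced in this thread, so the following does not claim to show it. It is only a sketch of one common way to rule out the undefined behavior UBSan reports here (reading a `bool` whose storage holds a value such as 171): give the flag a default member initializer so every constructor, including ones added later, starts from a defined value. `ValueStackModel` and its members are hypothetical stand-ins for `ValueStack`; the real patch may well initialize the field in a constructor's initializer list instead.

```cpp
#include <cassert>

// Hypothetical model of a class with a bool flag. Reading an uninitialized
// bool is undefined behavior; a default member initializer guarantees a
// defined value no matter which constructor runs.
class ValueStackModel {
 public:
  ValueStackModel() = default;
  explicit ValueStackModel(int size) : _size(size) {}

  bool force_reexecute() const { return _force_reexecute; }
  void set_force_reexecute() { _force_reexecute = true; }
  int size() const { return _size; }

 private:
  int _size = 0;
  bool _force_reexecute = false;  // defined in every constructor
};
```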
Changeset: 8a49d47c Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/8a49d47cf3e845ddccaaeafeee9dfe6ab3180ded Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod 8332462: ubsan: c1_ValueStack.hpp:229:49: runtime error: load of value 171, which is not a valid value for type 'bool' Reviewed-by: chagedorn, mdoerr, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/19284 From mli at openjdk.org Tue May 21 07:41:27 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 21 May 2024 07:41:27 GMT Subject: RFR: 8332153: RISC-V: enable tests and add comment for vector shift instruct (shared by vectorization and Vector API) [v4] In-Reply-To: References: Message-ID: > Hi, > Could you help review this patch? > For the vector shift instructs, some corresponding tests are not enabled; this patch enables them. > Also, how the vector shift instructs work is not clear, especially since both vectorization (SLP in the JDK) and the Vector API share the same instructs in riscv_v.ad, so I also added some comments to clarify this.
> > Thanks Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: minor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19265/files - new: https://git.openjdk.org/jdk/pull/19265/files/7216d886..2ba613fa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19265&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19265&range=02-03 Stats: 3 lines in 2 files changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19265.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19265/head:pull/19265 PR: https://git.openjdk.org/jdk/pull/19265 From mli at openjdk.org Tue May 21 07:41:27 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 21 May 2024 07:41:27 GMT Subject: RFR: 8332153: RISC-V: enable tests and add comment for vector shift instruct (shared by vectorization and Vector API) [v3] In-Reply-To: References: Message-ID: <2UEmWZ2D9zIm7zFwG6FVkW9dyfIhSaLIsI9dVQ_NiKg=.b895bac1-cf3e-4a05-a68a-06b20876a2cd@github.com> On Tue, 21 May 2024 01:57:00 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> fix misc > > test/hotspot/jtreg/compiler/c2/aarch64/TestVectorShiftShorts.java line 31: > >> 29: * >> 30: * @requires vm.compiler2.enabled >> 31: * @requires os.arch == "aarch64" > > Is this change still needed then? No, fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19265#discussion_r1607794042 From gcao at openjdk.org Tue May 21 07:46:08 2024 From: gcao at openjdk.org (Gui Cao) Date: Tue, 21 May 2024 07:46:08 GMT Subject: RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v7] In-Reply-To: References: Message-ID: <6DenkzTvXYiIBKrR7D7JbkNG53AgOuwkvH3_41-_qWI=.356e8a00-49f7-48cf-9464-fffd01ce5f37@github.com> On Fri, 17 May 2024 13:44:18 GMT, Gui Cao wrote: >> ArsenyBochkarev has updated the pull request incrementally with eight additional commits since the last revision: >> >> - Prettify L_nmax loop >> - Add comments in functions >> - Add explanation comment for L_nmax_loop >> - Fix L_nmax_loop for big lengths >> - Fix L_by16 loop step >> - Prettify intrinsic >> - Use LMUL=4 for most of the calculations >> - Use LMUL to load multiple data in one step > > I also ran the correctness test on the Banana Pi BPI-F3 board (has RVV1.0): > > Before this patch and disable UseRVV: > Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok > Before this patch and enable UseRVV: > Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok > > Apply this patch and disable UseRVV: > Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok > Apply this patch and enable UseRVV: > Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is Failed > > The TestAdler32.jtr on Failed is as follows: > [TestAdler32.jtr.log](https://github.com/openjdk/jdk/files/15350178/TestAdler32.jtr.log) > Hello @zifeihan! Thanks for your efforts on improving this PR. I don't have access (yet) to Banana Pi board, so I can't debug precisely the case you pointed out. However, I know that vlen for Banana Pi is 256 bit, so I fixed problems for this case and checked functional correctness on QEMU for both 128 and 256 bit, which is OK now. Could you please do a re-run of `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` test? 
Sorry for being late; the JMH performance test data just finished. With this PR applied and UseRVV enabled:
- [x] Correctness test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` passed.
- [x] JMH test `test/micro/org/openjdk/bench/java/util/TestAdler32.java` passed.

Benchmark                      (count)   Mode  Cnt     Score     Error   Units
TestAdler32.testAdler32Update       64  thrpt   25  7865.764 ±  57.876  ops/ms
TestAdler32.testAdler32Update      128  thrpt   25  6361.346 ±   0.178  ops/ms
TestAdler32.testAdler32Update      256  thrpt   25  4595.217 ±   0.166  ops/ms
TestAdler32.testAdler32Update      512  thrpt   25  2941.284 ±  12.318  ops/ms
TestAdler32.testAdler32Update     1024  thrpt   25  1728.568 ±   0.053  ops/ms
TestAdler32.testAdler32Update     2048  thrpt   25   943.173 ±   1.043  ops/ms
TestAdler32.testAdler32Update     5012  thrpt   25   404.343 ±   0.205  ops/ms
TestAdler32.testAdler32Update     8192  thrpt   25   249.495 ±   1.986  ops/ms
TestAdler32.testAdler32Update    16384  thrpt   25   126.168 ±   1.261  ops/ms
TestAdler32.testAdler32Update    32768  thrpt   25    61.925 ±   0.607  ops/ms
TestAdler32.testAdler32Update    65536  thrpt   25    30.866 ±   0.375  ops/ms

Finished running test 'micro:java.util.TestAdler32'
------------- PR Comment: https://git.openjdk.org/jdk/pull/18382#issuecomment-2121969207 From thartmann at openjdk.org Tue May 21 07:50:02 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 21 May 2024 07:50:02 GMT Subject: RFR: 8332498: [aarch64, x86] improving OpToAssembly output for partialSubtypeCheckConstSuper Instruct In-Reply-To: References: Message-ID: On Sun, 19 May 2024 15:36:15 GMT, Amit Kumar wrote: > format method generated previously: > > void partialSubtypeCheckConstSuperNode::format(PhaseRegAlloc *ra, outputStream *st) const { > // Start at oper_input_base() and count operands > unsigned idx0 = 1; > unsigned idx1 = 1; // sub > unsigned idx2 = idx1 + opnd_array(1)->num_edges(); // super_reg > unsigned idx3 = idx2 + opnd_array(2)->num_edges(); // super_con > unsigned idx4 = idx3 + opnd_array(3)->num_edges(); // vtemp > unsigned idx5 = idx4 + opnd_array(4)->num_edges(); // tempR1 > unsigned idx6 = idx5 + opnd_array(5)->num_edges(); // tempR2 > unsigned idx7 = idx6 + opnd_array(6)->num_edges(); // tempR3 > st->print_raw("partialSubtypeCheck "); >
opnd_array(0)->int_format(ra, this, st); // result > st->print_raw(", "); > opnd_array(1)->ext_format(ra, this,idx1, st); // sub > st->print_raw(", "); > opnd_array(2)->ext_format(ra, this,idx2, st); // super_reg > st->print_raw(", "); > opnd_array(3)->ext_format(ra, this,idx3, st); // super_con > } Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19295#pullrequestreview-2067802213 From fyang at openjdk.org Tue May 21 07:58:03 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 21 May 2024 07:58:03 GMT Subject: RFR: 8332153: RISC-V: enable tests and add comment for vector shift instruct (shared by vectorization and Vector API) [v4] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 07:41:27 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> For vector shift instruct, some corresponding tests are not enabled, this is to enable them. >> And the way how vector shift instruct works is not clear, especially both vectorization (SLP in jdk) and Vector API share the same instruct's in riscv_v.ad, so also added some comment to clarify it. >> >> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > minor LGTM assuming these tests pass with RVV. Thanks. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19265#pullrequestreview-2067817780 From fyang at openjdk.org Tue May 21 08:04:03 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 21 May 2024 08:04:03 GMT Subject: RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v7] In-Reply-To: References: Message-ID: On Mon, 20 May 2024 22:38:20 GMT, ArsenyBochkarev wrote: >> I also ran the correctness test on the Banana Pi BPI-F3 board (has RVV1.0): >> >> Before this patch and disable UseRVV: >> Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok >> Before this patch and enable UseRVV: >> Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok >> >> Apply this patch and disable UseRVV: >> Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok >> Apply this patch and enable UseRVV: >> Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is Failed >> >> The TestAdler32.jtr on Failed is as follows: >> [TestAdler32.jtr.log](https://github.com/openjdk/jdk/files/15350178/TestAdler32.jtr.log) > > Hello @zifeihan! Thanks for your efforts on improving this PR. I don't have access (yet) to Banana Pi board, so I can't debug precisely the case you pointed out. However, I know that vlen for Banana Pi is 256 bit, so I fixed problems for this case and checked functional correctness on QEMU for both 128 and 256 bit, which is OK now. Could you please do a re-run of `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` test? @ArsenyBochkarev: Hi, Will take another look later this week. It seems that the JMH numbers on Banana Pi improved with your last 3 commits. Is that anticipated? BTW: You need to fix errors reported by jcheck. 
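For readers following the Adler-32 intrinsic thread above, a scalar reference version of the checksum (as specified in RFC 1950 and implemented this way in zlib) may help when reading the `L_nmax` loop mentioned in the commit list: both running sums are reduced modulo 65521 only once per block of at most NMAX = 5552 bytes, the largest block for which the 32-bit sums cannot overflow. This is a plain C++ sketch, not the RVV code from the PR; the function name `adler32` is chosen here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar Adler-32: a = 1 + sum of bytes, b = sum of the running a values,
// both modulo 65521. The modulo is deferred to once per NMAX-byte block.
std::uint32_t adler32(std::uint32_t adler, const unsigned char* buf,
                      std::size_t len) {
  const std::uint32_t BASE = 65521;  // largest prime below 2^16
  const std::size_t NMAX = 5552;     // max bytes before b could overflow
  std::uint32_t a = adler & 0xFFFF;
  std::uint32_t b = (adler >> 16) & 0xFFFF;
  while (len > 0) {
    std::size_t n = len < NMAX ? len : NMAX;
    len -= n;
    while (n-- > 0) {
      a += *buf++;
      b += a;
    }
    a %= BASE;
    b %= BASE;
  }
  return (b << 16) | a;
}
```

Passing `1` as the initial value starts a new checksum (for example, the 9 bytes of "Wikipedia" give `0x11E60398`); passing an earlier result continues the checksum over the next buffer.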
------------- PR Comment: https://git.openjdk.org/jdk/pull/18382#issuecomment-2122000787 From epeter at openjdk.org Tue May 21 08:09:06 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 May 2024 08:09:06 GMT Subject: RFR: 8332394: Add friendly output when @IR rule missing value [v2] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 06:32:41 GMT, Hamlin Li wrote: >> Good catch! Only a small improvement suggestion, otherwise, looks good. >> >> Just noticed that we are actually missing tests that trigger a format violation in `TestBadFormat` for `applyIfCPUFeature*` and `applyIfPlatform*`. We should probably add some at some point, analogously to the ones already there for `applyIf*` for flags. But that could be done separately. > > Thanks @chhagedorn for your reviewing! @Hamlin-Li you only got 1 review. Per the rules, you generally need 2: https://openjdk.org/guide/#final-check-before-creating-the-pr That is unless you say that the change is **trivial**, and the reviewer also confirms that it is **trivial**. I don't see that here. Our rule is that you need 2 reviewers: at least one Reviewer, and the second one can be a Committer.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19270#issuecomment-2122017305 From amitkumar at openjdk.org Tue May 21 08:20:07 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 21 May 2024 08:20:07 GMT Subject: RFR: 8332498: [aarch64, x86] improving OpToAssembly output for partialSubtypeCheckConstSuper Instruct In-Reply-To: References: Message-ID: On Sun, 19 May 2024 15:36:15 GMT, Amit Kumar wrote: > format method generated previously: > > void partialSubtypeCheckConstSuperNode::format(PhaseRegAlloc *ra, outputStream *st) const { > // Start at oper_input_base() and count operands > unsigned idx0 = 1; > unsigned idx1 = 1; // sub > unsigned idx2 = idx1 + opnd_array(1)->num_edges(); // super_reg > unsigned idx3 = idx2 + opnd_array(2)->num_edges(); // super_con > unsigned idx4 = idx3 + opnd_array(3)->num_edges(); // vtemp > unsigned idx5 = idx4 + opnd_array(4)->num_edges(); // tempR1 > unsigned idx6 = idx5 + opnd_array(5)->num_edges(); // tempR2 > unsigned idx7 = idx6 + opnd_array(6)->num_edges(); // tempR3 > st->print_raw("partialSubtypeCheck "); > opnd_array(0)->int_format(ra, this, st); // result > st->print_raw(", "); > opnd_array(1)->ext_format(ra, this,idx1, st); // sub > st->print_raw(", super"); > } > > > format method generated with this change: > > void partialSubtypeCheckConstSuperNode::format(PhaseRegAlloc *ra, outputStream *st) const { > // Start at oper_input_base() and count operands > unsigned idx0 = 1; > unsigned idx1 = 1; // sub > unsigned idx2 = idx1 + opnd_array(1)->num_edges(); // super_reg > unsigned idx3 = idx2 + opnd_array(2)->num_edges(); // super_con > unsigned idx4 = idx3 + opnd_array(3)->num_edges(); // vtemp > unsigned idx5 = idx4 + opnd_array(4)->num_edges(); // tempR1 > unsigned idx6 = idx5 + opnd_array(5)->num_edges(); // tempR2 > unsigned idx7 = idx6 + opnd_array(6)->num_edges(); // tempR3 > st->print_raw("partialSubtypeCheck "); > opnd_array(0)->int_format(ra, this, st); // result > st->print_raw(", "); > 
opnd_array(1)->ext_format(ra, this,idx1, st); // sub > st->print_raw(", "); > opnd_array(2)->ext_format(ra, this,idx2, st); // super_reg > st->print_raw(", "); > opnd_array(3)->ext_format(ra, this,idx3, st); // super_con > } Thanks Tobias & Vladimir for the reviews. I guess we are good to go now as we have 2 reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19295#issuecomment-2122036730 From amitkumar at openjdk.org Tue May 21 08:20:08 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 21 May 2024 08:20:08 GMT Subject: Integrated: 8332498: [aarch64, x86] improving OpToAssembly output for partialSubtypeCheckConstSuper Instruct In-Reply-To: References: Message-ID: On Sun, 19 May 2024 15:36:15 GMT, Amit Kumar wrote: > format method generated previously: > > void partialSubtypeCheckConstSuperNode::format(PhaseRegAlloc *ra, outputStream *st) const { > // Start at oper_input_base() and count operands > unsigned idx0 = 1; > unsigned idx1 = 1; // sub > unsigned idx2 = idx1 + opnd_array(1)->num_edges(); // super_reg > unsigned idx3 = idx2 + opnd_array(2)->num_edges(); // super_con > unsigned idx4 = idx3 + opnd_array(3)->num_edges(); //
vtemp > unsigned idx5 = idx4 + opnd_array(4)->num_edges(); // tempR1 > unsigned idx6 = idx5 + opnd_array(5)->num_edges(); // tempR2 > unsigned idx7 = idx6 + opnd_array(6)->num_edges(); // tempR3 > st->print_raw("partialSubtypeCheck "); > opnd_array(0)->int_format(ra, this, st); // result > st->print_raw(", "); > opnd_array(1)->ext_format(ra, this,idx1, st); // sub > st->print_raw(", "); > opnd_array(2)->ext_format(ra, this,idx2, st); // super_reg > st->print_raw(", "); > opnd_array(3)->ext_format(ra, this,idx3, st); // super_con > } This pull request has now been integrated. Changeset: 7ffc9997 Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/7ffc9997bd4a93cefe30f672a5f0e9c49215d2c7 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod 8332498: [aarch64, x86] improving OpToAssembly output for partialSubtypeCheckConstSuper Instruct Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/19295 From thartmann at openjdk.org Tue May 21 08:21:02 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 21 May 2024 08:21:02 GMT Subject: RFR: 8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) In-Reply-To: References: Message-ID: On Fri, 17 May 2024 09:37:01 GMT, Damon Fenacci wrote: > # Issue > > The test `compiler/startup/StartupOutput.java` fails intermittently due to a crash after correctly printing the error `Initial size of CodeCache is too small` (the test limits the code cache using `-XX:InitialCodeCacheSize=1024K -XX:ReservedCodeCacheSize=1200k`). > The appearance of the issue is very dependent on thread scheduling. The original report happens during C1 initialization but C2 initialization is affected as well. > > # Causes > > There is one occurrence during C1 initialization and one during C2 initialization where a call to `RuntimeStub::new_runtime_stub` can fail fatally if there is not enough space left.
> For C1: `Compiler::init_c1_runtime` -> `Runtime1::initialize` -> `Runtime1::generate_blob_for` -> `Runtime1::generate_blob` -> `RuntimeStub::new_runtime_stub`. > For C2: `C2Compiler::initialize` -> `OptoRuntime::generate` -> `OptoRuntime::generate_stub` -> `Compile::Compile` -> `Compile::Code_Gen` -> `PhaseOutput::install` -> `PhaseOutput::install_stub` -> `RuntimeStub::new_runtime_stub`. > > # Solution > > https://github.com/openjdk/jdk/pull/15970 introduced an optional argument to `RuntimeStub::new_runtime_stub` to determine if it fails fatally or not. We can take advantage of it to avoid crashing and instead pass the information about the success or failure of the allocation up the (C1 and C2 initialization) call stack up to where we can set the compilations as failed. This is not a regression in JDK 23, right? Could you please adjust the affects versions in JIRA accordingly? Looks good to me otherwise. src/hotspot/share/c1/c1_Compiler.cpp line 53: > 51: bool Compiler::init_c1_runtime() { > 52: BufferBlob* buffer_blob = CompilerThread::current()->get_buffer_blob(); > 53: if (!Runtime1::initialize(buffer_blob)) return false; Suggestion: if (!Runtime1::initialize(buffer_blob)) { return false; } src/hotspot/share/c1/c1_Runtime1.cpp line 270: > 268: // generate stubs > 269: for (int id = 0; id < number_of_ids; id++) { > 270: if (!generate_blob_for(blob, (StubID) id)) return false; Suggestion: if (!generate_blob_for(blob, (StubID) id)) { return false; } ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19280#pullrequestreview-2067885932 PR Review Comment: https://git.openjdk.org/jdk/pull/19280#discussion_r1607857623 PR Review Comment: https://git.openjdk.org/jdk/pull/19280#discussion_r1607858281 From jvernee at openjdk.org Tue May 21 08:39:03 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 21 May 2024 08:39:03 GMT Subject: RFR: 8332547: Unloaded signature classes in DirectMethodHandles In-Reply-To: References: Message-ID: On Mon, 20 May 2024 21:29:20 GMT, Vladimir Ivanov wrote: > JVM routinely installs loader constraints for unloaded signature classes when method resolution takes place. MethodHandle resolution took a different route and eagerly resolves signature classes instead (see `java.lang.invoke.MemberName$Factory::resolve` and `sun.invoke.util.VerifyAccess::isTypeVisible` for details). > > There's a micro-optimization which bypasses eager resolution for `java.*` classes. The downside is that `java.*` signature classes can show up as unloaded. It manifests as inlining failures during JIT-compilation and may cause severe performance issues. > > Proposed fix removes the aforementioned special case logic during `MethodHandle` resolution. > > In some cases it may slow down `MethodHandle` construction a bit (e.g., when repeatedly constructing `DirectMethodHandle`s with lots of arguments), but `MethodHandle` construction step is not performance critical. > > Testing: hs-tier1 - hs-tier4 Loading classes seems like a side-effect of `isTypeVisible`. I suggest adding a separate pass that explicitly loads all the signature classes in `MemberName$Factory::resolve`. I think that would make the intent clearer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19319#issuecomment-2122077296 From mdoerr at openjdk.org Tue May 21 08:51:06 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 21 May 2024 08:51:06 GMT Subject: RFR: 8331935: Add support for primitive array C1 clone intrinsic in PPC [v3] In-Reply-To: References: Message-ID: On Mon, 20 May 2024 09:07:14 GMT, Varada M wrote: >> https://bugs.openjdk.org/browse/JDK-8302850 port for PPC64 >> >> JMH Benchmark Results >> >> >> Before: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 114.107 ± 1.337 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 130.492 ± 0.991 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 139.103 ± 1.913 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 321.688 ± 6.033 ns/op >> ArrayClone.byteClone 0 avgt 15 227.602 ± 3.393 ns/op >> ArrayClone.byteClone 10 avgt 15 237.624 ± 2.996 ns/op >> ArrayClone.byteClone 100 avgt 15 239.219 ± 2.835 ns/op >> >> ArrayClone.byteClone 1000 avgt 15 355.571 ± 2.946 ns/op >> ArrayClone.intArraycopy 0 avgt 15 113.275 ± 1.099 ns/op >> ArrayClone.intArraycopy 10 avgt 15 129.763 ± 1.458 ns/op >> ArrayClone.intArraycopy 100 avgt 15 213.327 ± 2.524 ns/op >> ArrayClone.intArraycopy 1000 avgt 15 449.650 ± 7.338 ns/op >> ArrayClone.intClone 0 avgt 15 225.682 ± 3.048 ns/op >> ArrayClone.intClone 10 avgt 15 234.532 ± 2.817 ns/op >> ArrayClone.intClone 100 avgt 15 295.934 ± 4.925 ns/op >> ArrayClone.intClone 1000 avgt 15 573.368 ±
5.739 ns/op >> Finished running test 'micro:java.lang.ArrayClone' >> Test report is stored in build/aix-ppc64-server-release/test-results/micro_java_lang_ArrayClone >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> micro:java.lang.ArrayClone 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> Finished building target 'test' in configuration 'aix-ppc64-server-release' >> >> >> >> >> After: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 113.894 ± 0.993 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 131.455 ± 0.956 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 139.145 ± 3.002 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 315.957 ± 14.591 ns/op >> ArrayClone.byteClone 0 avgt 15 43.753 ± 3.669 ns/op >> ArrayClone.byteClone 10 avgt 15 52.329 ± 1.041 ns/op >> ArrayClone.byteClone 100 avgt 15 127.711 ± 3.938 ns/op >> >> ArrayClone.byteClone 1000 avgt 15 225.937 ± 1.987 ns/op >> Arr... > > Varada M has updated the pull request incrementally with one additional commit since the last revision: > > Add support for primitive array C1 clone intrinsic Looks good and the tests have passed. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19250#pullrequestreview-2067966425 From rcastanedalo at openjdk.org Tue May 21 09:17:16 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 21 May 2024 09:17:16 GMT Subject: RFR: 8332527: ZGC: generalize object cloning logic [v2] In-Reply-To: References: Message-ID: > This changeset generalizes the logic to produce a runtime call to clone a class instance so that it can be shared by other collectors adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)).
The changeset moves the logic from `ZBarrierSetC2` to the GC-shared `BarrierSetC2` class and adds support for 32-bit platforms. > > #### Testing > > - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). > - tier4-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only). > - `compiler/arraycopy` tests (linux-x86-debug) with [an additional patch](https://github.com/openjdk/jdk/commit/ddcf777894e740b8e6ddbbf8821e82a173c23ef4) that implements cloning of large class instances with a runtime clone call rather than arraycopy when using G1 (to exercise the generalized logic on a 32-bit platform). Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Applied Axel's suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19311/files - new: https://git.openjdk.org/jdk/pull/19311/files/01997186..cf85edec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19311&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19311&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19311.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19311/head:pull/19311 PR: https://git.openjdk.org/jdk/pull/19311 From rcastanedalo at openjdk.org Tue May 21 09:17:16 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 21 May 2024 09:17:16 GMT Subject: RFR: 8332527: ZGC: generalize object cloning logic [v2] In-Reply-To: References: Message-ID: <2K4gVmjkyLcVoDjcILS88inG8OJ1vlbh3MfS3PjuIx0=.e3dbaaad-4312-4e5a-bb8d-6a7064fdf1ff@github.com> On Tue, 21 May 2024 06:11:17 GMT, Axel Boldt-Christmas wrote: > lgtm. > > Feel free to use, change or discard my suggestions. Thanks for reviewing, Axel! I just applied your suggestions.
> src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 840: > >> 838: void BarrierSetC2::clone_instance_in_runtime(PhaseMacroExpand* phase, ArrayCopyNode* ac, >> 839: address clone_addr, const char* clone_name) const { >> 840: assert(ac->is_clone_inst(), "this function is only defined for cloning class instances"); > > Saying `class instances` is confusing to me. This is used for all instance objects, not only instances of Class objects. Maybe this is some terminology I am unfamiliar with, but in general hotspot uses instance vs array to distinguish between the two classes of objects. E.g. `instanceOop vs arrayOop`, `InstanceKlass vs ArrayKlass`. > Suggestion: > > assert(ac->is_clone_inst(), "this function is only defined for cloning instances"); You are right, using just "instances" is more idiomatic, changed. > src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 852: > >> 850: // The native clone we are calling here expects the instance size in words. >> 851: // Add header/offset size to payload size to get instance size. >> 852: Node* const base_offset = phase->MakeConX(arraycopy_payload_base_offset(ac->is_clone_array()) >> LogBytesPerLong); > > Why query for `is_clone_array()` when it is known false from the context we are in? (I know it is what the previous code did, but I am curious why it would be preferred.) > Suggestion: > > Node* const base_offset = phase->MakeConX(arraycopy_payload_base_offset(false /* is_array */) >> LogBytesPerLong); No good reason, just follows from the pre-existing code, I changed it now. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19311#issuecomment-2122156087 PR Review Comment: https://git.openjdk.org/jdk/pull/19311#discussion_r1607942604 PR Review Comment: https://git.openjdk.org/jdk/pull/19311#discussion_r1607944721 From dfenacci at openjdk.org Tue May 21 09:21:37 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 21 May 2024 09:21:37 GMT Subject: RFR: 8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v2] In-Reply-To: References: Message-ID: <6r39P_htGVom5FkgRtjngxwur1_uVG13JS33mBeghPk=.8ce1ed82-cda9-4dae-beb6-18c60024ec18@github.com> > # Issue > > The test `compiler/startup/StartupOutput.java` fails intermittently due to a crash after correctly printing the error `Initial size of CodeCache is too small` (the test limits the code cache using `-XX:InitialCodeCacheSize=1024K -XX:ReservedCodeCacheSize=1200k`). > The appearance of the issue is very dependent on thread scheduling. The original report happens during C1 initialization but C2 initialization is affected as well. > > # Causes > > There is one occurrence during C1 initialization and one during C2 initialization where a call to `RuntimeStub::new_runtime_stub` can fail fatally if there is not enough space left. > For C1: `Compiler::init_c1_runtime` -> `Runtime1::initialize` -> `Runtime1::generate_blob_for` -> `Runtime1::generate_blob` -> `RuntimeStub::new_runtime_stub`. > For C2: `C2Compiler::initialize` -> `OptoRuntime::generate` -> `OptoRuntime::generate_stub` -> `Compile::Compile` -> `Compile::Code_Gen` -> `PhaseOutput::install` -> `PhaseOutput::install_stub` -> `RuntimeStub::new_runtime_stub`. > > # Solution > > https://github.com/openjdk/jdk/pull/15970 introduced an optional argument to `RuntimeStub::new_runtime_stub` to determine if it fails fatally or not.
We can take advantage of it to avoid crashing and instead propagate the success or failure of the allocation up the C1 and C2 initialization call stacks to the point where we can mark the compilations as failed. Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/share/c1/c1_Runtime1.cpp Co-authored-by: Tobias Hartmann - Update src/hotspot/share/c1/c1_Compiler.cpp Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19280/files - new: https://git.openjdk.org/jdk/pull/19280/files/2fa14b54..c505aac5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19280&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19280&range=00-01 Stats: 6 lines in 2 files changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19280.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19280/head:pull/19280 PR: https://git.openjdk.org/jdk/pull/19280 From dfenacci at openjdk.org Tue May 21 09:21:38 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 21 May 2024 09:21:38 GMT Subject: RFR: 8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v2] In-Reply-To: References: Message-ID: <_MCEjf4umdYnp_vrlrmqNgTOJOrN0OJ-kL3liHe9fT4=.faaab765-8369-464a-bb6f-b3fdaf4b6338@github.com> On Tue, 21 May 2024 08:17:56 GMT, Tobias Hartmann wrote: > This is not a regression in JDK 23, right? Could you please adjust the affects versions in JIRA accordingly? It is not. Fixing the version. Thanks a lot for reviewing @TobiHartmann.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19280#issuecomment-2122164425 From duke at openjdk.org Tue May 21 10:09:06 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Tue, 21 May 2024 10:09:06 GMT Subject: RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v7] In-Reply-To: <6DenkzTvXYiIBKrR7D7JbkNG53AgOuwkvH3_41-_qWI=.356e8a00-49f7-48cf-9464-fffd01ce5f37@github.com> References: <6DenkzTvXYiIBKrR7D7JbkNG53AgOuwkvH3_41-_qWI=.356e8a00-49f7-48cf-9464-fffd01ce5f37@github.com> Message-ID: <-mRwIBXIo93DZ8MOY9zzMk6WJcMyx1cJasg5TKJUglA=.83cbfc45-3a08-4d1a-880b-fe6fb03d8060@github.com> On Tue, 21 May 2024 07:43:21 GMT, Gui Cao wrote: >> I also ran the correctness test on the Banana Pi BPI-F3 board (has RVV1.0): >> >> Before this patch, UseRVV disabled: >> Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok >> Before this patch, UseRVV enabled: >> Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok >> >> With this patch, UseRVV disabled: >> Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok >> With this patch, UseRVV enabled: >> Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` failed >> >> The TestAdler32.jtr for the failing run is as follows: >> [TestAdler32.jtr.log](https://github.com/openjdk/jdk/files/15350178/TestAdler32.jtr.log) > >> Hello @zifeihan! Thanks for your efforts on improving this PR. I don't have access (yet) to a Banana Pi board, so I can't precisely debug the case you pointed out. However, I know that vlen for Banana Pi is 256 bits, so I fixed the problems for this case and checked functional correctness on QEMU for both 128 and 256 bits, which is OK now. Could you please re-run the `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` test? > > Sorry for the delay; the JMH performance run just finished. > With this PR applied and UseRVV enabled: > - [x] Correctness test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` passed normally.
> - [x] JMH test `test/micro/org/openjdk/bench/java/util/TestAdler32.java` passed normally. > > Benchmark (count) Mode Cnt Score Error Units > TestAdler32.testAdler32Update 64 thrpt 25 7865.764 ± 57.876 ops/ms > TestAdler32.testAdler32Update 128 thrpt 25 6361.346 ± 0.178 ops/ms > TestAdler32.testAdler32Update 256 thrpt 25 4595.217 ± 0.166 ops/ms > TestAdler32.testAdler32Update 512 thrpt 25 2941.284 ± 12.318 ops/ms > TestAdler32.testAdler32Update 1024 thrpt 25 1728.568 ± 0.053 ops/ms > TestAdler32.testAdler32Update 2048 thrpt 25 943.173 ± 1.043 ops/ms > TestAdler32.testAdler32Update 5012 thrpt 25 404.343 ± 0.205 ops/ms > TestAdler32.testAdler32Update 8192 thrpt 25 249.495 ± 1.986 ops/ms > TestAdler32.testAdler32Update 16384 thrpt 25 126.168 ± 1.261 ops/ms > TestAdler32.testAdler32Update 32768 thrpt 25 61.925 ± 0.607 ops/ms > TestAdler32.testAdler32Update 65536 thrpt 25 30.866 ± 0.375 ops/ms > Finished running test 'micro:java.util.TestAdler32' @zifeihan: Thank you for these runs! @RealFYang: Hi, thanks!
The increase in the JMH numbers is expected due to [partial unrolling](https://github.com/openjdk/jdk/pull/18382/commits/453c169b04422ec7aec72d4819a95597be2a7e07) of the `L_by16` loop. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18382#issuecomment-2122260400 From thartmann at openjdk.org Tue May 21 10:10:02 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 21 May 2024 10:10:02 GMT Subject: RFR: 8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v2] In-Reply-To: <6r39P_htGVom5FkgRtjngxwur1_uVG13JS33mBeghPk=.8ce1ed82-cda9-4dae-beb6-18c60024ec18@github.com> References: <6r39P_htGVom5FkgRtjngxwur1_uVG13JS33mBeghPk=.8ce1ed82-cda9-4dae-beb6-18c60024ec18@github.com> Message-ID: On Tue, 21 May 2024 09:21:37 GMT, Damon Fenacci wrote: >> # Issue >> >> The test `compiler/startup/StartupOutput.java` fails intermittently due to a crash after correctly printing the error `Initial size of CodeCache is too small` (the test limits the code cache using `-XX:InitialCodeCacheSize=1024K -XX:ReservedCodeCacheSize=1200k`). >> The appearance of the issue is very dependent on thread scheduling. The original report happens during C1 initialization but C2 initialization is affected as well. >> >> # Causes >> >> There is one occurrence during C1 initialization and one during C2 initialization where a call to `RuntimeStub::new_runtime_stub` can fail fatally if there is not enough space left. >> For C1: `Compiler::init_c1_runtime` -> `Runtime1::initialize` -> `Runtime1::generate_blob_for` -> `Runtime1::generate_blob` -> `RuntimeStub::new_runtime_stub`. >> For C2: `C2Compiler::initialize` -> `OptoRuntime::generate` -> `OptoRuntime::generate_stub` -> `Compile::Compile` -> `Compile::Code_Gen` -> `PhaseOutput::install` -> `PhaseOutput::install_stub` -> `RuntimeStub::new_runtime_stub`.
>> >> # Solution >> >> https://github.com/openjdk/jdk/pull/15970 introduced an optional argument to `RuntimeStub::new_runtime_stub` to determine if it fails fatally or not. We can take advantage of it to avoid crashing and instead propagate the success or failure of the allocation up the C1 and C2 initialization call stacks to the point where we can mark the compilations as failed. > > Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/share/c1/c1_Runtime1.cpp > > Co-authored-by: Tobias Hartmann > - Update src/hotspot/share/c1/c1_Compiler.cpp > > Co-authored-by: Tobias Hartmann Thanks! ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19280#pullrequestreview-2068157983 From fgao at openjdk.org Tue May 21 10:11:05 2024 From: fgao at openjdk.org (Fei Gao) Date: Tue, 21 May 2024 10:11:05 GMT Subject: RFR: 8320622: [TEST] Improve coverage of compiler/loopopts/superword/TestMulAddS2I.java on different platforms In-Reply-To: <9QQIEkbwsbP5SUsMPjW4-YVkqWApkqPTNulw9gdNHMk=.2d68912c-b266-4459-b15b-953cb299b0db@github.com> References: <9QQIEkbwsbP5SUsMPjW4-YVkqWApkqPTNulw9gdNHMk=.2d68912c-b266-4459-b15b-953cb299b0db@github.com> Message-ID: On Mon, 20 May 2024 18:49:25 GMT, Vladimir Kozlov wrote: > Do all platforms support both (`true` and `false`) values of `AlignVector`? I see it is only adjusted for x86 and aarch64 in `vm_version_.cpp` files. Thanks for your review @vnkozlov. Yes, and `AlignVector` only takes effect through the return value here: https://github.com/openjdk/jdk/blob/d6b7f9b170b6ce4f7275cc7595b71b9a3e93c133/src/hotspot/share/opto/superword.hpp#L575. `Matcher::misaligned_vectors_ok()` is architecture-dependent. `AlignVector` is `true` by default.
When `AlignVector` is set to `false` on the command line but the platform doesn't support misaligned loads/stores, `Matcher::misaligned_vectors_ok()` returns `false`, so `vectors_should_be_aligned()` still evaluates to `true`. It should therefore be okay to set `AlignVector` to `true` or `false` on all platforms. WDYT? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19305#issuecomment-2122264610 From sjayagond at openjdk.org Tue May 21 10:47:06 2024 From: sjayagond at openjdk.org (Sidraya Jayagond) Date: Tue, 21 May 2024 10:47:06 GMT Subject: RFR: 8331934: [s390x] Add support for primitive array C1 clone intrinsic [v3] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 09:25:32 GMT, Amit Kumar wrote: >> Adds JDK-8302850 Port for s390x. >> >> Testing: >> >> make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:hotspot_compiler 1166 1166 0 0 >> ============================== >> TEST SUCCESS >> >> * Tier1 Test with Fast debug build. >> >> BenchMarking: >> >> >> Without Patch: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 10.838 ± 0.461 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 28.919 ± 1.695 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 48.815 ± 0.901 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 256.357 ± 7.901 ns/op >> ArrayClone.byteClone 0 avgt 15 90.398 ± 3.119 ns/op >> ArrayClone.byteClone 10 avgt 15 103.774 ± 4.468 ns/op >> ArrayClone.byteClone 100 avgt 15 126.628 ± 6.952 ns/op >> ArrayClone.byteClone 1000 avgt 15 326.409 ± 31.635 ns/op >> ArrayClone.intArraycopy 0 avgt 15 10.450 ± 0.509 ns/op >> ArrayClone.intArraycopy 10 avgt 15 36.903 ± 0.753 ns/op >> ArrayClone.intArraycopy 100 avgt 15 85.964 ± 1.806 ns/op >> ArrayClone.intArraycopy 1000 avgt 15 841.512 ± 40.335 ns/op >> ArrayClone.intClone 0 avgt 15 89.332 ± 3.695 ns/op >> ArrayClone.intClone 10 avgt 15 110.639 ±
2.476 ns/op >> ArrayClone.intClone 100 avgt 15 195.781 ± 8.622 ns/op >> ArrayClone.intClone 1000 avgt 15 1058.479 ± 92.468 ns/op >> Finished running test 'micro:java.lang.ArrayClone' >> >> >> with patch: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 10.526... > > Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: > > - Merge master > - s390x Port > - Update src/hotspot/share/c1/c1_GraphBuilder.cpp > > Co-authored-by: Dean Long <17332032+dean-long at users.noreply.github.com> > - Fix assert to only have a single ! > - Assert type is not interface > - Remove whitespace > - Expanded testing in TestNullArrayClone > > * Added byte[] and long[] tests. > * Verified that the cloned array has the same contents. > * Increase number of iterations reach tier 3 threshold. > - Update src/hotspot/share/c1/c1_GraphBuilder.cpp > > Co-authored-by: Boris <42576543+bulasevich at users.noreply.github.com> > - Added test summary > - Use vmIntrinsics instead of vmIntrinsicID > - ... and 16 more: https://git.openjdk.org/jdk/compare/2f10a316...865de5ba LGTM ------------- Marked as reviewed by sjayagond (Author). PR Review: https://git.openjdk.org/jdk/pull/19220#pullrequestreview-2068262267 From amitkumar at openjdk.org Tue May 21 10:57:00 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 21 May 2024 10:57:00 GMT Subject: RFR: 8320622: [TEST] Improve coverage of compiler/loopopts/superword/TestMulAddS2I.java on different platforms In-Reply-To: References: Message-ID: On Mon, 20 May 2024 08:56:35 GMT, Fei Gao wrote: > It would be worthwhile to improve the test coverage on all platforms by applying another common VM flag. I have tested it on s390x; it looks good for us.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19305#issuecomment-2122359669 From mli at openjdk.org Tue May 21 11:56:27 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 21 May 2024 11:56:27 GMT Subject: RFR: 8320999: RISC-V: C2 RotateLeftV Message-ID: Hi, could you help review this patch? A more detailed description is inline in the code. Thanks ------------- Commit messages: - add comments - fix mask - fix imm & long - fixes - Merge branch 'master' into rotate-left-right-v - fixes - remove redundant code: UseZvbb - merge master - RotateLeftV/RotateRightV: Initial Commit Changes: https://git.openjdk.org/jdk/pull/19325/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19325&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320999 Stats: 293 lines in 4 files changed: 285 ins; 3 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/19325.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19325/head:pull/19325 PR: https://git.openjdk.org/jdk/pull/19325 From amitkumar at openjdk.org Tue May 21 12:05:11 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 21 May 2024 12:05:11 GMT Subject: RFR: 8331934: [s390x] Add support for primitive array C1 clone intrinsic [v3] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 09:25:32 GMT, Amit Kumar wrote: >> Adds JDK-8302850 Port for s390x. >> >> Testing: >> >> make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:hotspot_compiler 1166 1166 0 0 >> ============================== >> TEST SUCCESS >> >> * Tier1 Test with Fast debug build. >> >> BenchMarking: >> >> >> Without Patch: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 10.838 ± 0.461 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 28.919 ± 1.695 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 48.815 ±
0.901 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 256.357 ± 7.901 ns/op >> ArrayClone.byteClone 0 avgt 15 90.398 ± 3.119 ns/op >> ArrayClone.byteClone 10 avgt 15 103.774 ± 4.468 ns/op >> ArrayClone.byteClone 100 avgt 15 126.628 ± 6.952 ns/op >> ArrayClone.byteClone 1000 avgt 15 326.409 ± 31.635 ns/op >> ArrayClone.intArraycopy 0 avgt 15 10.450 ± 0.509 ns/op >> ArrayClone.intArraycopy 10 avgt 15 36.903 ± 0.753 ns/op >> ArrayClone.intArraycopy 100 avgt 15 85.964 ± 1.806 ns/op >> ArrayClone.intArraycopy 1000 avgt 15 841.512 ± 40.335 ns/op >> ArrayClone.intClone 0 avgt 15 89.332 ± 3.695 ns/op >> ArrayClone.intClone 10 avgt 15 110.639 ± 2.476 ns/op >> ArrayClone.intClone 100 avgt 15 195.781 ± 8.622 ns/op >> ArrayClone.intClone 1000 avgt 15 1058.479 ± 92.468 ns/op >> Finished running test 'micro:java.lang.ArrayClone' >> >> >> with patch: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 10.526... > > Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: > > - Merge master > - s390x Port > - Update src/hotspot/share/c1/c1_GraphBuilder.cpp > > Co-authored-by: Dean Long <17332032+dean-long at users.noreply.github.com> > - Fix assert to only have a single ! > - Assert type is not interface > - Remove whitespace > - Expanded testing in TestNullArrayClone > > * Added byte[] and long[] tests. > * Verified that the cloned array has the same contents. > * Increase number of iterations reach tier 3 threshold. > - Update src/hotspot/share/c1/c1_GraphBuilder.cpp > > Co-authored-by: Boris <42576543+bulasevich at users.noreply.github.com> > - Added test summary > - Use vmIntrinsics instead of vmIntrinsicID > - ...
and 16 more: https://git.openjdk.org/jdk/compare/2f10a316...865de5ba Ran this command: `make test TEST="hotspot_compiler hotspot_gc hotspot_serviceability hotspot_runtime tier1 tier2 tier3" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1 -Xcomp"` and didn't see any new failures caused by my changes. Thanks Martin & Sid for the approval ;-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19220#issuecomment-2122474496 PR Comment: https://git.openjdk.org/jdk/pull/19220#issuecomment-2122475832 From amitkumar at openjdk.org Tue May 21 12:05:12 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 21 May 2024 12:05:12 GMT Subject: Integrated: 8331934: [s390x] Add support for primitive array C1 clone intrinsic In-Reply-To: References: Message-ID: On Mon, 13 May 2024 17:08:03 GMT, Amit Kumar wrote: > Adds JDK-8302850 Port for s390x. > > Testing: > > make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:hotspot_compiler 1166 1166 0 0 > ============================== > TEST SUCCESS > > * Tier1 Test with Fast debug build. > > BenchMarking: > > > Without Patch: > > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 10.838 ± 0.461 ns/op > ArrayClone.byteArraycopy 10 avgt 15 28.919 ± 1.695 ns/op > ArrayClone.byteArraycopy 100 avgt 15 48.815 ± 0.901 ns/op > ArrayClone.byteArraycopy 1000 avgt 15 256.357 ± 7.901 ns/op > ArrayClone.byteClone 0 avgt 15 90.398 ± 3.119 ns/op > ArrayClone.byteClone 10 avgt 15 103.774 ± 4.468 ns/op > ArrayClone.byteClone 100 avgt 15 126.628 ± 6.952 ns/op > ArrayClone.byteClone 1000 avgt 15 326.409 ± 31.635 ns/op > ArrayClone.intArraycopy 0 avgt 15 10.450 ± 0.509 ns/op > ArrayClone.intArraycopy 10 avgt 15 36.903 ± 0.753 ns/op > ArrayClone.intArraycopy 100 avgt 15 85.964 ± 1.806 ns/op > ArrayClone.intArraycopy 1000 avgt 15 841.512 ±
40.335 ns/op > ArrayClone.intClone 0 avgt 15 89.332 ± 3.695 ns/op > ArrayClone.intClone 10 avgt 15 110.639 ± 2.476 ns/op > ArrayClone.intClone 100 avgt 15 195.781 ± 8.622 ns/op > ArrayClone.intClone 1000 avgt 15 1058.479 ± 92.468 ns/op > Finished running test 'micro:java.lang.ArrayClone' > > > with patch: > > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 10.526 ± 0.289 ns/op > ArrayClone.byteArraycopy 10 avgt 15 27.110 ± 0.656 ns/op > Arra... This pull request has now been integrated. Changeset: ae9ad862 Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/ae9ad862ee54e119553efec919f1061dca36b954 Stats: 46 lines in 6 files changed: 23 ins; 2 del; 21 mod 8331934: [s390x] Add support for primitive array C1 clone intrinsic Reviewed-by: mdoerr, sjayagond ------------- PR: https://git.openjdk.org/jdk/pull/19220 From varadam at openjdk.org Tue May 21 12:11:13 2024 From: varadam at openjdk.org (Varada M) Date: Tue, 21 May 2024 12:11:13 GMT Subject: RFR: 8331935: Add support for primitive array C1 clone intrinsic in PPC [v4] In-Reply-To: References: Message-ID: <7HG6uTSZR9fs7PrsTNR1N0rzUCIwIgX3-W0VGPcrRyY=.0db16311-1313-4476-84c4-285ebd2a3fbc@github.com> > https://bugs.openjdk.org/browse/JDK-8302850 port for PPC64 > > JMH Benchmark Results > > > Before : > > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 114.107 ± 1.337 ns/op > ArrayClone.byteArraycopy 10 avgt 15 130.492 ± 0.991 ns/op > ArrayClone.byteArraycopy 100 avgt 15 139.103 ± 1.913 ns/op > ArrayClone.byteArraycopy 1000 avgt 15 321.688 ± 6.033 ns/op > ArrayClone.byteClone 0 avgt 15 227.602 ± 3.393 ns/op > ArrayClone.byteClone 10 avgt 15 237.624 ± 2.996 ns/op > ArrayClone.byteClone 100 avgt 15 239.219 ± 2.835 ns/op > > ArrayClone.byteClone 1000 avgt 15 355.571 ± 2.946 ns/op > ArrayClone.intArraycopy 0 avgt 15 113.275 ± 1.099 ns/op > ArrayClone.intArraycopy 10 avgt 15 129.763 ± 1.458 ns/op > ArrayClone.intArraycopy 100 avgt 15 213.327 ±
2.524 ns/op > ArrayClone.intArraycopy 1000 avgt 15 449.650 ± 7.338 ns/op > ArrayClone.intClone 0 avgt 15 225.682 ± 3.048 ns/op > ArrayClone.intClone 10 avgt 15 234.532 ± 2.817 ns/op > ArrayClone.intClone 100 avgt 15 295.934 ± 4.925 ns/op > ArrayClone.intClone 1000 avgt 15 573.368 ± 5.739 ns/op > Finished running test 'micro:java.lang.ArrayClone' > Test report is stored in build/aix-ppc64-server-release/test-results/micro_java_lang_ArrayClone > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > micro:java.lang.ArrayClone 1 1 0 0 > ============================== > TEST SUCCESS > > Finished building target 'test' in configuration 'aix-ppc64-server-release' > > > > > After: > > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 113.894 ± 0.993 ns/op > ArrayClone.byteArraycopy 10 avgt 15 131.455 ± 0.956 ns/op > ArrayClone.byteArraycopy 100 avgt 15 139.145 ± 3.002 ns/op > ArrayClone.byteArraycopy 1000 avgt 15 315.957 ± 14.591 ns/op > ArrayClone.byteClone 0 avgt 15 43.753 ± 3.669 ns/op > ArrayClone.byteClone 10 avgt 15 52.329 ± 1.041 ns/op > ArrayClone.byteClone 100 avgt 15 127.711 ± 3.938 ns/op > > ArrayClone.byteClone 1000 avgt 15 225.937 ± 1.987 ns/op > ArrayClone.intArraycopy 0 avgt 15 113.788 ± 0.770 ns/op > ArrayClone.intArraycopy 10 avgt 1... Varada M has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains five commits: - Merge branch 'master' into arryClone - Add support for primitive array C1 clone intrinsic - Add support for primitive array C1 clone intrinsic - Add support for primitive array C1 clone intrinsic - Add support for primitive array C1 clone intrinsic ------------- Changes: https://git.openjdk.org/jdk/pull/19250/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19250&range=03 Stats: 64 lines in 6 files changed: 27 ins; 3 del; 34 mod Patch: https://git.openjdk.org/jdk/pull/19250.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19250/head:pull/19250 PR: https://git.openjdk.org/jdk/pull/19250 From amitkumar at openjdk.org Tue May 21 12:15:04 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 21 May 2024 12:15:04 GMT Subject: RFR: 8331935: Add support for primitive array C1 clone intrinsic in PPC [v4] In-Reply-To: <7HG6uTSZR9fs7PrsTNR1N0rzUCIwIgX3-W0VGPcrRyY=.0db16311-1313-4476-84c4-285ebd2a3fbc@github.com> References: <7HG6uTSZR9fs7PrsTNR1N0rzUCIwIgX3-W0VGPcrRyY=.0db16311-1313-4476-84c4-285ebd2a3fbc@github.com> Message-ID: On Tue, 21 May 2024 12:11:13 GMT, Varada M wrote: >> https://bugs.openjdk.org/browse/JDK-8302850 port for PPC64 >> >> JMH Benchmark Results >> >> >> Before : >> >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 114.107 ± 1.337 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 130.492 ± 0.991 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 139.103 ± 1.913 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 321.688 ± 6.033 ns/op >> ArrayClone.byteClone 0 avgt 15 227.602 ± 3.393 ns/op >> ArrayClone.byteClone 10 avgt 15 237.624 ± 2.996 ns/op >> ArrayClone.byteClone 100 avgt 15 239.219 ± 2.835 ns/op >> >> ArrayClone.byteClone 1000 avgt 15 355.571 ± 2.946 ns/op >> ArrayClone.intArraycopy 0 avgt 15 113.275 ± 1.099 ns/op >> ArrayClone.intArraycopy 10 avgt 15 129.763 ± 1.458 ns/op >> ArrayClone.intArraycopy 100 avgt 15 213.327 ±
2.524 ns/op >> ArrayClone.intArraycopy 1000 avgt 15 449.650 ± 7.338 ns/op >> ArrayClone.intClone 0 avgt 15 225.682 ± 3.048 ns/op >> ArrayClone.intClone 10 avgt 15 234.532 ± 2.817 ns/op >> ArrayClone.intClone 100 avgt 15 295.934 ± 4.925 ns/op >> ArrayClone.intClone 1000 avgt 15 573.368 ± 5.739 ns/op >> Finished running test 'micro:java.lang.ArrayClone' >> Test report is stored in build/aix-ppc64-server-release/test-results/micro_java_lang_ArrayClone >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> micro:java.lang.ArrayClone 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> Finished building target 'test' in configuration 'aix-ppc64-server-release' >> >> >> >> >> After: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 113.894 ± 0.993 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 131.455 ± 0.956 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 139.145 ± 3.002 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 315.957 ± 14.591 ns/op >> ArrayClone.byteClone 0 avgt 15 43.753 ± 3.669 ns/op >> ArrayClone.byteClone 10 avgt 15 52.329 ± 1.041 ns/op >> ArrayClone.byteClone 100 avgt 15 127.711 ± 3.938 ns/op >> >> ArrayClone.byteClone 1000 avgt 15 225.937 ± 1.987 ns/op >> Arr... > > Varada M has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into arryClone > - Add support for primitive array C1 clone intrinsic > - Add support for primitive array C1 clone intrinsic > - Add support for primitive array C1 clone intrinsic > - Add support for primitive array C1 clone intrinsic I have compared it with s390x and it looks fine to me. But note that I don't have an AIX machine to test on. ------------- Marked as reviewed by amitkumar (Committer).
PR Review: https://git.openjdk.org/jdk/pull/19250#pullrequestreview-2068463570 From varadam at openjdk.org Tue May 21 12:35:03 2024 From: varadam at openjdk.org (Varada M) Date: Tue, 21 May 2024 12:35:03 GMT Subject: RFR: 8331935: Add support for primitive array C1 clone intrinsic in PPC In-Reply-To: References: Message-ID: <7utHJvWwEy9QoIxeXfJzGpdor7JUe-W0qfIktdeTfGQ=.34c8fa30-c45f-4c75-b4f0-bd6a34ea431a@github.com> On Fri, 17 May 2024 15:11:13 GMT, Martin Doerr wrote: >> https://bugs.openjdk.org/browse/JDK-8302850 port for PPC64 >> >> JMH Benchmark Results >> >> >> Before : >> >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 114.107 ± 1.337 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 130.492 ± 0.991 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 139.103 ± 1.913 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 321.688 ± 6.033 ns/op >> ArrayClone.byteClone 0 avgt 15 227.602 ± 3.393 ns/op >> ArrayClone.byteClone 10 avgt 15 237.624 ± 2.996 ns/op >> ArrayClone.byteClone 100 avgt 15 239.219 ± 2.835 ns/op >> >> ArrayClone.byteClone 1000 avgt 15 355.571 ± 2.946 ns/op >> ArrayClone.intArraycopy 0 avgt 15 113.275 ± 1.099 ns/op >> ArrayClone.intArraycopy 10 avgt 15 129.763 ± 1.458 ns/op >> ArrayClone.intArraycopy 100 avgt 15 213.327 ± 2.524 ns/op >> ArrayClone.intArraycopy 1000 avgt 15 449.650 ± 7.338 ns/op >> ArrayClone.intClone 0 avgt 15 225.682 ± 3.048 ns/op >> ArrayClone.intClone 10 avgt 15 234.532 ± 2.817 ns/op >> ArrayClone.intClone 100 avgt 15 295.934 ± 4.925 ns/op >> ArrayClone.intClone 1000 avgt 15 573.368 ± 
5.739 ns/op >> Finished running test 'micro:java.lang.ArrayClone' >> Test report is stored in build/aix-ppc64-server-release/test-results/micro_java_lang_ArrayClone >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> micro:java.lang.ArrayClone 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> Finished building target 'test' in configuration 'aix-ppc64-server-release' >> >> >> >> >> After: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 113.894 ± 0.993 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 131.455 ± 0.956 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 139.145 ± 3.002 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 315.957 ± 14.591 ns/op >> ArrayClone.byteClone 0 avgt 15 43.753 ± 3.669 ns/op >> ArrayClone.byteClone 10 avgt 15 52.329 ± 1.041 ns/op >> ArrayClone.byteClone 100 avgt 15 127.711 ± 3.938 ns/op >> >> ArrayClone.byteClone 1000 avgt 15 225.937 ± 1.987 ns/op >> Arr... > > I also have a minor cleanup proposal for `LIR_Assembler::emit_arraycopy`: > > diff --git a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp > index dba662a2212..2424d820177 100644 > --- a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp > +++ b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp > @@ -1827,18 +1827,17 @@ void LIR_Assembler::emit_arraycopy(LIR_OpArrayCopy* op) { > > int flags = op->flags(); > ciArrayKlass* default_type = op->expected_type(); > - BasicType basic_type = default_type != nullptr ? default_type->element_type()->basic_type() : T_ILLEGAL; > + BasicType basic_type = (default_type != nullptr) ? default_type->element_type()->basic_type() : T_ILLEGAL; > if (basic_type == T_ARRAY) basic_type = T_OBJECT; > > // Set up the arraycopy stub information. > ArrayCopyStub* stub = op->stub(); > - const int frame_resize = frame::native_abi_reg_args_size - sizeof(frame::java_abi); // C calls need larger frame.
> > // Always do stub if no type information is available. It's ok if > // the known type isn't loaded since the code sanity checks > // in debug mode and the type isn't required when we know the exact type > // also check that the type is an array type. > - if (op->expected_type() == nullptr) { > + if (default_type == nullptr) { > assert(src->is_nonvolatile() && src_pos->is_nonvolatile() && dst->is_nonvolatile() && dst_pos->is_nonvolatile() && > length->is_nonvolatile(), "must preserve"); > address copyfunc_addr = StubRoutines::generic_arraycopy(); > @@ -1873,7 +1872,7 @@ void LIR_Assembler::emit_arraycopy(LIR_OpArrayCopy* op) { > return; > } > > - assert(default_type != nullptr && default_type->is_array_klass(), "must be true at this point"); > + assert(default_type != nullptr && default_type->is_array_klass() && default_type->is_loaded(), "must be true at this point"); > Label cont, slow, copyfunc; > > bool simple_check_flag_set = flags & (LIR_OpArrayCopy::src_null_check | > > Would be nice to have. Thank you @TheRealMDoerr @offamitkumar. I am running the tests hotspot_compiler, hotspot_gc, hotspot_serviceability and hotspot_runtime for tier1, tier2 and tier3 with fastdebug, slowdebug and release builds. I will update the results. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19250#issuecomment-2122531420 From mli at openjdk.org Tue May 21 12:37:07 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 21 May 2024 12:37:07 GMT Subject: RFR: 8332153: RISC-V: enable tests and add comment for vector shift instruct (shared by vectorization and Vector API) [v4] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 07:41:27 GMT, Hamlin Li wrote: >> Hi, >> Can you help review this patch? >> For the vector shift instructs, some corresponding tests are not enabled; this patch enables them.
>> It is also unclear how the vector shift instructs work, especially since both vectorization (SLP in the JDK) and the Vector API share the same instructs in riscv_v.ad, so I also added some comments to clarify this. >> >> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > minor Yes, the tests passed. Thanks @RealFYang @luhenry for your reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19265#issuecomment-2122532350 From mli at openjdk.org Tue May 21 12:37:08 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 21 May 2024 12:37:08 GMT Subject: Integrated: 8332153: RISC-V: enable tests and add comment for vector shift instruct (shared by vectorization and Vector API) In-Reply-To: References: Message-ID: On Thu, 16 May 2024 11:12:09 GMT, Hamlin Li wrote: > Hi, > Can you help review this patch? > For the vector shift instructs, some corresponding tests are not enabled; this patch enables them. > It is also unclear how the vector shift instructs work, especially since both vectorization (SLP in the JDK) and the Vector API share the same instructs in riscv_v.ad, so I also added some comments to clarify this. > > Thanks This pull request has now been integrated.
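As context for the note that auto-vectorization and the Vector API share the same shift rules: a scalar loop like the one below is the kind of code C2's SuperWord pass may turn into vector shift nodes (for example `RShiftVI`), which on RISC-V would be matched by the `riscv_v.ad` rules discussed here. This is a hypothetical sketch (the class and method names are invented); whether a given JVM actually vectorizes it depends on the platform, flags, and loop shape. Java masks an `int` shift count to its low 5 bits, so any vectorized form must preserve that behaviour.

```java
import java.util.Arrays;

// Hypothetical kernel: element-wise arithmetic right shift with a per-element count.
public class ShiftKernel {
    static void sraInts(int[] a, int[] counts, int[] out) {
        for (int i = 0; i < a.length; i++) {
            // Java uses only the low 5 bits of an int shift count (counts[i] & 31).
            out[i] = a[i] >> counts[i];
        }
    }

    public static void main(String[] args) {
        int[] a = {-64, 64, 1 << 20};
        int[] counts = {2, 33, 20};  // 33 & 31 == 1
        int[] out = new int[a.length];
        sraInts(a, counts, out);
        System.out.println(Arrays.toString(out));  // [-16, 32, 1]
    }
}
```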
Changeset: 5cf8288b Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/5cf8288b8071bdcf0c923dd7ba36f91bc7594ef3 Stats: 176 lines in 10 files changed: 173 ins; 0 del; 3 mod 8332153: RISC-V: enable tests and add comment for vector shift instruct (shared by vectorization and Vector API) Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/19265 From pminborg at openjdk.org Tue May 21 12:56:12 2024 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 21 May 2024 12:56:12 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v20] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Fri, 17 May 2024 09:31:33 GMT, Per Minborg wrote: >> # Stable Values & Collections (Internal) >> >> ## Summary >> This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. >> >> ## Goals >> * Provide an easy and intuitive API to describe value holders that can change at most once. >> * Decouple declaration from initialization without significant footprint or performance penalties. >> * Reduce the amount of static initializer and/or field initialization code. >> * Uphold integrity and consistency, even in a multi-threaded environment. >> >> For more details, see the draft JEP: https://openjdk.org/jeps/8312611 >> >> ## Performance >> Performance compared to instance variables using two `AtomicReference` and two protected by double-checked locking under concurrent access by all threads: >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.atomic thrpt 10 259.478 ± 36.809 ops/us >> StableBenchmark.dcl thrpt 10 225.710 ± 26.638 ops/us >> StableBenchmark.stable thrpt 10 4382.478 ± 
1151.472 ops/us <- StableValue significantly faster >> >> >> Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (all threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableStaticBenchmark.atomic thrpt 10 6487.835 ± 385.639 ops/us >> StableStaticBenchmark.dcl thrpt 10 6605.239 ± 210.610 ops/us >> StableStaticBenchmark.stable thrpt 10 14338.239 ± 1426.874 ops/us >> StableStaticBenchmark.staticCHI thrpt 10 13780.341 ± 1839.651 ops/us >> >> >> Performance for stable lists (thread safe) in both instance and static contexts whereby we access a single value compared to `ArrayList` instances (which are not thread-safe) (all threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableListElementBenchmark.instanceArrayList thrpt 10 5812.992 ± 1169.730 ops/us >> StableListElementBenchmark.instanceList thrpt 10 4818.643 ± 704.893 ops/us >> StableListElementBenchmark... > > Per Minborg has updated the pull request incrementally with two additional commits since the last revision: > > - Add benchmarks for memoized IntFunction and Function > - Add benchmark for memoized supplier We are considering another implementation with less complexity. So, for now, thank you for all the feedback so far. We will try to make sure to carry it over to a new PR.
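The `staticCHI` baseline in the static benchmarks above refers to the class-holder idiom. Below is a minimal sketch of that idiom; the class names and the `initCount` counter are hypothetical, added only to make the laziness observable. The JVM runs the nested class's static initializer at most once, lazily and thread-safely, on first access. That is the behaviour a stable value aims to provide without requiring a dedicated holder class.

```java
// Hypothetical sketch of the class-holder idiom (the "staticCHI" baseline).
public class ConfigHolder {
    static int initCount = 0;  // counts how often compute() ran (demo only)

    private static final class Holder {
        // Initialized on first access to Holder.VALUE, exactly once, with
        // thread safety guaranteed by JVM class initialization.
        static final String VALUE = compute();
    }

    private static String compute() {
        initCount++;
        return "expensive";
    }

    static String get() {
        return Holder.VALUE;
    }

    public static void main(String[] args) {
        System.out.println(initCount);  // 0: Holder not yet initialized
        get();
        get();
        System.out.println(initCount);  // 1
    }
}
```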
------------- PR Comment: https://git.openjdk.org/jdk/pull/18794#issuecomment-2122569580 From mli at openjdk.org Tue May 21 13:05:04 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 21 May 2024 13:05:04 GMT Subject: RFR: 8332533: RISC-V: Enable vector variable shift instructions for machines with RVV In-Reply-To: References: Message-ID: On Mon, 20 May 2024 15:23:26 GMT, Gui Cao wrote: > Hi, I noticed the following warning in the Opto JIT Code for the Vector API in the `test/jdk/jdk/incubator/vector/Byte256VectorTests.java: ASHRByte256VectorTests` test: > > -------------------------------------------------------------------------------- > ** Rejected vector op (RShiftVB,byte,32) because architecture does not support variable vector shifts > ** not supported: arity=2 opc=405 vlen=32 etype=byte ismask=0 is_masked_op=0 > The reason is that `Matcher::supports_vector_variable_shifts` returns false. The RISC-V port of the Vector API now supports vector shifts, so this should return `UseRVV`. By the way, `Matcher::supports_vector_variable_shifts` was introduced with the Vector API, and I think updating it was overlooked when vector shifts were implemented. > After the fix, the test passes and generates Opto JIT code such as: > > 1c2 loadV V1, [R7] # vector (rvv) > 1ca lwu R28, [R28, #12] # loadN, compressed ptr, #@loadN !
Field: jdk/internal/vm/vector/VectorSupport$VectorPayload.payload (constant) > 1ce decode_heap_oop R7, R28 #@decodeHeapOop > 1d2 addi R7, R7, #16 # ptr, #@addP_reg_imm > 1d4 loadV V2, [R7] # vector (rvv) > 1dc vand_immI V1, V1, #7 > 1e4 spill [sp, #48] -> R7 # spill size = 32 > 1e6 # castII of R7, #@castII > 1e6 vasrB V3, V2, V1 > 1fa spill [sp, #96] -> R29 # spill size = 32 > 1fc bgeu R7, R29, B101 #@cmpU_branch P=0.000001 C=-1.000000 > > > ### Testing: > qemu 8.1.50 with UseRVV: > - [ ] Run tier1-3 tests (release) > - [x] Run test/jdk/jdk/incubator/vector (fastdebug) Looks good, thanks. ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19313#pullrequestreview-2068575117 From bkilambi at openjdk.org Tue May 21 13:11:22 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 21 May 2024 13:11:22 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v10] In-Reply-To: References: Message-ID: <26UiEE_uEKUU0lg_T91K-b4Or3mtGluJYybbJOpETOU=.a74004d6-590f-49e7-8880-4ab6627926dd@github.com> > Floating-point addition is non-associative; that is, adding floating-point elements in an arbitrary order may produce a different value. Specifically, the Vector API intentionally does not define the order of reduction, which allows platforms to generate more efficient code [1]. So C2 needs a node to represent a non-strictly-ordered add-reduction for floating-point types. > > To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` have a fixed value.
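To make the non-associativity point concrete, here is a small self-contained example (the values are chosen only for illustration) in which a strictly sequential sum and a tree-shaped sum of the same four `float` values differ. A non-strictly-ordered `AddReductionVF` is allowed to combine lanes in an order like the second one. Auto-vectorized loops, which must reproduce Java's left-to-right semantics, therefore need the strictly ordered variant.

```java
// Illustration: float addition order changes the result.
public class FloatReductionOrder {
    static float sequentialSum(float[] a) {
        float s = 0.0f;
        for (float x : a) s += x;  // strict left-to-right order
        return s;
    }

    static float pairwiseSum(float[] a) {
        // Assumes a.length == 4: (a0 + a1) + (a2 + a3), a tree-shaped order.
        return (a[0] + a[1]) + (a[2] + a[3]);
    }

    public static void main(String[] args) {
        // 1.0e8f + 1.0f rounds back to 1.0e8f (float spacing there is 8).
        float[] a = {1.0e8f, 1.0f, -1.0e8f, 1.0f};
        System.out.println(sequentialSum(a));  // 1.0
        System.out.println(pairwiseSum(a));    // 0.0
    }
}
```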
> > With this patch, the Vector API would always generate non-strictly-ordered `AddReductionVF/D` on SVE machines with vector length <= 16B, as it is more beneficial to generate non-strictly-ordered instructions on such machines than strictly ordered ones. > > [AArch64] > On Neon, non-strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2]. > > This patch adds matching rules for non-strictly-ordered `AddReductionVF/D`. > > No effects on other platforms. > > [Performance] > FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point types. With this patch, it improves ~3x on my SVE machine (128-bit). > > ADDLanes > > Benchmark Before After Unit > FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms > > > Final code is as below: > > Before: > ` fadda z17.s, p7/m, z17.s, z16.s > ` > After: > > faddp v17.4s, v21.4s, v21.4s > faddp s18, v17.2s > fadd s18, s18, s19 > > > > > [Test] > Full jtreg passed on AArch64 and x86.
> > [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529 > [2] https://bugs.openjdk.org/browse/JDK-8275275 > [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316 Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Modify JTREG IR rules and some style/format changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18034/files - new: https://git.openjdk.org/jdk/pull/18034/files/3afde82c..b8f6cfb5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18034&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18034&range=08-09 Stats: 36 lines in 4 files changed: 0 ins; 1 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/18034.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18034/head:pull/18034 PR: https://git.openjdk.org/jdk/pull/18034 From bkilambi at openjdk.org Tue May 21 13:11:23 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 21 May 2024 13:11:23 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8] In-Reply-To: References: <8-_t7nWbR9gZ2_QkfFNuf5M0Q4PMkKJKgwS3ZbHcCxI=.32dc4f11-dec5-468d-afc8-3b4dae285dcb@github.com> <2y-Ag6MxVDJfYl6kM0FYjQA-kzSCekUgAMWAZmkECyQ=.2a2a0a8e-fc67-42a4-bd67-b4ae3b60bcea@github.com> Message-ID: On Mon, 13 May 2024 11:01:30 GMT, Emanuel Peter wrote: >> @eme64 Thanks for the clarification. I understand the usage of `counts` in the IR tests. Just that I got a bit confused by some of your earlier statements. We do actually have a test to make sure AddReductionVF/VD and MulReductionVF/VD are not generated on aarch64 NEON machines - `test/hotspot/jtreg/compiler/c2/irTests/TestDisableAutoVectOpcodes.java`. 
I can modify this test to include the UseSVE > 0 case as well and will also add a separate JTREG test for the Vector API tests. Hope that's OK. > > @Bhavana-Kilambi > I know we have the tests in `test/hotspot/jtreg/compiler/c2/irTests/TestDisableAutoVectOpcodes.java`, and some other reduction tests. But these do not do the specific thing I would like to see. > > I would like this: > - Add `no_strict_order` vs `requires_strict_order` or similar to `dump_spec`. > - IR match not just that there is the correct `ReductionNode`, but also that it has the `no_strict_order` or `requires_strict_order` in its dump. You can do that by using a custom regex string, rather than `IRNode.STORE_VECTOR` or similar. > - Then, create different tests, some where we expect ordered, some unordered vectors. Use Vector API and SuperWord examples. > > Does that make sense? Hi @eme64, I have uploaded the revised patch now. Can you please review? Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18034#issuecomment-2122601430 From dfenacci at openjdk.org Tue May 21 13:47:32 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 21 May 2024 13:47:32 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v12] In-Reply-To: References: Message-ID: > # Issue > When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0)` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. > > # Causes > On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. > The same is true for `StoreVector`s.
> When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store of the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`). Here we call > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 > > where we do an extra check for types if we deal with vectors but we don't make sure that neither masks nor offsets interfere. > Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > but we don't make sure that there are no masks or offsets. > A few lines below, we check if there are 2 stores for the same value in a row. We need to check for masks and offsets here too but in this case we can include these cases if the masks and offsets of the vector stores are equivalent. > > # Solution > To avoid folding `Load`- and `StoreVector`s with masks and offsets we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns its own (to avoid ever being the same as a store node). In this way, the checks in `MemNode::can_see_stored_value` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 > > and `StoreNode::Identity` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > will fail if masks or offsets are used. > For 2 stores of the same value we instead check for mask and offset equality. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been add...
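The problem described above can be modelled in plain Java. In the hypothetical sketch below, `store` stands in for a contiguous `StoreVector`, while `gather` and `maskedLoad` stand in for `LoadVectorGather` and `LoadVectorMasked`. Masked-off lanes read as zero, matching the Vector API's documented behaviour for masked `fromArray`. A load from the same address with the same vector type as the preceding store can still produce a different value. Forwarding the stored vector to the load, as `LoadNode::Identity` otherwise would, is therefore not safe once offsets or masks are involved.

```java
import java.util.Arrays;

// Hypothetical scalar model of store-then-load forwarding hazards.
public class GatherAfterStore {
    // Contiguous store: storage[base + i] = v[i] (models StoreVector).
    static void store(long[] storage, int base, long[] v) {
        System.arraycopy(v, 0, storage, base, v.length);
    }

    // Gather load: lane i = storage[base + offsets[i]] (models LoadVectorGather).
    static long[] gather(long[] storage, int base, int[] offsets) {
        long[] lanes = new long[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            lanes[i] = storage[base + offsets[i]];
        }
        return lanes;
    }

    // Masked load: lane i = mask[i] ? storage[base + i] : 0 (models LoadVectorMasked).
    static long[] maskedLoad(long[] storage, int base, boolean[] mask) {
        long[] lanes = new long[mask.length];
        for (int i = 0; i < mask.length; i++) {
            lanes[i] = mask[i] ? storage[base + i] : 0L;
        }
        return lanes;
    }

    public static void main(String[] args) {
        long[] storage = new long[4];
        store(storage, 0, new long[] {10, 20, 30, 40});
        // Same base address and element type as the store, yet neither load
        // returns the stored vector [10, 20, 30, 40].
        System.out.println(Arrays.toString(gather(storage, 0, new int[] {3, 2, 1, 0})));
        System.out.println(Arrays.toString(maskedLoad(storage, 0, new boolean[] {true, false, true, false})));
    }
}
```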
Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8325520: add override keyword ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18347/files - new: https://git.openjdk.org/jdk/pull/18347/files/e676bcb1..df3a49ae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=10-11 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18347/head:pull/18347 PR: https://git.openjdk.org/jdk/pull/18347 From dfenacci at openjdk.org Tue May 21 13:47:32 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 21 May 2024 13:47:32 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v4] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 06:42:43 GMT, Emanuel Peter wrote: > Because we could allow those to use vectors in the future: I would leave the type checks in for now. Better safe than sorry. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2122667352 From dfenacci at openjdk.org Tue May 21 13:47:33 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 21 May 2024 13:47:33 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v11] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 06:48:12 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectornode.hpp line 979: >> >>> 977: idx == MemNode::ValueIn || >>> 978: idx == MemNode::ValueIn + 1; } >>> 979: virtual Node* offsets() const { return in(Offsets); } >> >> Would be nice to add some `override` keywords here ;) > > And also below.
`override` added ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1608363723 From mli at openjdk.org Tue May 21 13:54:01 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 21 May 2024 13:54:01 GMT Subject: RFR: 8332533: RISC-V: Enable vector variable shift instructions for machines with RVV In-Reply-To: References: Message-ID: On Mon, 20 May 2024 15:23:26 GMT, Gui Cao wrote: > Hi, I noticed the following warning in the Opto JIT Code for the Vector API in the `test/jdk/jdk/incubator/vector/Byte256VectorTests.java: ASHRByte256VectorTests` test: > > -------------------------------------------------------------------------------- > ** Rejected vector op (RShiftVB,byte,32) because architecture does not support variable vector shifts > ** not supported: arity=2 opc=405 vlen=32 etype=byte ismask=0 is_masked_op=0 > The reason is that `Matcher::supports_vector_variable_shifts` returns false. The RISC-V port of the Vector API now supports vector shifts, so this should return `UseRVV`. By the way, `Matcher::supports_vector_variable_shifts` was introduced with the Vector API, and I think updating it was overlooked when vector shifts were implemented. > After the fix, the test passes and generates Opto JIT code such as: > > 1c2 loadV V1, [R7] # vector (rvv) > 1ca lwu R28, [R28, #12] # loadN, compressed ptr, #@loadN !
Field: jdk/internal/vm/vector/VectorSupport$VectorPayload.payload (constant) > 1ce decode_heap_oop R7, R28 #@decodeHeapOop > 1d2 addi R7, R7, #16 # ptr, #@addP_reg_imm > 1d4 loadV V2, [R7] # vector (rvv) > 1dc vand_immI V1, V1, #7 > 1e4 spill [sp, #48] -> R7 # spill size = 32 > 1e6 # castII of R7, #@castII > 1e6 vasrB V3, V2, V1 > 1fa spill [sp, #96] -> R29 # spill size = 32 > 1fc bgeu R7, R29, B101 #@cmpU_branch P=0.000001 C=-1.000000 > > > ### Testing: > qemu 8.1.50 with UseRVV: > - [x] Run tier1-3 tests (release) > - [x] Run test/jdk/jdk/incubator/vector (fastdebug) Is `UseRVV` a const? It seems you need to remove `constexpr`. Or you can just return true (and so keep `constexpr`), as `supports_vector_variable_shifts` is not an independent check; it's protected by `Matcher::match_rule_supported_vector` in both the Vector API and auto-vectorization. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19313#issuecomment-2122687800 From epeter at openjdk.org Tue May 21 14:00:05 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 May 2024 14:00:05 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v12] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 13:47:32 GMT, Damon Fenacci wrote: >> # Issue >> When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0)` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. >> >> # Causes >> On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. >> The same is true for `StoreVector`s.
>> When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store of the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`). Here we call >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 >> >> where we do an extra check for types if we deal with vectors but we don't make sure that neither masks nor offsets interfere. >> Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 >> >> but we don't make sure that there are no masks or offsets. >> A few lines below, we check if there are 2 stores for the same value in a row. We need to check for masks and offsets here too but in this case we can include these cases if the masks and offsets of the vector stores are equivalent. >> >> # Solution >> To avoid folding `Load`- and `StoreVector`s with masks and offsets we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns its own (to avoid ever being the same as a store node). In this way, the checks in `MemNode::can_see_stored_value` >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 >> >> and `StoreNode::Identity` >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 >> >> will fail if masks or offsets are used. >> For 2 stores of the same value we instead check for mask and offset equality. >> >> Regression tests for... > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8325520: add override keyword Looks good now!
Thanks for all the updates; I think the fix now looks really concise. > we add a specific store_Opcode method to LoadVectorGatherNode, LoadVectorMaskedNode and LoadVectorGatherMaskedNode that doesn't return a store opcode but instead returns its own (to avoid ever being the same as a store node). This part in the PR description could be updated: now we return `-1` for those that we think are not "comparable". ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18347#pullrequestreview-2068727496 From liach at openjdk.org Tue May 21 14:06:11 2024 From: liach at openjdk.org (Chen Liang) Date: Tue, 21 May 2024 14:06:11 GMT Subject: RFR: 8330465: Stable Values and Collections (Internal) [v20] In-Reply-To: References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com> Message-ID: On Fri, 17 May 2024 09:31:33 GMT, Per Minborg wrote: >> # Stable Values & Collections (Internal) >> >> ## Summary >> This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization. >> >> ## Goals >> * Provide an easy and intuitive API to describe value holders that can change at most once. >> * Decouple declaration from initialization without significant footprint or performance penalties. >> * Reduce the amount of static initializer and/or field initialization code. >> * Uphold integrity and consistency, even in a multi-threaded environment.
>> >> For more details, see the draft JEP: https://openjdk.org/jeps/8312611 >> >> ## Performance >> Performance compared to instance variables using two `AtomicReference` and two protected by double-checked locking under concurrent access by all threads: >> >> >> Benchmark Mode Cnt Score Error Units >> StableBenchmark.atomic thrpt 10 259.478 ± 36.809 ops/us >> StableBenchmark.dcl thrpt 10 225.710 ± 26.638 ops/us >> StableBenchmark.stable thrpt 10 4382.478 ± 1151.472 ops/us <- StableValue significantly faster >> >> >> Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (all threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableStaticBenchmark.atomic thrpt 10 6487.835 ± 385.639 ops/us >> StableStaticBenchmark.dcl thrpt 10 6605.239 ± 210.610 ops/us >> StableStaticBenchmark.stable thrpt 10 14338.239 ± 1426.874 ops/us >> StableStaticBenchmark.staticCHI thrpt 10 13780.341 ± 1839.651 ops/us >> >> >> Performance for stable lists (thread safe) in both instance and static contexts whereby we access a single value compared to `ArrayList` instances (which are not thread-safe) (all threads): >> >> >> Benchmark Mode Cnt Score Error Units >> StableListElementBenchmark.instanceArrayList thrpt 10 5812.992 ± 1169.730 ops/us >> StableListElementBenchmark.instanceList thrpt 10 4818.643 ± 704.893 ops/us >> StableListElementBenchmark... > > Per Minborg has updated the pull request incrementally with two additional commits since the last revision: > > - Add benchmarks for memoized IntFunction and Function > - Add benchmark for memoized supplier Thanks for the insights. Also, I wonder what amount of metadata you are considering: the original stable values use only one possible representation out of all values to indicate a mutable state, which is much lighter in weight than the current implementation, which takes many bits; this might discourage StableValue use.
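The footprint point above can be illustrated with a common lazy-initialization pattern in which `null` is the only "unset" marker, so no extra state field is needed. The class below is a hypothetical sketch using double-checked locking on a `volatile` field, similar in spirit to the `dcl` benchmark baseline. It is not the proposed StableValue implementation. Its trade-off is that it cannot hold a legitimately `null` value, which is one reason an implementation might need extra state bits.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical lazy holder: null is the single "unset" marker.
public final class LazyRef<T> {
    private final Supplier<? extends T> supplier;
    private volatile T value;  // null until initialized

    public LazyRef(Supplier<? extends T> supplier) {
        this.supplier = supplier;
    }

    public T get() {
        T v = value;                 // first, unsynchronized check
        if (v == null) {
            synchronized (this) {
                v = value;           // second check under the lock
                if (v == null) {
                    // The supplier must not return null: null means "unset".
                    v = Objects.requireNonNull(supplier.get());
                    value = v;       // volatile write publishes the value safely
                }
            }
        }
        return v;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        LazyRef<String> ref = new LazyRef<>(() -> { calls[0]++; return "v"; });
        ref.get();
        ref.get();
        System.out.println(calls[0]);  // 1
    }
}
```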
------------- PR Comment: https://git.openjdk.org/jdk/pull/18794#issuecomment-2122714875 From dfenacci at openjdk.org Tue May 21 14:12:05 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 21 May 2024 14:12:05 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v12] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 13:56:59 GMT, Emanuel Peter wrote: > This part in the PR description could be updated: now we return `-1` for those that we think are not "comparable". You're right. Fixed. Thanks a lot for the review @eme64!! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2122728013 From gcao at openjdk.org Tue May 21 14:32:30 2024 From: gcao at openjdk.org (Gui Cao) Date: Tue, 21 May 2024 14:32:30 GMT Subject: RFR: 8332533: RISC-V: Enable vector variable shift instructions for machines with RVV [v2] In-Reply-To: References: Message-ID: > Hi, I noticed the following warning in the Opto JIT Code for the Vector API in the `test/jdk/jdk/incubator/vector/Byte256VectorTests.java: ASHRByte256VectorTests` test: > > -------------------------------------------------------------------------------- > ** Rejected vector op (RShiftVB,byte,32) because architecture does not support variable vector shifts > ** not supported: arity=2 opc=405 vlen=32 etype=byte ismask=0 is_masked_op=0 > The reason is that `Matcher::supports_vector_variable_shifts` returns false. The RISC-V port of the Vector API now supports vector shifts, so this should return `UseRVV`. By the way, `Matcher::supports_vector_variable_shifts` was introduced with the Vector API, and I think updating it was overlooked when vector shifts were implemented. > After the fix, the test passes and generates Opto JIT code such as: > > 1c2 loadV V1, [R7] # vector (rvv) > 1ca lwu R28, [R28, #12] # loadN, compressed ptr, #@loadN !
Field: jdk/internal/vm/vector/VectorSupport$VectorPayload.payload (constant) > 1ce decode_heap_oop R7, R28 #@decodeHeapOop > 1d2 addi R7, R7, #16 # ptr, #@addP_reg_imm > 1d4 loadV V2, [R7] # vector (rvv) > 1dc vand_immI V1, V1, #7 > 1e4 spill [sp, #48] -> R7 # spill size = 32 > 1e6 # castII of R7, #@castII > 1e6 vasrB V3, V2, V1 > 1fa spill [sp, #96] -> R29 # spill size = 32 > 1fc bgeu R7, R29, B101 #@cmpU_branch P=0.000001 C=-1.000000 > > > ### Testing: > qemu 8.1.50 with UseRVV: > - [x] Run tier1-3 tests (release) > - [x] Run test/jdk/jdk/incubator/vector (fastdebug) Gui Cao has updated the pull request incrementally with one additional commit since the last revision: Remove constexpr in Matcher::supports_vector_variable_shifts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19313/files - new: https://git.openjdk.org/jdk/pull/19313/files/546c255a..0c64eee1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19313&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19313&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19313.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19313/head:pull/19313 PR: https://git.openjdk.org/jdk/pull/19313 From kxu at openjdk.org Tue May 21 14:32:35 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 21 May 2024 14:32:35 GMT Subject: RFR: 8327380: Add tests for Shenandoah barrier expansion optimization Message-ID: <6jnsp3eSXnS5H2F915sWIiSQvz5-CN7hNuciqoc1Lp4=.445c9bdc-9c5d-427a-89ba-5bc0a57d2425@github.com> The Ideal graph for Shenandoah barrier expansion is optimized so that unnecessary checks are eliminated; however, there are currently no test cases to verify that these optimizations are in effect. Adding unit tests with the IR test framework will support related code changes in the future.
------------- Commit messages: - update asserted IR phase - change assertion phase from MACRO_EXPANSION to AFTER_MACRO_EXPANSION_STEP - add license header - update test annotation - Merge branch 'master' into test-shenandoah-barrier-expansion - update vm flags - update vm flags - add TestShenandoahBarrierExpansion Changes: https://git.openjdk.org/jdk/pull/18814/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18814&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8327380 Stats: 89 lines in 1 file changed: 89 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18814.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18814/head:pull/18814 PR: https://git.openjdk.org/jdk/pull/18814 From gcao at openjdk.org Tue May 21 14:32:30 2024 From: gcao at openjdk.org (Gui Cao) Date: Tue, 21 May 2024 14:32:30 GMT Subject: RFR: 8332533: RISC-V: Enable vector variable shift instructions for machines with RVV In-Reply-To: References: Message-ID: On Tue, 21 May 2024 13:50:55 GMT, Hamlin Li wrote: > Is `UseRVV` a constant? It seems you need to remove `constexpr`. Or you can just return true (and keep `constexpr`), as `supports_vector_variable_shifts` is not an independent check: it's guarded by `Matcher::match_rule_supported_vector` in both the Vector API and auto-vectorization. Thanks for your review. I've removed `constexpr`.
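For context on what the newly enabled instruction computes: in the JIT listing from the PR description, the per-lane shift count is masked with `vand_immI V1, V1, #7` before `vasrB`. Below is a scalar Java reference model of a byte-lane variable arithmetic right shift, assuming the Vector API semantics for `ASHR` (`a>>(n&(ESIZE*8-1))`, which is `n & 7` for bytes). `ByteAshrSketch` and `ashrBytes` are hypothetical names; this is neither the RVV implementation nor the jtreg test.

```java
import java.util.Arrays;

public class ByteAshrSketch {

    // Scalar reference for a lane-wise variable arithmetic shift right on bytes.
    // Each shift count is masked to 0..7 (byte width minus one), mirroring the
    // "vand_immI ..., #7" that precedes "vasrB" in the JIT listing.
    static byte[] ashrBytes(byte[] a, byte[] n) {
        if (a.length != n.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        byte[] r = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            int count = n[i] & 7;          // mask the shift count
            r[i] = (byte) (a[i] >> count); // sign-propagating shift
        }
        return r;
    }

    public static void main(String[] args) {
        byte[] a = {-128, -1, 64, 127};
        byte[] n = {1, 7, 9, 8};
        System.out.println(Arrays.toString(ashrBytes(a, n))); // [-64, -1, 32, 127]
    }
}
```

The masking is why a count of 9 behaves like 1 and a count of 8 behaves like 0.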
------------- PR Comment: https://git.openjdk.org/jdk/pull/19313#issuecomment-2122771148 From mli at openjdk.org Tue May 21 15:19:01 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 21 May 2024 15:19:01 GMT Subject: RFR: 8332533: RISC-V: Enable vector variable shift instructions for machines with RVV [v2] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 14:32:30 GMT, Gui Cao wrote: >> Hi, I noticed the following warning in the Opto JIT Code for the Vector API in the `test/jdk/jdk/incubator/vector/Byte256VectorTests.java: ASHRByte256VectorTests` test: >> >> -------------------------------------------------------------------------------- >> ** Rejected vector op (RShiftVB,byte,32) because architecture does not support variable vector shifts >> ** not supported: arity=2 opc=405 vlen=32 etype=byte ismask=0 is_masked_op=0 >> The reason is that `Matcher::supports_vector_variable_shifts` returns false. The RISC-V port of the Vector API now supports vector shifts, so this function should return `UseRVV`. By the way, `Matcher::supports_vector_variable_shifts` was introduced by the Vector API, and I think we forgot to update it when implementing vector shifts. >> After the fix, the test passes and generates Opto JIT code such as: >> >> 1c2 loadV V1, [R7] # vector (rvv) >> 1ca lwu R28, [R28, #12] # loadN, compressed ptr, #@loadN !
Field: jdk/internal/vm/vector/VectorSupport$VectorPayload.payload (constant) >> 1ce decode_heap_oop R7, R28 #@decodeHeapOop >> 1d2 addi R7, R7, #16 # ptr, #@addP_reg_imm >> 1d4 loadV V2, [R7] # vector (rvv) >> 1dc vand_immI V1, V1, #7 >> 1e4 spill [sp, #48] -> R7 # spill size = 32 >> 1e6 # castII of R7, #@castII >> 1e6 vasrB V3, V2, V1 >> 1fa spill [sp, #96] -> R29 # spill size = 32 >> 1fc bgeu R7, R29, B101 #@cmpU_branch P=0.000001 C=-1.000000 >> >> >> ### Testing: >> qemu 8.1.50 with UseRVV: >> - [x] Run tier1-3 tests (release) >> - [x] Run test/jdk/jdk/incubator/vector (fastdebug) > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Remove constexpr in Matcher::supports_vector_variable_shifts Thanks, still good. ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19313#pullrequestreview-2068931037 From kvn at openjdk.org Tue May 21 15:55:09 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 21 May 2024 15:55:09 GMT Subject: RFR: 8332538: Switch off JIT memory limit check for TestAlignVectorFuzzer.java In-Reply-To: References: Message-ID: On Mon, 20 May 2024 18:36:08 GMT, Vladimir Kozlov wrote: > Add flag `-XX:CompileCommand=MemLimit,*.*,0` to TestAlignVectorFuzzer.java test until [JDK-8332537](https://bugs.openjdk.org/browse/JDK-8332537) is fixed. > > Tested: tier1 Thank you, Thomas and Tobias. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19316#issuecomment-2122938564 From kvn at openjdk.org Tue May 21 15:55:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 21 May 2024 15:55:10 GMT Subject: Integrated: 8332538: Switch off JIT memory limit check for TestAlignVectorFuzzer.java In-Reply-To: References: Message-ID: On Mon, 20 May 2024 18:36:08 GMT, Vladimir Kozlov wrote: > Add flag `-XX:CompileCommand=MemLimit,*.*,0` to TestAlignVectorFuzzer.java test until [JDK-8332537](https://bugs.openjdk.org/browse/JDK-8332537) is fixed. 
> > Tested: tier1 This pull request has now been integrated. Changeset: 52eda795 Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/52eda79522a5bd71b527e5946b654a331b021473 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod 8332538: Switch off JIT memory limit check for TestAlignVectorFuzzer.java Reviewed-by: stuefe, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/19316 From kvn at openjdk.org Tue May 21 16:42:01 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 21 May 2024 16:42:01 GMT Subject: RFR: 8320622: [TEST] Improve coverage of compiler/loopopts/superword/TestMulAddS2I.java on different platforms In-Reply-To: References: Message-ID: On Mon, 20 May 2024 08:56:35 GMT, Fei Gao wrote: > It would be worthwhile to improve the test coverage on all platforms by applying another common VM flag. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19305#pullrequestreview-2069128270 From duke at openjdk.org Tue May 21 16:56:22 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Tue, 21 May 2024 16:56:22 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v24] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: update APX warning text ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/49b117ef..f054589e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=22-23 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From duke at openjdk.org Tue May 21 16:56:23 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Tue, 21 May 2024 16:56:23 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v23] In-Reply-To: <9CD9vJMHAwob0CCW0697u1CAsCa3WZiJ2FEKDW0tc10=.fb45c38c-dd0f-43b1-a0a4-68963cca8391@github.com> References: <9CD9vJMHAwob0CCW0697u1CAsCa3WZiJ2FEKDW0tc10=.fb45c38c-dd0f-43b1-a0a4-68963cca8391@github.com> Message-ID: On Tue, 21 May 2024 06:03:16 GMT, Emanuel Peter wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> added comment about UseAPX and UseAVX > 2 correspondence > > src/hotspot/cpu/x86/assembler_x86.cpp line 1668: > >> 1666: void Assembler::andnl(Register dst, Register src1, Register src2) { >> 1667: assert(VM_Version::supports_bmi1(), "bit manipulation instructions not supported"); >> 1668: assert(!needs_eevex(dst, src1, src2) || UseAPX, "extended gpr use requires UseAPX and UseAVX > 2"); > > Technical detail: `UseAPX and UseAVX > 2` sounds wrong. Did you mean to say "or"? Because UseAPX is only enabled when `UseAVX >= 3`. Thanks, see comment below. 
> src/hotspot/cpu/x86/assembler_x86.cpp line 2036: > >> 2034: InstructionMark im(this); >> 2035: if (needs_eevex(crc, adr.base(), adr.index())) { >> 2036: assert(UseAPX, "extended gpr use requires UseAPX and UseAVX > 2"); > > Maybe here the "and" makes sense, but not sure. Yes, "and" is technically correct. APX includes instructions that require AVX 3 (evex) encoding for extended gpr use, together with +UseAPX. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1608653686 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1608653580 From duke at openjdk.org Tue May 21 17:00:11 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Tue, 21 May 2024 17:00:11 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v24] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 16:56:22 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > update APX warning text We will not start APX encoding instructions with this PR. APX encoding only comes into play when extended GPRs are used and the register allocator hasn't been extended to allocate extended GPRs yet. 
That will come in follow-up patches. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2123056212 From amitkumar at openjdk.org Tue May 21 17:05:04 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 21 May 2024 17:05:04 GMT Subject: RFR: 8331935: Add support for primitive array C1 clone intrinsic in PPC In-Reply-To: <7utHJvWwEy9QoIxeXfJzGpdor7JUe-W0qfIktdeTfGQ=.34c8fa30-c45f-4c75-b4f0-bd6a34ea431a@github.com> References: <7utHJvWwEy9QoIxeXfJzGpdor7JUe-W0qfIktdeTfGQ=.34c8fa30-c45f-4c75-b4f0-bd6a34ea431a@github.com> Message-ID: On Tue, 21 May 2024 12:32:21 GMT, Varada M wrote: > with fastdebug, slowdebug and release. I think testing with fastdebug is sufficient. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19250#issuecomment-2123063406 From duke at openjdk.org Tue May 21 17:12:08 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Tue, 21 May 2024 17:12:08 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v23] In-Reply-To: <6OhcNl6TmRZcvkUooVTggTv9X-6-nZI1UlF13mfD6q8=.a7a87c80-9479-491c-a71b-7157dc1a1cf6@github.com> References: <9CD9vJMHAwob0CCW0697u1CAsCa3WZiJ2FEKDW0tc10=.fb45c38c-dd0f-43b1-a0a4-68963cca8391@github.com> <6OhcNl6TmRZcvkUooVTggTv9X-6-nZI1UlF13mfD6q8=.a7a87c80-9479-491c-a71b-7157dc1a1cf6@github.com> Message-ID: On Tue, 21 May 2024 06:08:15 GMT, Emanuel Peter wrote: >> src/hotspot/cpu/x86/vm_version_x86.cpp line 1008: >> >>> 1006: if (UseAPX && (UseAVX < 3)) { >>> 1007: if (!FLAG_IS_DEFAULT(UseAPX)) { >>> 1008: warning("UseAPX is only available when UseAVX > 2"); >> >> Suggestion: >> >> warning("UseAPX is only available when UseAVX > 2. Disabling UseAPX."); > > This would tell the user what we are doing, just like with the UseAVX flag. Thank you, done.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1608671699 From epeter at openjdk.org Tue May 21 17:20:08 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 May 2024 17:20:08 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v23] In-Reply-To: References: <9CD9vJMHAwob0CCW0697u1CAsCa3WZiJ2FEKDW0tc10=.fb45c38c-dd0f-43b1-a0a4-68963cca8391@github.com> Message-ID: <483uPvWB_jHFudnEhzbVbpdvT_CohG534vCNUqEvYjs=.66f3ff26-0e25-4b11-a83d-bb94a3aaac30@github.com> On Tue, 21 May 2024 16:53:38 GMT, Steve Dohrmann wrote: >> src/hotspot/cpu/x86/assembler_x86.cpp line 1668: >> >>> 1666: void Assembler::andnl(Register dst, Register src1, Register src2) { >>> 1667: assert(VM_Version::supports_bmi1(), "bit manipulation instructions not supported"); >>> 1668: assert(!needs_eevex(dst, src1, src2) || UseAPX, "extended gpr use requires UseAPX and UseAVX > 2"); >> >> Technical detail: `UseAPX and UseAVX > 2` sounds wrong. Did you mean to say "or"? Because UseAPX is only enabled when `UseAVX >= 3`. > > Thanks, see comment below. I think the confusion comes from the "or" in the code, and the "and" in the assert description. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1608681525 From duke at openjdk.org Tue May 21 17:37:33 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Tue, 21 May 2024 17:37:33 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v25] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. 
> > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: updated assert message ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/f054589e..d2ac410e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=23-24 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From duke at openjdk.org Tue May 21 17:37:33 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Tue, 21 May 2024 17:37:33 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v23] In-Reply-To: <483uPvWB_jHFudnEhzbVbpdvT_CohG534vCNUqEvYjs=.66f3ff26-0e25-4b11-a83d-bb94a3aaac30@github.com> References: <9CD9vJMHAwob0CCW0697u1CAsCa3WZiJ2FEKDW0tc10=.fb45c38c-dd0f-43b1-a0a4-68963cca8391@github.com> <483uPvWB_jHFudnEhzbVbpdvT_CohG534vCNUqEvYjs=.66f3ff26-0e25-4b11-a83d-bb94a3aaac30@github.com> Message-ID: On Tue, 21 May 2024 17:17:51 GMT, Emanuel Peter wrote: >> Thanks, see comment below. > > I think the confusion comes from the "or" in the code, and the "and" in the assert description. I see. I removed the "and UseAVX > 2" here. Thank you. 
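The assert under discussion has the form `!needs_eevex(dst, src1, src2) || UseAPX`. An instruction needs one of the new prefixes (REX2 or extended EVEX) only when some operand is one of the 16 extended GPRs (register numbers 16 to 31), and emitting such an encoding is only legal with UseAPX. Bit 4 of the register number is the part legacy REX cannot express. Below is a hypothetical Java model of that rule. Register numbers are plain ints, and `needsExtendedGpr` and `checkEncodable` are illustrative names, not HotSpot functions.

```java
public class ApxOperandCheckSketch {

    // Register numbers 0..15 are the legacy GPRs; 16..31 are the APX extended GPRs (Egprs).
    static boolean isExtendedGpr(int reg) {
        if (reg < 0 || reg > 31) {
            throw new IllegalArgumentException("not a GPR number: " + reg);
        }
        return (reg & 0x10) != 0; // bit 4 set => Egpr
    }

    // True if any operand needs an APX prefix (REX2 or extended EVEX).
    static boolean needsExtendedGpr(int... regs) {
        for (int r : regs) {
            if (isExtendedGpr(r)) {
                return true;
            }
        }
        return false;
    }

    // Same shape as the HotSpot assert: !needs_eevex(...) || UseAPX.
    static void checkEncodable(boolean useApx, int... regs) {
        if (needsExtendedGpr(regs) && !useApx) {
            throw new IllegalStateException("extended gpr use requires UseAPX");
        }
    }

    public static void main(String[] args) {
        checkEncodable(false, 0, 3, 15);  // legacy registers only: fine without APX
        checkEncodable(true, 0, 17);      // Egpr with UseAPX: fine
        try {
            checkEncodable(false, 1, 20); // Egpr without UseAPX: rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Instructions that use only registers 0 to 15 never hit the check, which matches the statement that encoding is unchanged for the lower 16 GPRs.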
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1608700375 From kvn at openjdk.org Tue May 21 17:44:07 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 21 May 2024 17:44:07 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v24] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 16:56:22 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > update APX warning text Few comments. 
------------- PR Review: https://git.openjdk.org/jdk/pull/18476#pullrequestreview-2069187667 From kvn at openjdk.org Tue May 21 17:44:08 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 21 May 2024 17:44:08 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v23] In-Reply-To: References: <9CD9vJMHAwob0CCW0697u1CAsCa3WZiJ2FEKDW0tc10=.fb45c38c-dd0f-43b1-a0a4-68963cca8391@github.com> <483uPvWB_jHFudnEhzbVbpdvT_CohG534vCNUqEvYjs=.66f3ff26-0e25-4b11-a83d-bb94a3aaac30@github.com> Message-ID: <8m8YIhD-z4I3P3xy4exiHqWc4fYELdno93SvnrSX8Tc=.073fd1eb-c33b-4ed1-936d-7cb46b76decc@github.com> On Tue, 21 May 2024 17:34:35 GMT, Steve Dohrmann wrote: >> I think the confusion comes from the "or" in the code, and the "and" in the assert description. > > I see. I removed the "and UseAVX > 2" here. Thank you. This assert is repeated a lot. Instead of it I think `UseAPX` assert check should be done in `needs_rex2()` and `needs_eevex()` when they return `true`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1608707479 From kvn at openjdk.org Tue May 21 17:44:09 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 21 May 2024 17:44:09 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v23] In-Reply-To: References: <9CD9vJMHAwob0CCW0697u1CAsCa3WZiJ2FEKDW0tc10=.fb45c38c-dd0f-43b1-a0a4-68963cca8391@github.com> Message-ID: On Tue, 21 May 2024 16:53:33 GMT, Steve Dohrmann wrote: >> src/hotspot/cpu/x86/assembler_x86.cpp line 2036: >> >>> 2034: InstructionMark im(this); >>> 2035: if (needs_eevex(crc, adr.base(), adr.index())) { >>> 2036: assert(UseAPX, "extended gpr use requires UseAPX and UseAVX > 2"); >> >> Maybe here the "and" makes sense, but not sure. > > Yes, "and" is technically correct. APX includes instructions that require AVX 3 (evex) encoding for extended gpr use, together with +UseAPX. 
You don't need to say "and UseAVX >2" because you switch off `UseAPX` in `vm_version_x86.cpp` when `UseAVX < 3`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1608689933 From kvn at openjdk.org Tue May 21 17:44:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 21 May 2024 17:44:10 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v22] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 17:48:25 GMT, Steve Dohrmann wrote: >> src/hotspot/cpu/x86/vm_version_x86.cpp line 1005: >> >>> 1003: } >>> 1004: >>> 1005: if (UseAPX && (UseAVX < 3)) { >> >> A comment here will be helpful stating the need to disable APX functionality for non AVX512 targets, please note UseAVX is set to level 3 based on existence of CPUID (EAX=07, EBX[16] = AVX512F) bit, and future AVX10 targets may support APX. > > Thanks. I added a comment. Is it enough to have AVX512F present for APX? What about Knights CPUs, which have limited AVX512 features? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1608674283 From kvn at openjdk.org Tue May 21 17:44:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 21 May 2024 17:44:10 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v22] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 17:11:16 GMT, Vladimir Kozlov wrote: >> Thanks. I added a comment. > > Is it enough to have AVX512F present for APX? What about Knights CPUs, which have limited AVX512 features? You should add code that checks the CPUID feature bit to set `UseAPX`. Or set it to `false` unconditionally in this PR, regardless of the UseAVX value, with the comment "APX is not supported on this CPU". Otherwise someone will switch it on from the command line on an AVX512 machine. Or we should push [#18562](https://github.com/openjdk/jdk/pull/18562) first, which I prefer.
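The ergonomics being discussed can be modeled in a few lines. The sketch below is written in Java rather than HotSpot C++. It assumes that UseAPX is honored only when the CPU reports APX (a hypothetical `cpuSupportsApx` input standing in for the CPUID feature bit) and `UseAVX >= 3`; otherwise the flag is disabled, with a warning only when the user set it explicitly. The message wording is borrowed from this thread, and `ApxErgonomicsSketch` and `effectiveUseApx` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

public class ApxErgonomicsSketch {

    // Returns the effective UseAPX value; warnings are collected instead of printed.
    // setOnCommandLine plays the role of !FLAG_IS_DEFAULT(UseAPX).
    static boolean effectiveUseApx(boolean useApx, boolean setOnCommandLine,
                                   boolean cpuSupportsApx, int useAvx,
                                   List<String> warnings) {
        if (!useApx) {
            return false;
        }
        if (!cpuSupportsApx) {
            if (setOnCommandLine) {
                warnings.add("APX is not supported on this CPU");
            }
            return false;
        }
        if (useAvx < 3) {
            if (setOnCommandLine) {
                warnings.add("UseAPX is only available when UseAVX > 2. Disabling UseAPX.");
            }
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> w = new ArrayList<>();
        boolean on = effectiveUseApx(true, true, true, 2, w);
        System.out.println(on + " " + w);
    }
}
```

Checking the CPU feature first means an explicit `-XX:+UseAPX` on an AVX512 machine without APX is rejected, which is the case raised above.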
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1608681738 From vlivanov at openjdk.org Tue May 21 18:05:04 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 21 May 2024 18:05:04 GMT Subject: RFR: 8332547: Unloaded signature classes in DirectMethodHandles In-Reply-To: References: Message-ID: On Mon, 20 May 2024 21:29:20 GMT, Vladimir Ivanov wrote: > JVM routinely installs loader constraints for unloaded signature classes when method resolution takes place. MethodHandle resolution took a different route and eagerly resolves signature classes instead (see `java.lang.invoke.MemberName$Factory::resolve` and `sun.invoke.util.VerifyAccess::isTypeVisible` for details). > > There's a micro-optimization which bypasses eager resolution for `java.*` classes. The downside is that `java.*` signature classes can show up as unloaded. It manifests as inlining failures during JIT-compilation and may cause severe performance issues. > > Proposed fix removes the aforementioned special case logic during `MethodHandle` resolution. > > In some cases it may slow down `MethodHandle` construction a bit (e.g., when repeatedly constructing `DirectMethodHandle`s with lots of arguments), but `MethodHandle` construction step is not performance critical. > > Testing: hs-tier1 - hs-tier4 Class loading triggered by `Class.forName()` call is at the core of `isTypeVisible`. (The rest is fast path checks.) It's what makes `isTypeVisible` query idempotent. I can definitely name it differently (e.g, `ensureTypeVisible`), but making a separate class loading pass across signature classes doesn't make much sense. 
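The point that class loading via `Class.forName()` sits at the core of the visibility check can be illustrated with a simplified, hypothetical model. This is not the JDK's `sun.invoke.util.VerifyAccess::isTypeVisible`, which also has fast-path checks. Here a type counts as visible from a loader if resolving its binary name through that loader yields the very same `Class` object. Passing `initialize = false` loads the class without running its static initializer, and a `null` loader means the bootstrap loader.

```java
public class TypeVisibilitySketch {

    // Hypothetical simplification of a type-visibility check.
    static boolean isTypeVisible(Class<?> type, ClassLoader loader) {
        if (type.isPrimitive()) {
            return true; // primitive types are visible from every loader
        }
        try {
            // initialize = false: load the class without running its static initializer
            Class<?> resolved = Class.forName(type.getName(), false, loader);
            return resolved == type;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader app = ClassLoader.getSystemClassLoader();
        System.out.println(isTypeVisible(String.class, app));
        System.out.println(isTypeVisible(int.class, null));
        // The bootstrap loader (null) cannot see an application class.
        System.out.println(isTypeVisible(TypeVisibilitySketch.class, null));
    }
}
```

Because the answer comes from actually resolving the name, repeating the query gives the same result, and a successful call leaves the class loaded as a side effect.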
------------- PR Comment: https://git.openjdk.org/jdk/pull/19319#issuecomment-2123160245 From sviswanathan at openjdk.org Tue May 21 18:41:11 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 21 May 2024 18:41:11 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v20] In-Reply-To: References: Message-ID: <2y8TuEb98PH5hxKQAxPdnPfuqqkDmGDmHxS6byTZoas=.7c1f9bc9-75c6-4057-8b74-35cb1a086509@github.com> On Fri, 17 May 2024 23:47:45 GMT, Scott Gibbons wrote: >> Rewrite the IndexOf code without the pcmpestri instruction, using only AVX2 instructions. This change accelerates String.indexOf by 1.3x on average for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest Ratio (Latest/Score) >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803x > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Addressing lots of comments. Interim commit.
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4648: > 4646: vpxor(vec1, vec2); > 4647: > 4648: vptest(vec1, vec1); These should be only 128 bit: pxor(vec1, vec2); ptest(vec1, vec1); src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1351: > 1349: assert_different_registers(needle, needleVal); > 1350: > 1351: bool isLL = (ae == StrIntrinsicNode::LL); isLL not used in this function. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1608732591 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1605624430 From duke at openjdk.org Tue May 21 18:50:20 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Tue, 21 May 2024 18:50:20 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v26] In-Reply-To: References: Message-ID: <1gw5od-gWYF28y7_wLEgXpCv1ll37O1-GhQ07fAu6Fo=.856ca2e3-dc5b-4735-849a-a74b9a13d8ef@github.com> > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: disable UseAPX for now, move asserts to encoding check functions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/d2ac410e..1d6ecba9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=24-25 Stats: 57 lines in 2 files changed: 4 ins; 46 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From duke at openjdk.org Tue May 21 18:54:09 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Tue, 21 May 2024 18:54:09 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v24] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 17:41:14 GMT, Vladimir Kozlov wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> update APX warning text > > Few comments. Thanks @vnkozlov for the comments. - UseAPX is disabled for now, using the comment you suggested. - The asserts are now added to ::needs_rex2 and ::needs_eevex, and removed from the relevant instruction functions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2123246189 From jvernee at openjdk.org Tue May 21 20:17:02 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 21 May 2024 20:17:02 GMT Subject: RFR: 8332547: Unloaded signature classes in DirectMethodHandles In-Reply-To: References: Message-ID: On Tue, 21 May 2024 18:02:45 GMT, Vladimir Ivanov wrote: > I can definitely name it differently (e.g, ensureTypeVisible), but making a separate class loading pass across signature classes doesn't make much sense. 
Ok, in that case I suggest also renaming `MemberName::checkForTypeAlias`, maybe to `ensureTypeVisible` as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19319#issuecomment-2123368855 From dlong at openjdk.org Tue May 21 21:36:03 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 21 May 2024 21:36:03 GMT Subject: RFR: 8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v2] In-Reply-To: <6r39P_htGVom5FkgRtjngxwur1_uVG13JS33mBeghPk=.8ce1ed82-cda9-4dae-beb6-18c60024ec18@github.com> References: <6r39P_htGVom5FkgRtjngxwur1_uVG13JS33mBeghPk=.8ce1ed82-cda9-4dae-beb6-18c60024ec18@github.com> Message-ID: On Tue, 21 May 2024 09:21:37 GMT, Damon Fenacci wrote: >> # Issue >> >> The test `compiler/startup/StartupOutput.java` fails intermittently due to a crash after correctly printing the error `Initial size of CodeCache is too small` (the test limits the code cache using `-XX:InitialCodeCacheSize=1024K -XX:ReservedCodeCacheSize=1200k`). >> Whether the issue appears depends heavily on thread scheduling. The original report occurred during C1 initialization, but C2 initialization is affected as well. >> >> # Causes >> >> There is one occurrence during C1 initialization and one during C2 initialization where a call to `RuntimeStub::new_runtime_stub` can fail fatally if there is not enough space left. >> For C1: `Compiler::init_c1_runtime` -> `Runtime1::initialize` -> `Runtime1::generate_blob_for` -> `Runtime1::generate_blob` -> `RuntimeStub::new_runtime_stub`. >> For C2: `C2Compiler::initialize` -> `OptoRuntime::generate` -> `OptoRuntime::generate_stub` -> `Compile::Compile` -> `Compile::Code_Gen` -> `PhaseOutput::install` -> `PhaseOutput::install_stub` -> `RuntimeStub::new_runtime_stub`. >> >> # Solution >> >> https://github.com/openjdk/jdk/pull/15970 introduced an optional argument to `RuntimeStub::new_runtime_stub` to determine if it fails fatally or not.
We can take advantage of it to avoid crashing and instead pass the information about the success or failure of the allocation up the (C1 and C2 initialization) call stack up to where we can set the compilations as failed. > > Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/share/c1/c1_Runtime1.cpp > > Co-authored-by: Tobias Hartmann > - Update src/hotspot/share/c1/c1_Compiler.cpp > > Co-authored-by: Tobias Hartmann src/hotspot/share/c1/c1_Runtime1.cpp line 287: > 285: #endif > 286: BarrierSetC1* bs = BarrierSet::barrier_set()->barrier_set_c1(); > 287: bs->generate_c1_runtime_stubs(blob); Don't we need to handle failures in generate_c1_runtime_stubs? With the assert removed, I think we'll get a nullptr crash. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19280#discussion_r1608976851 From kvn at openjdk.org Tue May 21 21:49:08 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 21 May 2024 21:49:08 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v26] In-Reply-To: <1gw5od-gWYF28y7_wLEgXpCv1ll37O1-GhQ07fAu6Fo=.856ca2e3-dc5b-4735-849a-a74b9a13d8ef@github.com> References: <1gw5od-gWYF28y7_wLEgXpCv1ll37O1-GhQ07fAu6Fo=.856ca2e3-dc5b-4735-849a-a74b9a13d8ef@github.com> Message-ID: On Tue, 21 May 2024 18:50:20 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. 
For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > disable UseAPX for now, move asserts to encoding check functions src/hotspot/cpu/x86/assembler_x86.cpp line 6397: > 6395: > 6396: void Assembler::stmxcsr(Address dst) { > 6397: if (UseAVX > 0 && !UseAPX ) { New ` && !UseAPX` check is strange here. If `UseAPX` is `true` we will execute `} else {` part of code which was executed only for SSE (UseAVX == 0) before. Is this intentional? This needs comment explaining why we do that if it is intentional. I see in other place you have `adr.base_needs_rex2() || adr.index_needs_rex2()` check. Do we need it here too? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1608975763 From duke at openjdk.org Tue May 21 23:22:07 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Tue, 21 May 2024 23:22:07 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v26] In-Reply-To: References: <1gw5od-gWYF28y7_wLEgXpCv1ll37O1-GhQ07fAu6Fo=.856ca2e3-dc5b-4735-849a-a74b9a13d8ef@github.com> Message-ID: <_Ld0gq_PA3NTwxrGqrav5tUDc5FfwsmNgf59_J_Ofak=.2feda2eb-c6f8-4622-92f0-44e1bae63481@github.com> On Tue, 21 May 2024 21:32:11 GMT, Vladimir Kozlov wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> disable UseAPX for now, move asserts to encoding check functions > > src/hotspot/cpu/x86/assembler_x86.cpp line 6397: > >> 6395: >> 6396: void Assembler::stmxcsr(Address dst) { >> 6397: if (UseAVX > 0 && !UseAPX ) { > > New ` && !UseAPX` check is strange here. 
If `UseAPX` is `true` we will execute `} else {` part of code which was executed only for SSE (UseAVX == 0) before. Is this intentional? This needs comment explaining why we do that if it is intentional. > > I see in other place you have `adr.base_needs_rex2() || adr.index_needs_rex2()` check. Do we need it here too? The !UseAPX test was added because if UseAPX is enabled we want to support extended register use via rex2 encoding in the else clause. The existing vex encoding remains when UseAPX is not enabled. There is a needs_rex2 check of address registers in the call to ::vex_prefix, asserting if UseAPX not enabled. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1609048238 From duke at openjdk.org Tue May 21 23:57:34 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Tue, 21 May 2024 23:57:34 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v27] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
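As a hedged illustration of the REX2 form discussed in this thread (not the PR's code), the sketch below assumes the layout given in Intel's APX specification: a `0xD5` byte followed by one payload byte holding M0, R4, X4, B4, W, R3, X3, B3 from the high bit to the low bit. The helper names `needs_rex2_enc` and `rex2_prefix` are hypothetical. Register encodings 16-31 are the Egprs; bit 4 of each register number goes into R4/X4/B4, and M0 selects legacy opcode map 1 in place of a separate `0x0F` escape byte.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: with APX there are 32 GPRs; encodings 16..31 are the
// extended GPRs (Egprs), which a legacy REX prefix cannot express.
constexpr bool needs_rex2_enc(int reg_enc) { return reg_enc >= 16; }

// Hypothetical helper: build the two-byte REX2 prefix (0xD5 + payload).
// Payload bits: 7=M0, 6=R4, 5=X4, 4=B4, 3=W, 2=R3, 1=X3, 0=B3.
// The low three bits of each register still go in ModRM/SIB as usual.
std::vector<uint8_t> rex2_prefix(bool w, int reg, int index, int base, bool map1) {
  assert(reg >= 0 && reg < 32 && index >= 0 && index < 32 && base >= 0 && base < 32);
  uint8_t payload = 0;
  if (map1)         payload |= 0x80;  // M0: opcode is in map 1, no 0x0F escape byte
  if (reg & 0x10)   payload |= 0x40;  // R4
  if (index & 0x10) payload |= 0x20;  // X4
  if (base & 0x10)  payload |= 0x10;  // B4
  if (w)            payload |= 0x08;  // W: 64-bit operand size
  if (reg & 0x08)   payload |= 0x04;  // R3
  if (index & 0x08) payload |= 0x02;  // X3
  if (base & 0x08)  payload |= 0x01;  // B3
  return {0xD5, payload};
}
```

For example, a map-1 instruction whose base register is r16 would begin with `D5 90` and carry no `0x0F` byte, while the same instruction using only r0-r15 keeps its existing encoding, as the PR description states.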
Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: add comment to ::stmxcsr and ::ldmxcsr ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/1d6ecba9..5b6fdce0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=25-26 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From duke at openjdk.org Wed May 22 00:30:09 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Wed, 22 May 2024 00:30:09 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v26] In-Reply-To: <_Ld0gq_PA3NTwxrGqrav5tUDc5FfwsmNgf59_J_Ofak=.2feda2eb-c6f8-4622-92f0-44e1bae63481@github.com> References: <1gw5od-gWYF28y7_wLEgXpCv1ll37O1-GhQ07fAu6Fo=.856ca2e3-dc5b-4735-849a-a74b9a13d8ef@github.com> <_Ld0gq_PA3NTwxrGqrav5tUDc5FfwsmNgf59_J_Ofak=.2feda2eb-c6f8-4622-92f0-44e1bae63481@github.com> Message-ID: On Tue, 21 May 2024 23:19:04 GMT, Steve Dohrmann wrote: >> src/hotspot/cpu/x86/assembler_x86.cpp line 6397: >> >>> 6395: >>> 6396: void Assembler::stmxcsr(Address dst) { >>> 6397: if (UseAVX > 0 && !UseAPX ) { >> >> New ` && !UseAPX` check is strange here. If `UseAPX` is `true` we will execute `} else {` part of code which was executed only for SSE (UseAVX == 0) before. Is this intentional? This needs comment explaining why we do that if it is intentional. >> >> I see in other place you have `adr.base_needs_rex2() || adr.index_needs_rex2()` check. Do we need it here too? > > The !UseAPX test was added because if UseAPX is enabled we want to support extended register use via rex2 encoding in the else clause. The existing vex encoding remains when UseAPX is not enabled. 
There is a needs_rex2 check of address registers in the call to ::vex_prefix, asserting if UseAPX not enabled. I added a comment on this to the stmxcsr/ldmxcsr functions. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1609066816 From kvn at openjdk.org Wed May 22 00:30:09 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 22 May 2024 00:30:09 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v26] In-Reply-To: References: <1gw5od-gWYF28y7_wLEgXpCv1ll37O1-GhQ07fAu6Fo=.856ca2e3-dc5b-4735-849a-a74b9a13d8ef@github.com> <_Ld0gq_PA3NTwxrGqrav5tUDc5FfwsmNgf59_J_Ofak=.2feda2eb-c6f8-4622-92f0-44e1bae63481@github.com> Message-ID: On Tue, 21 May 2024 23:54:42 GMT, Steve Dohrmann wrote: >> The !UseAPX test was added because if UseAPX is enabled we want to support extended register use via rex2 encoding in the else clause. The existing vex encoding remains when UseAPX is not enabled. There is a needs_rex2 check of address registers in the call to ::vex_prefix, asserting if UseAPX not enabled. > > I added a comment on this to the stmxcsr/ldmxcsr functions. okay ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1609081235 From kvn at openjdk.org Wed May 22 00:30:11 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 22 May 2024 00:30:11 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v27] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 23:57:34 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. 
>> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > add comment to ::stmxcsr and ::ldmxcsr src/hotspot/cpu/x86/assembler_x86.cpp line 13030: > 13028: } > 13029: } > 13030: if (is_map1) emit_int8(0x0F); - First. What does `is_map1` mean? There is no explanation for this name. Maybe add a comment somewhere in the `assembler_x86.hpp` file or use a more meaningful name. - Second. You added one more byte, `0x0F`, for instructions even when extended registers are not used and APX is not enabled. Why? You added it in several `prefix()` and `prefixq()` methods. It can lead to a regression since code size will increase. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1609089328 From sgibbons at openjdk.org Wed May 22 02:07:36 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 02:07:36 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v21] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
The benchmark numbers:
>
> Benchmark                               Score      Latest     Ratio (Latest/Score)
> StringIndexOf.advancedWithMediumSub     343.573    317.934    0.925375393x
> StringIndexOf.advancedWithShortSub1     1039.081   1053.96    1.014319384x
> StringIndexOf.advancedWithShortSub2     55.828     110.541    1.980027943x
> StringIndexOf.constantPattern           9.361      11.906     1.271872663x
> StringIndexOf.searchCharLongSuccess     4.216      4.218      1.000474383x
> StringIndexOf.searchCharMediumSuccess   3.133      3.216      1.02649218x
> StringIndexOf.searchCharShortSuccess    3.76       3.761      1.000265957x
> StringIndexOf.success                   9.186      9.713      1.057369911x
> StringIndexOf.successBig                14.341     46.343     3.231504079x
> StringIndexOfChar.latin1_AVX2_String    6220.918   12154.52   1.953814533x
> StringIndexOfChar.latin1_AVX2_char      5503.556   5540.044   1.006629895x
> StringIndexOfChar.latin1_SSE4_String    6978.854   6818.689   0.977049957x
> StringIndexOfChar.latin1_SSE4_char      5657.499   5474.624   0.967675646x
> StringIndexOfChar.latin1_Short_String   7132.541   6863.359   0.962260014x
> StringIndexOfChar.latin1_Short_char     16013.389  16162.437  1.009307711x
> StringIndexOfChar.latin1_mixed_String   7386.123   14771.622  1.999915517x
> StringIndexOfChar.latin1_mixed_char     9901.671   9782.245   0.987938803x

Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Fixed CI compiles; re-factor UL processing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/9a861979..38868a35 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=19-20 Stats: 570 lines in 2 files changed: 327 ins; 158 del; 85 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Wed May 22 02:07:36 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 02:07:36 GMT Subject: RFR: 8320448:
Accelerate IndexOf using AVX2 [v20] In-Reply-To: References: Message-ID: <2K6GTqVka0-FS4NQcZ6z6izsDZVC1DuN1GuzzpkLlZk=.3853f424-d8fc-4c65-827d-a7abb321f38e@github.com> On Fri, 17 May 2024 23:47:45 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers:
>>
>> Benchmark                               Score      Latest     Ratio (Latest/Score)
>> StringIndexOf.advancedWithMediumSub     343.573    317.934    0.925375393x
>> StringIndexOf.advancedWithShortSub1     1039.081   1053.96    1.014319384x
>> StringIndexOf.advancedWithShortSub2     55.828     110.541    1.980027943x
>> StringIndexOf.constantPattern           9.361      11.906     1.271872663x
>> StringIndexOf.searchCharLongSuccess     4.216      4.218      1.000474383x
>> StringIndexOf.searchCharMediumSuccess   3.133      3.216      1.02649218x
>> StringIndexOf.searchCharShortSuccess    3.76       3.761      1.000265957x
>> StringIndexOf.success                   9.186      9.713      1.057369911x
>> StringIndexOf.successBig                14.341     46.343     3.231504079x
>> StringIndexOfChar.latin1_AVX2_String    6220.918   12154.52   1.953814533x
>> StringIndexOfChar.latin1_AVX2_char      5503.556   5540.044   1.006629895x
>> StringIndexOfChar.latin1_SSE4_String    6978.854   6818.689   0.977049957x
>> StringIndexOfChar.latin1_SSE4_char      5657.499   5474.624   0.967675646x
>> StringIndexOfChar.latin1_Short_String   7132.541   6863.359   0.962260014x
>> StringIndexOfChar.latin1_Short_char     16013.389  16162.437  1.009307711x
>> StringIndexOfChar.latin1_mixed_String   7386.123   14771.622  1.999915517x
>> StringIndexOfChar.latin1_mixed_char     9901.671   9782.245   0.987938803x
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Addressing lots of comments. Interim commit. Comment on behalf of @sviswa7: Unclear whether `size` in `byte_compare_helper` is intended to be in bytes or in elements. Please check its consistency.
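The question about `size` is the classic bytes-versus-elements ambiguity: in UTF-16 data one element is two bytes. The minimal scalar sketch below uses hypothetical helpers, not the AVX2 `byte_compare_helper`, to show how passing an element count where a byte count is expected silently compares only part of the data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical: bytes per element for the two string coders.
enum class Coder : std::size_t { Latin1 = 1, Utf16 = 2 };

// Convert an element (char) count into a byte count for the given coder.
constexpr std::size_t elems_to_bytes(std::size_t n_elems, Coder coder) {
  return n_elems * static_cast<std::size_t>(coder);
}

// Hypothetical helper whose size parameter is explicitly a byte count.
bool bytes_equal(const void* a, const void* b, std::size_t size_in_bytes) {
  return std::memcmp(a, b, size_in_bytes) == 0;
}
```

Naming the parameter by its unit, as `size_in_bytes` does here, is one way to make the intended unit visible at every call site.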
------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2123736900 From sgibbons at openjdk.org Wed May 22 02:07:36 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 02:07:36 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v20] In-Reply-To: <2y8TuEb98PH5hxKQAxPdnPfuqqkDmGDmHxS6byTZoas=.7c1f9bc9-75c6-4057-8b74-35cb1a086509@github.com> References: <2y8TuEb98PH5hxKQAxPdnPfuqqkDmGDmHxS6byTZoas=.7c1f9bc9-75c6-4057-8b74-35cb1a086509@github.com> Message-ID: On Tue, 21 May 2024 18:03:41 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressing lots of comments. Interim commit. > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4648: > >> 4646: vpxor(vec1, vec2); >> 4647: >> 4648: vptest(vec1, vec1); > > These should be only 128 bit: > pxor(vec1, vec2); > ptest(vec1, vec1); Fixed > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1351: > >> 1349: assert_different_registers(needle, needleVal); >> 1350: >> 1351: bool isLL = (ae == StrIntrinsicNode::LL); > > isLL not used in this function. 
Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1609164643 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1609164578 From gcao at openjdk.org Wed May 22 02:31:16 2024 From: gcao at openjdk.org (Gui Cao) Date: Wed, 22 May 2024 02:31:16 GMT Subject: RFR: 8332533: RISC-V: Enable vector variable shift instructions for machines with RVV [v3] In-Reply-To: References: Message-ID: > Hi, I noticed the following warning in the Opto JIT Code for the Vector API in the `test/jdk/jdk/incubator/vector/Byte256VectorTests.java: ASHRByte256VectorTests` test: > > -------------------------------------------------------------------------------- > ** Rejected vector op (RShiftVB,byte,32) because architecture does not support variable vector shifts > ** not supported: arity=2 opc=405 vlen=32 etype=byte ismask=0 is_masked_op=0 > > The reason is that `Matcher::supports_vector_variable_shifts` returns false. The RISC-V port of the Vector API now supports vector shifts, so it should return `UseRVV`. By the way, `Matcher::supports_vector_variable_shifts` was introduced with the Vector API, and I think updating it was overlooked when vector shifts were implemented. > After the fix, the test passes normally and generates the Opto JIT Code such as: > > 1c2 loadV V1, [R7] # vector (rvv) > 1ca lwu R28, [R28, #12] # loadN, compressed ptr, #@loadN !
Field: jdk/internal/vm/vector/VectorSupport$VectorPayload.payload (constant) > 1ce decode_heap_oop R7, R28 #@decodeHeapOop > 1d2 addi R7, R7, #16 # ptr, #@addP_reg_imm > 1d4 loadV V2, [R7] # vector (rvv) > 1dc vand_immI V1, V1, #7 > 1e4 spill [sp, #48] -> R7 # spill size = 32 > 1e6 # castII of R7, #@castII > 1e6 vasrB V3, V2, V1 > 1fa spill [sp, #96] -> R29 # spill size = 32 > 1fc bgeu R7, R29, B101 #@cmpU_branch P=0.000001 C=-1.000000 > > > ### Testing: > qemu 8.1.50 with UseRVV: > - [x] Run tier1-3 tests (release) > - [x] Run test/jdk/jdk/incubator/vector (fastdebug) Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into JDK-8332533 - Remove constexpr in Matcher::supports_vector_variable_shifts - 8332533: RISC-V: Enable Matcher::supports_vector_variable_shifts with UseRVV ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19313/files - new: https://git.openjdk.org/jdk/pull/19313/files/0c64eee1..e8042c6d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19313&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19313&range=01-02 Stats: 2264 lines in 110 files changed: 1195 ins; 826 del; 243 mod Patch: https://git.openjdk.org/jdk/pull/19313.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19313/head:pull/19313 PR: https://git.openjdk.org/jdk/pull/19313 From thartmann at openjdk.org Wed May 22 05:24:01 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 22 May 2024 05:24:01 GMT Subject: RFR: 8332527: ZGC: generalize object cloning logic [v2] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 09:17:16 GMT, Roberto Castañeda Lozano wrote: >> This changeset generalizes the logic to produce a runtime call to clone a class instance so that it can be shared by other
collectors adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). The changeset moves the logic from `ZBarrierSetC2` to the GC-shared `BarrierSetC2` class and adds support for 32-bit platforms. >> >> #### Testing >> >> - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). >> - tier4-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only). >> - `compiler/arraycopy` tests (linux-x86-debug) with [an additional patch](https://github.com/openjdk/jdk/commit/ddcf777894e740b8e6ddbbf8821e82a173c23ef4) that implements cloning of large class instances with a runtime clone call rather than arraycopy when using G1 (to exercise the generalized logic on a 32-bit platform). > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Applied Axel's suggestions Why is `ac->is_clone_array()` known to be always false? And if so, why is there another check at line 433 of `ZBarrierSetC2::clone_at_expansion`? ------------- PR Review: https://git.openjdk.org/jdk/pull/19311#pullrequestreview-2070120702 From gcao at openjdk.org Wed May 22 06:30:03 2024 From: gcao at openjdk.org (Gui Cao) Date: Wed, 22 May 2024 06:30:03 GMT Subject: RFR: 8332533: RISC-V: Enable vector variable shift instructions for machines with RVV [v3] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 02:34:13 GMT, Fei Yang wrote: >> Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge remote-tracking branch 'upstream/master' into JDK-8332533 >> - Remove constexpr in Matcher::supports_vector_variable_shifts >> - 8332533: RISC-V: Enable Matcher::supports_vector_variable_shifts with UseRVV > > Looks good. Thanks.
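The JIT listing for this change masks each shift count with `vand_immI V1, V1, #7` before `vasrB`. That matches the Java Vector API definition of `ASHR` as `a>>(n&(ESIZE*8-1))`, applied per lane with each lane's own count. The scalar model of the byte case below is an illustration, not the RISC-V implementation. It assumes C++20, or a compiler where `>>` on a negative value is an arithmetic shift, as is the case on mainstream compilers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of a lane-wise arithmetic right shift on byte lanes where each
// lane carries its own shift count (a "variable" vector shift).
template <std::size_t N>
std::array<int8_t, N> ashr_bytes(const std::array<int8_t, N>& a,
                                 const std::array<int8_t, N>& n) {
  std::array<int8_t, N> r{};
  for (std::size_t i = 0; i < N; ++i) {
    int count = n[i] & 7;                       // ESIZE*8 - 1 == 7 for bytes
    r[i] = static_cast<int8_t>(a[i] >> count);  // sign bits are shifted in
  }
  return r;
}
```

Masking the count keeps out-of-range or negative counts well defined. For example, a count of 9 behaves like 1, and a count of -1 behaves like 7.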
@RealFYang @Hamlin-Li: Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19313#issuecomment-2123962374 From gcao at openjdk.org Wed May 22 06:59:06 2024 From: gcao at openjdk.org (Gui Cao) Date: Wed, 22 May 2024 06:59:06 GMT Subject: Integrated: 8332533: RISC-V: Enable vector variable shift instructions for machines with RVV In-Reply-To: References: Message-ID: On Mon, 20 May 2024 15:23:26 GMT, Gui Cao wrote: > Hi, I noticed the following warning in the Opto JIT Code for the Vector API in the `test/jdk/jdk/incubator/vector/Byte256VectorTests.java: ASHRByte256VectorTests` test: > > -------------------------------------------------------------------------------- > ** Rejected vector op (RShiftVB,byte,32) because architecture does not support variable vector shifts > ** not supported: arity=2 opc=405 vlen=32 etype=byte ismask=0 is_masked_op=0 > > The reason is that `Matcher::supports_vector_variable_shifts` returns false. The RISC-V port of the Vector API now supports vector shifts, so it should return `UseRVV`. By the way, `Matcher::supports_vector_variable_shifts` was introduced with the Vector API, and I think updating it was overlooked when vector shifts were implemented. > After the fix, the test passes normally and generates the Opto JIT Code such as: > > 1c2 loadV V1, [R7] # vector (rvv) > 1ca lwu R28, [R28, #12] # loadN, compressed ptr, #@loadN !
Field: jdk/internal/vm/vector/VectorSupport$VectorPayload.payload (constant) > 1ce decode_heap_oop R7, R28 #@decodeHeapOop > 1d2 addi R7, R7, #16 # ptr, #@addP_reg_imm > 1d4 loadV V2, [R7] # vector (rvv) > 1dc vand_immI V1, V1, #7 > 1e4 spill [sp, #48] -> R7 # spill size = 32 > 1e6 # castII of R7, #@castII > 1e6 vasrB V3, V2, V1 > 1fa spill [sp, #96] -> R29 # spill size = 32 > 1fc bgeu R7, R29, B101 #@cmpU_branch P=0.000001 C=-1.000000 > > > ### Testing: > qemu 8.1.50 with UseRVV: > - [x] Run tier1-3 tests (release) > - [x] Run test/jdk/jdk/incubator/vector (fastdebug) This pull request has now been integrated. Changeset: 67f03f2a Author: Gui Cao Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/67f03f2a4f5ac12748ffbf5c04f248a60869e180 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8332533: RISC-V: Enable vector variable shift instructions for machines with RVV Reviewed-by: fyang, mli ------------- PR: https://git.openjdk.org/jdk/pull/19313 From aboldtch at openjdk.org Wed May 22 07:02:06 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 22 May 2024 07:02:06 GMT Subject: RFR: 8332527: ZGC: generalize object cloning logic [v2] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 09:17:16 GMT, Roberto Castañeda Lozano wrote: >> This changeset generalizes the logic to produce a runtime call to clone a class instance so that it can be shared by other collectors adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). The changeset moves the logic from `ZBarrierSetC2` to the GC-shared `BarrierSetC2` class and adds support for 32-bit platforms. >> >> #### Testing >> >> - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). >> - tier4-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only).
>> - `compiler/arraycopy` tests (linux-x86-debug) with [an additional patch](https://github.com/openjdk/jdk/commit/ddcf777894e740b8e6ddbbf8821e82a173c23ef4) that implements cloning of large class instances with a runtime clone call rather than arraycopy when using G1 (to exercise the generalized logic on a 32-bit platform). > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Applied Axel's suggestions As @TobiHartmann points out the `ary_ptr != nullptr` must be understood. ------------- Changes requested by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19311#pullrequestreview-2070299590 From aboldtch at openjdk.org Wed May 22 07:02:07 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 22 May 2024 07:02:07 GMT Subject: RFR: 8332527: ZGC: generalize object cloning logic [v2] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 05:21:54 GMT, Tobias Hartmann wrote: > Why is `ac->is_clone_array()` known to be always false? And if so, why is there another check at line 433 of `ZBarrierSetC2::clone_at_expansion`? Good catch. 'ac' in `clone_at_expansion` can be for either. As you point out, it conditions what it does on whether this is an instance clone or an array clone. https://github.com/openjdk/jdk/blob/cf85edeca422b52279df7dcba1675b5dbf5c4e75/src/hotspot/share/gc/z/c2/zBarrierSetC2.cpp#L416 `clone_instance_in_runtime` should only be called when cloning an instance. So in this context the is_array must be false. But it does seem strange that `ary_ptr != nullptr` is also in the condition. We could incorrectly call `clone_instance_in_runtime` if `ary_ptr == nullptr`. But I am not sure what that means. It must be understood for this abstraction to make sense. If `ary_ptr != nullptr` is invariant then the code should be changed to: - if (ac->is_clone_array() && ary_ptr != nullptr) { + if (ac->is_clone_array()) { +
assert(ary_ptr != nullptr, "invariant"); and if it is not invariant, then simply calling `clone_instance_in_runtime` is wrong. It must either be changed (and renamed) to handle `CloneArray`, or `clone_at_expansion` must do something else for that case. Similarly, I assume that the `ArrayCopyNode` is either `CloneInstance` or `CloneArray`. But there is also the `CloneOopArray` which I assume is not used given that we have the following code for `CloneArray`. https://github.com/openjdk/jdk/blob/cf85edeca422b52279df7dcba1675b5dbf5c4e75/src/hotspot/share/gc/z/c2/zBarrierSetC2.cpp#L416-L420 ------------- PR Comment: https://git.openjdk.org/jdk/pull/19311#issuecomment-2124010965 From gcao at openjdk.org Wed May 22 07:11:29 2024 From: gcao at openjdk.org (Gui Cao) Date: Wed, 22 May 2024 07:11:29 GMT Subject: RFR: 8332615: RISC-V: Support vector unsigned comparison instructions for machines with RVV Message-ID: Hi, I noticed the following warning in the Opto JIT Code for the Vector API in the `test/jdk/jdk/incubator/vector/Int256VectorTests.java: UNSIGNED_LTInt256VectorTests` test: ** not supported: unsigned comparison op=comp/1 vlen=8 etype=int ismask=usestore With this patch, we support vector unsigned comparison instructions; the test passes normally and generates the Opto JIT Code such as: 23e B46: # out( B48 B47 ) <- in( B25 B45 ) Loop( B46-B45 ) Freq: 955.829 23e addw R24, R29, zr #@convI2L_reg_reg 242 slli R30, R24, (#2 & 0x3f) #@lShiftL_reg_imm 246 add R17, R8, R30 # ptr, #@addP_reg_reg 24a add R19, R9, R30 # ptr, #@addP_reg_reg 24e addi R30, R17, #16 # ptr, #@addP_reg_imm 252 addi R31, R19, #16 # ptr, #@addP_reg_imm 256 loadV V1, [R30] # vector (rvv) 25e loadV V2, [R31] # vector (rvv) 266 vmaskcmp V0, V1, V2, #19 272 vmask_tolong R20, V0 280 vstoremask V1, V0 # elem size is #4 byte[s] 28c lw R31, [R17, #16] # int, #@loadI 290 lw R11, [R19, #16] # int, #@loadI 294 andi R10, R20, #1 #@andL_reg_imm 298 bne R10, zr, B48 #@cmpL_reg_imm0_branch P=0.669978
C=14596.000000 ### Testing: qemu 8.1.50 with UseRVV: - [ ] Run tier1-3 tests (release) - [x] Run test/jdk/jdk/incubator/vector (fastdebug) ------------- Commit messages: - 8332615: RISC-V: Enable vector unsigned comparison instructions for machines with RVV Changes: https://git.openjdk.org/jdk/pull/19328/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19328&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332615 Stats: 6 lines in 2 files changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19328.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19328/head:pull/19328 PR: https://git.openjdk.org/jdk/pull/19328 From thartmann at openjdk.org Wed May 22 07:58:07 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 22 May 2024 07:58:07 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v12] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 13:47:32 GMT, Damon Fenacci wrote: >> # Issue >> When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0)` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. >> >> # Causes >> On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. >> The same is true for `StoreVector`s. >> When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store of the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`).
Here we call >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 >> >> where we do an extra check for types if we deal with vectors but we don't make sure that neither masks nor offsets interfere. >> Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 >> >> but we don't make sure that there are no masks or offsets. >> A few lines below, we check if there are 2 stores for the same value in a row. We need to check for masks and offsets here too but in this case we can include these cases if the masks and offsets of the vector stores are equivalent. >> >> # Solution >> To avoid folding `Load`- and `StoreVector`s with masks and offsets we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns -1. In this way, the checks in `MemNode::can_see_stored_value` >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 >> >> and `StoreNode::Identity` >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 >> >> will fail if masks or offsets are used. >> For 2 stores of the same value we instead check for mask and offset equality. >> >> Regression tests for all versions of `Load/StoreVectorGather/Masked` hav... > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8325520: add override keyword Nice regression test! I added a few comments.
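The folding guard in the Solution section can be reduced to a toy model. The classes below are hypothetical stand-ins, not HotSpot's `Node` hierarchy, and the real `MemNode::can_see_stored_value` performs further checks, for example on the address. The sketch only shows why returning -1 from `store_Opcode()` makes the opcode match impossible for gather and masked loads.

```cpp
#include <cassert>

// Hypothetical opcode value for a plain vector store in this toy model.
enum ToyOpcode { Op_StoreVectorToy = 1 };

struct ToyLoadVector {
  virtual ~ToyLoadVector() = default;
  // Opcode of the store whose value this load could reuse.
  virtual int store_Opcode() const { return Op_StoreVectorToy; }
};

struct ToyLoadVectorGather : ToyLoadVector {
  // Gather/masked loads: -1 is never a real store opcode, so nothing matches.
  int store_Opcode() const override { return -1; }
};

struct ToyStore { int opcode; };

// Simplified stand-in for the opcode test in MemNode::can_see_stored_value.
bool opcode_allows_folding(const ToyLoadVector& ld, const ToyStore& st) {
  return st.opcode == ld.store_Opcode();
}
```

Putting the guard in a virtual method means the shared folding code needs no special cases: each load subclass decides whether any store can ever match it.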
src/hotspot/share/opto/vectornode.hpp line 893: > 891: class LoadVectorGatherNode : public LoadVectorNode { > 892: public: > 893: enum { Offsets = 3 }; Just wondering about the naming convention here because the corresponding constructor argument is called `indices`. Should that be consistent? src/hotspot/share/opto/vectornode.hpp line 916: > 914: virtual int store_Opcode() const { > 915: // Ensure it is different from any store opcode > 916: return -1; Maybe improve this comment and explain that we are doing this to avoid folding which does not account for the mask/offsets. Same for the comments in the other `store_Opcode` methods. src/hotspot/share/opto/vectornode.hpp line 1007: > 1005: class LoadVectorMaskedNode : public LoadVectorNode { > 1006: public: > 1007: enum { Mask = 3 }; Where is this used? src/hotspot/share/opto/vectornode.hpp line 1034: > 1032: enum { Offsets = 3, > 1033: Mask > 1034: }; Where is this used? ------------- Changes requested by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18347#pullrequestreview-2070390543 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1609457204 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1609467759 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1609470581 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1609470073 From thartmann at openjdk.org Wed May 22 07:58:08 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 22 May 2024 07:58:08 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v12] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 07:42:28 GMT, Tobias Hartmann wrote: >> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8325520: add override keyword > > src/hotspot/share/opto/vectornode.hpp line 893: > >> 891: class LoadVectorGatherNode : public LoadVectorNode { >> 892: public: >> 893: 
enum { Offsets = 3 }; > > Just wondering about the naming convention here because the corresponding constructor argument is called `indices`. Should that be consistent? Also, where is this used? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1609471219 From fyang at openjdk.org Wed May 22 08:27:02 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 22 May 2024 08:27:02 GMT Subject: RFR: 8332615: RISC-V: Support vector unsigned comparison instructions for machines with RVV In-Reply-To: References: Message-ID: On Tue, 21 May 2024 14:19:52 GMT, Gui Cao wrote: > Hi, I noticed the following warning in the Opto JIT Code for the Vector API in the `test/jdk/jdk/incubator/vector/Int256VectorTests.java: UNSIGNED_LTInt256VectorTests` test: > > ** not supported: unsigned comparison op=comp/1 vlen=8 etype=int ismask=usestore > ``` > After this Patch, We supports vector unsigned comparison instructions, the test passes normally and generates the Opto JIT Code such as: > > 23e B46: # out( B48 B47 ) <- in( B25 B45 ) Loop( B46-B45 ) Freq: 955.829 > 23e addw R24, R29, zr #@convI2L_reg_reg > 242 slli R30, R24, (#2 & 0x3f) #@lShiftL_reg_imm > 246 add R17, R8, R30 # ptr, #@addP_reg_reg > 24a add R19, R9, R30 # ptr, #@addP_reg_reg > 24e addi R30, R17, #16 # ptr, #@addP_reg_imm > 252 addi R31, R19, #16 # ptr, #@addP_reg_imm > 256 loadV V1, [R30] # vector (rvv) > 25e loadV V2, [R31] # vector (rvv) > 266 vmaskcmp V0, V1, V2, #19 > 272 vmask_tolong R20, V0 > 280 vstoremask V1, V0 # elem size is #4 byte[s] > 28c lw R31, [R17, #16] # int, #@loadI > 290 lw R11, [R19, #16] # int, #@loadI > 294 andi R10, R20, #1 #@andL_reg_imm > 298 bne R10, zr, B48 #@cmpL_reg_imm0_branch P=0.669978 C=14596.000000 > > ### Testing: > qemu 8.1.50 with UseRVV: > - [ ] Run tier1-3 tests (release) > - [x] Run test/jdk/jdk/incubator/vector (fastdebug) Looks good. Thanks! ------------- Marked as reviewed by fyang (Reviewer). 
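For readers less familiar with unsigned vector comparisons, the following is a scalar model of what a lane-wise unsigned "less than" computes. It builds a bitmask with one bit per lane, loosely analogous to the `vmaskcmp` + `vmask_tolong` sequence in the Opto dump above. This is a hypothetical illustration, not the RVV backend code. The class and method names are invented, and the lane layout (lane i maps to bit i) is an assumption made for the sketch.

```java
// Scalar model of lane-wise signed vs. unsigned "less than" on int lanes.
// Hypothetical illustration only; not code from the RISC-V patch.
public final class UnsignedLtSketch {
    // Bit i of the result is set when a[i] < b[i] as unsigned 32-bit values.
    public static long unsignedLtMask(int[] a, int[] b) {
        checkLanes(a, b);
        long mask = 0L;
        for (int i = 0; i < a.length; i++) {
            if (Integer.compareUnsigned(a[i], b[i]) < 0) {
                mask |= 1L << i;
            }
        }
        return mask;
    }

    // Same comparison with signed semantics, to show where the two differ.
    public static long signedLtMask(int[] a, int[] b) {
        checkLanes(a, b);
        long mask = 0L;
        for (int i = 0; i < a.length; i++) {
            if (a[i] < b[i]) {
                mask |= 1L << i;
            }
        }
        return mask;
    }

    private static void checkLanes(int[] a, int[] b) {
        if (a.length != b.length || a.length > 64) {
            throw new IllegalArgumentException("need equal lane counts <= 64");
        }
    }

    public static void main(String[] args) {
        int[] a = {-1, 0, 5, Integer.MIN_VALUE};
        int[] b = { 1, 1, 5, 0};
        // -1 is 0xFFFFFFFF unsigned, so lane 0 is "less than" only when signed.
        System.out.println(Long.toBinaryString(unsignedLtMask(a, b))); // 10
        System.out.println(Long.toBinaryString(signedLtMask(a, b)));   // 1011
    }
}
```

Lanes holding negative values are exactly where the two masks disagree. That is why the backend needs a dedicated unsigned compare rather than reusing the signed one.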
PR Review: https://git.openjdk.org/jdk/pull/19328#pullrequestreview-2070498642 From fgao at openjdk.org Wed May 22 08:45:03 2024 From: fgao at openjdk.org (Fei Gao) Date: Wed, 22 May 2024 08:45:03 GMT Subject: RFR: 8320622: [TEST] Improve coverage of compiler/loopopts/superword/TestMulAddS2I.java on different platforms In-Reply-To: References: Message-ID: On Tue, 21 May 2024 16:39:28 GMT, Vladimir Kozlov wrote: >> It would be worthwhile to improve the test coverage on all platforms by applying another common VM flag. > > Good. Thanks for your review @vnkozlov @eme64 @offamitkumar . I'm going to integrate it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19305#issuecomment-2124209657 From mli at openjdk.org Wed May 22 09:03:03 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 22 May 2024 09:03:03 GMT Subject: RFR: 8320999: RISC-V: C2 RotateLeftV In-Reply-To: References: Message-ID: <72VYJDMwIFJBvkzmEQIaSNPuIfoPYmL5lyCmqiHIoJw=.666ecb52-9157-4534-8b55-4cf43133a5a0@github.com> On Tue, 21 May 2024 11:51:00 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > More detailed description is inline in the code. > Thanks I'll need to refine the patch a bit, seems imm in vror.vi is 6 bits rather than 5 bits which is the case in basic vector instructions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19325#issuecomment-2124243794 From fgao at openjdk.org Wed May 22 11:36:09 2024 From: fgao at openjdk.org (Fei Gao) Date: Wed, 22 May 2024 11:36:09 GMT Subject: Integrated: 8320622: [TEST] Improve coverage of compiler/loopopts/superword/TestMulAddS2I.java on different platforms In-Reply-To: References: Message-ID: <5_0-PikptoAivHewnKh_kmpvIbdXfoQy6NzoM-XQarQ=.acd73bb8-2695-453e-9c57-3a2140f8fe7b@github.com> On Mon, 20 May 2024 08:56:35 GMT, Fei Gao wrote: > It would be worthwhile to improve the test coverage on all platforms by applying another common VM flag. This pull request has now been integrated. 
Changeset: 8a9d77d5 Author: Fei Gao URL: https://git.openjdk.org/jdk/commit/8a9d77d58de259b6b2bdc2cc9e7bfdc28dcf7165 Stats: 7 lines in 1 file changed: 0 ins; 4 del; 3 mod 8320622: [TEST] Improve coverage of compiler/loopopts/superword/TestMulAddS2I.java on different platforms Reviewed-by: epeter, kvn ------------- PR: https://git.openjdk.org/jdk/pull/19305 From luhenry at openjdk.org Wed May 22 14:18:11 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 22 May 2024 14:18:11 GMT Subject: RFR: 8320999: RISC-V: C2 RotateLeftV In-Reply-To: References: Message-ID: <7sd_3bkzxUEXjHNVpHhnjVLRDu1J_VlTPfC5ZQPjxAM=.d53dc43c-6e28-4b6b-8628-3ee050780885@github.com> On Tue, 21 May 2024 11:51:00 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > More detailed description is inline in the code. > Thanks src/hotspot/cpu/riscv/assembler_riscv.hpp line 1887: > 1885: > 1886: // Vector Bit-manipulation used in Cryptography (Zvbb) Extension > 1887: INSN(vrol_vx, 0b1010111, 0b100, 0b010101); we are not using `vrol_vx` anywhere. src/hotspot/cpu/riscv/assembler_riscv.hpp line 1899: > 1897: > 1898: // Vector Bit-manipulation used in Cryptography (Zvbb) Extension > 1899: INSN(vror_vi, 0b1010111, 0b011, 0b010100); I'm assuming there is not `vrol_vi`? It would be worth leaving a small comment here like // There is no `vrol_vi` instruction. src/hotspot/cpu/riscv/matcher_riscv.hpp line 132: > 130: // Does the CPU supports vector variable shift instructions? > 131: static constexpr bool supports_vector_variable_shifts(void) { > 132: return true; What's the path to checking for `UseZvbb` and `UseZvbc` respectively to the specific instruction? 
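On the RotateLeftV thread: the open questions are the missing `vrol_vi` form and the width of the `vror.vi` immediate. Both come down to the identity rotl(x, s) == rotr(x, (-s) & (W - 1)) for a W-bit element. A constant rotate-left can therefore be expressed as a rotate-right by an immediate. For 64-bit elements that immediate ranges over 0..63 and needs 6 bits; 32-bit elements only need 5. The Java sketch below checks the identity with the standard `Integer`/`Long` rotate methods. It illustrates the arithmetic only and is not a claim about how the patch lowers RotateLeftV. The helper names are hypothetical.

```java
// Rotate-left by a constant expressed as rotate-right by (-s) & (W - 1).
// Hypothetical helpers illustrating the identity; not code from the patch.
public final class RotateSketch {
    // Rotate-right amount equivalent to rotating left by s in a width-bit lane.
    // width must be a power of two (8, 16, 32, 64).
    public static int rotrAmountFor(int s, int width) {
        return (-s) & (width - 1);
    }

    public static long rotlViaRotr64(long x, int s) {
        return Long.rotateRight(x, rotrAmountFor(s, 64));  // amount in 0..63: 6 bits
    }

    public static int rotlViaRotr32(int x, int s) {
        return Integer.rotateRight(x, rotrAmountFor(s, 32)); // amount in 0..31: 5 bits
    }

    public static void main(String[] args) {
        long x = 0x0123456789ABCDEFL;
        System.out.println(rotlViaRotr64(x, 8) == Long.rotateLeft(x, 8)); // true
        System.out.println(rotrAmountFor(8, 64));                        // 56
    }
}
```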
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19325#discussion_r1610063874 PR Review Comment: https://git.openjdk.org/jdk/pull/19325#discussion_r1610059606 PR Review Comment: https://git.openjdk.org/jdk/pull/19325#discussion_r1610046593 From sgibbons at openjdk.org Wed May 22 14:26:18 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 14:26:18 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v6] In-Reply-To: References: Message-ID: <-IZk0dL-Bd2Gp5zsI3DSsHzNl6-6lB_8HRd4KkBUALw=.0ee706a8-9281-40f8-a0ba-d53385edcdcf@github.com> On Tue, 9 Jan 2024 15:06:10 GMT, Emanuel Peter wrote: >> Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: >> >> - Merge branch 'openjdk:master' into indexof >> - Addressing review comments. >> - Fix for JDK-8321599 >> - Support UU IndexOf >> - Only use optimization when EnableX86ECoreOpts is true >> - Fix whitespace >> - Merge branch 'openjdk:master' into indexof >> - Comments; added exhaustive-ish test >> - Subtracting 0x10 twice. >> - Stomped on r13 in switch branch calculation >> - ... and 11 more: https://git.openjdk.org/jdk/compare/8a4dc79e...600377b0 > > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1608: > >> 1606: // vector compares when size is 2 * VEC_SIZE or less. 38 8. Use 4 >> 1607: // vector compares when size is 4 * VEC_SIZE or less. 39 9. Use 8 >> 1608: // vector compares when size is 8 * VEC_SIZE or less. */ > > Is this formatting intended? Fixed > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1672: > >> 1670: >> 1671: // 98 VPCMPEQ VEC_SIZE(%rdi), %ymm2, %ymm2 >> 1672: // 99 vpmovmskb %ymm2, %eax > > It seems that here the comments and code is strangely interleaved / shifted. What is this all for? 
All this has been remedied > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 2301: > >> 2299: // 388 setg %dl >> 2300: // 389 leal -1(%rdx, %rdx), %eax >> 2301: __ movzbl(rcx, Address(rsi, rax, Address::times_1, -0x20)); > > Down here it is even worse All this has been remedied ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610074501 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610076284 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610076661 From yzheng at openjdk.org Wed May 22 14:31:16 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 22 May 2024 14:31:16 GMT Subject: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v5] In-Reply-To: References: Message-ID: <1nU6OzVHKjN_v9tJD4vTnoQa6hTn5CgDF15PQsyr5YE=.ed74dc2b-33f8-4828-a730-43f03a9aa4ab@github.com> On Wed, 17 Apr 2024 19:33:01 GMT, Dean Long wrote: >> Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: >> >> address comment. > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4670: > >> 4668: const Register tmp5 = r15; >> 4669: const Register tmp6 = r16; >> 4670: const Register tmp7 = r17; > > Why not minimize changes and continue to use r5 for tmp0? I see no need for r17 or to reassign all the other tmp registers. Was attempting to align the suffixes. Will revert. > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4693: > >> 4691: const Register xlen = r1; >> 4692: const Register z = r2; >> 4693: const Register zlen = r3; > > LibraryCallKit::inline_squareToLen() is still computing zlen and passing it as the 4th arg, even though the value is unused. ppc x86 are not using `multiply_to_len` for `generate_squareToLen`. I think we still need to pass zlen for these platforms. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1610083476 PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1610088021 From duke at openjdk.org Wed May 22 14:38:22 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Wed, 22 May 2024 14:38:22 GMT Subject: RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v12] In-Reply-To: References: Message-ID: > Hi everyone! Please review this port of [AArch64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4224) `_updateBytesCRC32`, `_updateByteBufferCRC32` and `_updateCRC32` intrinsics. This patch introduces only the plain (non-vectorized, no Zbc) version. > > ### Correctness checks > > Tier 1/2 tests are ok. > > ### Performance results on T-Head board > > #### Results for enabled intrinsic: > > Used test is `test/micro/org/openjdk/bench/java/util/TestCRC32.java` > > | Benchmark | (count) | Mode | Cnt | Score | Error | Units | > | --- | ---- | ----- | --- | ---- | --- | ---- | > | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 24 | 3730.929 | 37.773 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 24 | 2126.673 | 2.032 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 24 | 1134.330 | 6.714 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 24 | 584.017 | 2.267 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 24 | 151.173 | 0.346 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 24 | 19.113 | 0.008 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 24 | 4.647 | 0.022 | ops/ms | > > #### Results for disabled intrinsic: > > | Benchmark | (count) | Mode | Cnt | Score | Error | Units | > | --------------------------------------------------- | ---------- | --------- | ---- | ----------- | --------- | ---------- | > | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 15 | 798.365 | 35.486 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 15 
| 677.756 | 46.619 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 15 | 552.781 | 27.143 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 15 | 429.304 | 12.518 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 15 | 166.738 | 0.935 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 15 | 25.060 | 0.034 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 15 | 6.196 | 0.030 | ops/ms | ArsenyBochkarev has updated the pull request incrementally with two additional commits since the last revision: - Use shifts instead of ands, reschedule instructions - Make intrinsic Zba-exclusive ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17046/files - new: https://git.openjdk.org/jdk/pull/17046/files/36b96465..95752910 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17046&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17046&range=10-11 Stats: 36 lines in 4 files changed: 8 ins; 9 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/17046.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17046/head:pull/17046 PR: https://git.openjdk.org/jdk/pull/17046 From duke at openjdk.org Wed May 22 14:38:22 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Wed, 22 May 2024 14:38:22 GMT Subject: RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v11] In-Reply-To: <-TES7hSQnM_OOxP1-Uxt1k6e3lCRaDsj9V0mSYfnz3Y=.8caa70c2-7831-468c-a047-d03f08a93623@github.com> References: <-TES7hSQnM_OOxP1-Uxt1k6e3lCRaDsj9V0mSYfnz3Y=.8caa70c2-7831-468c-a047-d03f08a93623@github.com> Message-ID: On Tue, 2 Apr 2024 16:07:27 GMT, ArsenyBochkarev wrote: >> Hi everyone! Please review this port of [AArch64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4224) `_updateBytesCRC32`, `_updateByteBufferCRC32` and `_updateCRC32` intrinsics. This patch introduces only the plain (non-vectorized, no Zbc) version. >> >> ### Correctness checks >> >> Tier 1/2 tests are ok. 
>> >> ### Performance results on T-Head board >> >> #### Results for enabled intrinsic: >> >> Used test is `test/micro/org/openjdk/bench/java/util/TestCRC32.java` >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | --- | ---- | ----- | --- | ---- | --- | ---- | >> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 24 | 3730.929 | 37.773 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 24 | 2126.673 | 2.032 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 24 | 1134.330 | 6.714 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 24 | 584.017 | 2.267 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 24 | 151.173 | 0.346 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 24 | 19.113 | 0.008 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 24 | 4.647 | 0.022 | ops/ms | >> >> #### Results for disabled intrinsic: >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | --------------------------------------------------- | ---------- | --------- | ---- | ----------- | --------- | ---------- | >> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 15 | 798.365 | 35.486 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 15 | 677.756 | 46.619 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 15 | 552.781 | 27.143 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 15 | 429.304 | 12.518 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 15 | 166.738 | 0.935 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 15 | 25.060 | 0.034 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 15 | 6.196 | 0.030 | ops/ms | > > ArsenyBochkarev has updated the pull request incrementally with two additional commits since the last revision: > > - Schedule instructions better > - Fix crc32.h path Hello again everyone! I made two changes to this PR: 1. Now the intrinsic is Zba-exclusive. 2. 
I used the `slli` + `srliw` combination instead of `srli` + `andi` and rescheduled them a bit, making use of `t6` register. Current results on StarFive VisionFive2 (with Zba) are: | Benchmark | (count) | Mode | Cnt | Score | Error | Units | | -------------------------------------------------- | ------- | ------- | --- | ------------ | ------- | ----------- | | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 12 | 4644.129 | 9.566 | ops/ms | | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 12 | 2911.927 | 12.866 | ops/ms | | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 12 | 1538.630 | 5.463 | ops/ms | | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 12 | 799.100 | 3.216 | ops/ms | | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 12 | 205.947 | 0.236 | ops/ms | | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 12 | 25.880 | 0.069 | ops/ms | | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 12 | 6.022 | 0.020 | ops/ms | Results for disabled intrinsic on VisionFive2 (taken from [here](https://github.com/openjdk/jdk/pull/17046#issuecomment-1850364667)) | Benchmark | (count) | Mode | Cnt | Score | Error | Units | | ------------------------------- | ------------ | --------- | ----- | ---------| ---------- | ------ | | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 12 | 1390.530 | 42.217 | ops/ms | | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 12 | 1109.742 | 24.201 | ops/ms | | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 12 | 805.345 | 12.155 | ops/ms | | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 12 | 520.965 | 5.651 | ops/ms | | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 12 | 169.591 | 0.747 | ops/ms | | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 12 | 22.624 | 0.139 | ops/ms | | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 12 | 5.430 | 0.016 | ops/ms | ------------- PR Comment: https://git.openjdk.org/jdk/pull/17046#issuecomment-2124965284 From dfenacci at openjdk.org Wed May 22 14:43:25 2024 From: dfenacci at openjdk.org (Damon 
Fenacci) Date: Wed, 22 May 2024 14:43:25 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v12] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 07:52:40 GMT, Tobias Hartmann wrote: >> src/hotspot/share/opto/vectornode.hpp line 893: >> >>> 891: class LoadVectorGatherNode : public LoadVectorNode { >>> 892: public: >>> 893: enum { Offsets = 3 }; >> >> Just wondering about the naming convention here because the corresponding constructor argument is called `indices`. Should that be consistent? > > Also, where is this used? You're right, they are always called "indices" even in the Vector API documentation. I've renamed them: offsets -> indices. I've also removed all unused enums (in `LoadVector...Node` classes). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1610123425 From dfenacci at openjdk.org Wed May 22 14:43:24 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 22 May 2024 14:43:24 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v13] In-Reply-To: References: Message-ID: > # Issue > When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. > > # Causes > On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. > The same is true for `StoreVector`s. > When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store of the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`). 
Here we call > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 > > where we do an extra check for types if we deal with vectors but we don't make sure that neither masks nor offsets interfere. > Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > but we don't make sure that there are no masks or offsets. > A few lines below, we check if there are 2 stores for the same value in a row. We need to check for masks and offsets here too but in this case we can include these cases if the masks and offsets of the vector stores are equivalent. > > # Solution > To avoid folding `Load`- and `StoreVector`s with masks and offsets we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns -1. In this way, the checks in `MemNode::can_see_stored_value` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 > > and `StoreNode::Identity` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > will fail if masks or offsets are used. > For 2 stores of the same value we instead check for mask and offset equality. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been added too.
Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8325520: remove unused enums, renamed offsets->indices ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18347/files - new: https://git.openjdk.org/jdk/pull/18347/files/df3a49ae..8f341cd6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=11-12 Stats: 201 lines in 3 files changed: 0 ins; 5 del; 196 mod Patch: https://git.openjdk.org/jdk/pull/18347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18347/head:pull/18347 PR: https://git.openjdk.org/jdk/pull/18347 From dfenacci at openjdk.org Wed May 22 14:43:26 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 22 May 2024 14:43:26 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v12] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 07:52:12 GMT, Tobias Hartmann wrote: >> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8325520: add override keyword > > src/hotspot/share/opto/vectornode.hpp line 1007: > >> 1005: class LoadVectorMaskedNode : public LoadVectorNode { >> 1006: public: >> 1007: enum { Mask = 3 }; > > Where is this used? Removed. > src/hotspot/share/opto/vectornode.hpp line 1034: > >> 1032: enum { Offsets = 3, >> 1033: Mask >> 1034: }; > > Where is this used? Removed. 
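The folding guard in 8325520 works as follows. A load only "sees" a prior store's value when its `store_Opcode()` matches that store's opcode, so returning -1 from the gather/masked load nodes makes the match impossible. The Java model below captures that logic under simplifying assumptions. The address comparison the real C2 code performs is treated as already done, the opcode numbers are invented, and the class names only echo HotSpot's; none of this is HotSpot source. It also models the second half of the fix: two back-to-back stores of the same value are only redundant if their mask and indices match as well.

```java
import java.util.Objects;

// Simplified, hypothetical model of the store_Opcode() folding guard.
public final class StoreOpcodeSketch {
    // Invented opcode numbers; C2's real opcodes are generated constants.
    public static final int OP_STORE_VECTOR = 1;
    public static final int OP_STORE_VECTOR_SCATTER = 2;

    // A plain (unmasked, contiguous) vector load pairs with a plain vector store.
    public static class LoadVector {
        public int storeOpcode() { return OP_STORE_VECTOR; }
    }

    // Gather and masked loads opt out: -1 never equals any store opcode,
    // so the opcode check below can never fold them into a prior store.
    public static class LoadVectorGather extends LoadVector {
        @Override public int storeOpcode() { return -1; }
    }

    public static class LoadVectorMasked extends LoadVector {
        @Override public int storeOpcode() { return -1; }
    }

    public static final class Store {
        final int opcode;
        final Object value;   // token standing in for the stored vector
        final Object mask;    // null when unmasked
        final Object indices; // null when contiguous
        public Store(int opcode, Object value, Object mask, Object indices) {
            this.opcode = opcode;
            this.value = value;
            this.mask = mask;
            this.indices = indices;
        }
    }

    // Loosely analogous to the opcode check in MemNode::can_see_stored_value.
    public static boolean canFoldLoad(LoadVector load, Store store) {
        return load.storeOpcode() == store.opcode;
    }

    // Same value stored twice in a row: redundant only if mask and indices match too.
    public static boolean secondStoreRedundant(Store first, Store second) {
        return first.opcode == second.opcode
                && Objects.equals(first.value, second.value)
                && Objects.equals(first.mask, second.mask)
                && Objects.equals(first.indices, second.indices);
    }

    public static void main(String[] args) {
        Store plain = new Store(OP_STORE_VECTOR, "v0", null, null);
        System.out.println(canFoldLoad(new LoadVector(), plain));       // true
        System.out.println(canFoldLoad(new LoadVectorGather(), plain)); // false
    }
}
```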
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1610124608 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1610124322 From yzheng at openjdk.org Wed May 22 14:47:43 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 22 May 2024 14:47:43 GMT Subject: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v6] In-Reply-To: References: Message-ID: > Moving array construction within BigInteger.implMultiplyToLen intrinsic candidate to its caller simplifies the intrinsic implementation in JIT compiler. Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: address comments. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18226/files - new: https://git.openjdk.org/jdk/pull/18226/files/72ba58ce..7c6023f8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18226&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18226&range=04-05 Stats: 24 lines in 2 files changed: 0 ins; 0 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/18226.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18226/head:pull/18226 PR: https://git.openjdk.org/jdk/pull/18226 From yzheng at openjdk.org Wed May 22 14:47:43 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 22 May 2024 14:47:43 GMT Subject: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v3] In-Reply-To: References: Message-ID: On Mon, 20 May 2024 10:41:36 GMT, Bhavana Kilambi wrote: >> @dafedafe @dean-long please take a look and let me know if there are further issues, thanks! > > Hi @mur47x111, do you happen to have any performance results with this patch? @Bhavana-Kilambi the performance result for x86 is at https://github.com/openjdk/jdk/pull/18226#issuecomment-2007922439 . It is expected to be negligible. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18226#issuecomment-2124984579 From yzheng at openjdk.org Wed May 22 14:47:43 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 22 May 2024 14:47:43 GMT Subject: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v5] In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 20:04:44 GMT, Dean Long wrote: >> Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: >> >> address comment. > > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 6662: > >> 6660: push(tmp5); >> 6661: >> 6662: push(xlen); > > There may be an opportunity here (separate RFE?) to get rid of the save/restore for these. I don't think it's necessary if this is called as part of a C2 stub. In the Graal port we did get rid of these save/restore. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1610126799 From thartmann at openjdk.org Wed May 22 14:51:45 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 22 May 2024 14:51:45 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v14] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 14:48:52 GMT, Damon Fenacci wrote: >> # Issue >> When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. >> >> # Causes >> On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. >> The same is true for `StoreVector`s. 
>> When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store of the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`). Here we call >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 >> >> where we do an extra check for types if we deal with vectors but we don't make sure that neither masks nor offsets interfere. >> Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 >> >> but we don't make sure that there are no masks or offsets. >> A few lines below, we check if there are 2 stores for the same value in a row. We need to check for masks and offsets here too but in this case we can include these cases if the masks and offsets of the vector stores are equivalent. >> >> # Solution >> To avoid folding `Load`- and `StoreVector`s with masks and offsets we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns -1. In this way, the checks in `MemNode::can_see_stored_value` >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 >> >> and `StoreNode::Identity` >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 >> >> will fail if masks or offsets are used. >> For 2 stores of the same value we instead check for mask and offset equality. >> >> Regression tests for all versions of `Load/StoreVectorGather/Masked` hav...
> > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8325520: improve store_Opcode comments That looks good to me, thanks for adjusting! ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18347#pullrequestreview-2071507896 From dfenacci at openjdk.org Wed May 22 14:51:44 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 22 May 2024 14:51:44 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v14] In-Reply-To: References: Message-ID: > # Issue > When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. > > # Causes > On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. > The same is true for `StoreVector`s. > When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store of the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`). Here we call > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 > > where we do an extra check for types if we deal with vectors but we don't make sure that neither masks nor offsets interfere.
> Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > but we don't make sure that there are no masks or offsets. > A few lines below, we check if there are 2 stores for the same value in a row. We need to check for masks and offsets here too but in this case we can include these cases if the masks and offsets of the vector stores are equivalent. > > # Solution > To avoid folding `Load`- and `StoreVector`s with masks and offsets we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns -1. In this way, the checks in `MemNode::can_see_stored_value` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 > > and `StoreNode::Identity` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > will fail if masks or offsets are used. > For 2 stores of the same value we instead check for mask and offset equality. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been added too.
Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8325520: improve store_Opcode comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18347/files - new: https://git.openjdk.org/jdk/pull/18347/files/8f341cd6..50a75ffe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=12-13 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18347/head:pull/18347 PR: https://git.openjdk.org/jdk/pull/18347 From epeter at openjdk.org Wed May 22 14:51:45 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 22 May 2024 14:51:45 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v12] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 14:09:21 GMT, Damon Fenacci wrote: >> Looks good now! Thanks for all the updates, I think now the fix looks really concise. >> >>> we add a specific store_Opcode method to LoadVectorGatherNode, LoadVectorMaskedNode and LoadVectorGatherMaskedNode that doesn't return a store opcode but instead returns its own (to avoid ever being the same as a store node). >> >> This part in the PR description could be updated: now we return `-1` for those that we think are not "comparable". > >> This part in the PR description could be updated: now we return `-1` for those that we think are not "comparable". > > You're right. Fixed. > Thanks a lot for the review @eme64!! @dafedafe I recommend that you update the Bug title. It names "offsets", but really you probably want to name "indices and mask", or just "masked and gather/scatter vector operations". And it also is not just about "loads", but also "stores", correct?
------------- PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2124985757 From dfenacci at openjdk.org Wed May 22 14:51:45 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 22 May 2024 14:51:45 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v12] In-Reply-To: References: Message-ID: <4lBVgEB_CgvRVDHujJDhHxItcV9alb62Ejzhk0ErqIw=.ae367575-8dc4-4c47-ad3a-89b75c212e7f@github.com> On Wed, 22 May 2024 07:50:12 GMT, Tobias Hartmann wrote: >> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8325520: add override keyword > > src/hotspot/share/opto/vectornode.hpp line 916: > >> 914: virtual int store_Opcode() const { >> 915: // Ensure it is different from any store opcode >> 916: return -1; > > Maybe improve this comment and explain that we are doing this to avoid folding which does not account for the mask/offsets. Same for the comments in the other `store_Opcode` methods. Comments updated. Thanks @TobiHartmann! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1610136728 From sgibbons at openjdk.org Wed May 22 14:53:16 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 14:53:16 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v13] In-Reply-To: References: Message-ID: On Mon, 26 Feb 2024 14:50:30 GMT, Jatin Bhateja wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed some review coments; replaced hard-coded registers with descriptive names. > > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 303: > >> 301: __ subq(rdi, rax); >> 302: __ movq(rdx, rdi); >> 303: __ andq(rdx, -16); > > Hi @asgibbons , may I request you to please use meaningful names instead of directly using actual GPR names to ease the review process. Done. 
> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 777: > >> 775: __ movq(rax, rbx); >> 776: __ movq(rbx, r14); >> 777: __ leaq(r15, Address(r12, -0x2)); > > Kindly use semantically meaningful names instead of direct GPR names. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610121347 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610121724 From sgibbons at openjdk.org Wed May 22 14:53:17 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 14:53:17 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: <8ifsYHB0SLuD1ZbWhMWmBZn_UjW-iNpXrmsIkZFUczg=.ce670add-3afb-48be-8c81-2fd462d19bbd@github.com> On Mon, 6 May 2024 23:19:07 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Rearrange; add lambdas for clarity > > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 329: > >> 327: //////////////////////////////////////////////////////////////////////////////////////// >> 328: >> 329: __ bind(L_begin); > > So far we have handled haystack <= 32 and needle_size <= 5 (?) in bytes. A high level algorithm description here is needed in comments to follow the code below. A description of what are the various paths in terms of haystack and needle sizes and how to reason the assembly code below and make sure that all the paths are taken care of. Also the abstraction level suddenly changes here to detailed code below instead of methods for the various paths. I added a description. Can you please check to ensure it meets your objective? Thanks. 
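For readers following the request for a high-level description: one widely used SIMD-friendly substring search filters candidate positions by comparing the first and the last needle byte at every position, and compares the middle of the needle only where both match. The scalar sketch below models that filter. It is an assumption that the AVX2 stub follows this scheme (its size-specific paths are not reproduced), and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Scalar model of a first/last-byte filter; SIMD code evaluates the two
// filter comparisons for many candidate positions at once.
final class FirstLastFilterSearch {
    static int indexOf(byte[] haystack, byte[] needle) {
        int n = haystack.length;
        int k = needle.length;
        if (k == 0) return 0;   // same convention as String.indexOf("")
        if (k > n) return -1;
        byte first = needle[0];
        byte last = needle[k - 1];
        for (int i = 0; i <= n - k; i++) {
            // Cheap filter: both ends of the window must match.
            if (haystack[i] != first || haystack[i + k - 1] != last) continue;
            // Only surviving candidates pay for comparing the middle bytes.
            int j = 1;
            while (j < k - 1 && haystack[i + j] == needle[j]) j++;
            if (j >= k - 1) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] hay = "hello world".getBytes(StandardCharsets.ISO_8859_1);
        byte[] needle = "world".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(indexOf(hay, needle)); // 6
    }
}
```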
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610124233 From sgibbons at openjdk.org Wed May 22 14:53:26 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 14:53:26 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v7] In-Reply-To: <0XxCusssrDiiKzXBfdsY1XHkv9T6mJwJe7dwFz5Uy-I=.3325e496-5bf1-4a79-8969-e28e018b77db@github.com> References: <0XxCusssrDiiKzXBfdsY1XHkv9T6mJwJe7dwFz5Uy-I=.3325e496-5bf1-4a79-8969-e28e018b77db@github.com> Message-ID: On Tue, 16 Jan 2024 13:26:15 GMT, Jatin Bhateja wrote: >> Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: >> >> - Merge branch 'openjdk:master' into indexof >> - Merge branch 'openjdk:master' into indexof >> - Addressing review comments. >> - Fix for JDK-8321599 >> - Support UU IndexOf >> - Only use optimization when EnableX86ECoreOpts is true >> - Fix whitespace >> - Merge branch 'openjdk:master' into indexof >> - Comments; added exhaustive-ish test >> - Subtracting 0x10 twice. >> - ... and 12 more: https://git.openjdk.org/jdk/compare/8e12053e...3e58d0c2 > > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 417: > >> 415: __ cmpl(Address(rbx, r15, Address::times_1, -0x14), rax); >> 416: __ jne(L_top_loop_1); >> 417: __ jmp(L_0x406019); > > For cases which are multiple of 4 bytes we can use VMASKMOVPS (conditional moves) and VPTEST. Not sure what you mean here. Could you elaborate (although it may be moot after all the changes)? > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1526: > >> 1524: __ movq(rdx, r8); >> 1525: __ movq(rcx, r9); >> 1526: #endif > > Can we spill them into XXMs, to save costly stack operations. Changed. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1545: > >> 1543: // return 0; >> 1544: // } >> 1545: __ movq(r12, rcx); > > Check for K == 0 should use rsi. k is needle length, which is rcx. 
> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1545: > >> 1543: // return 0; >> 1544: // } >> 1545: __ movq(r12, rcx); > > Kindly use meaningful variable and label names. It will ease the review process and maintenance. Done. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1551: > >> 1549: __ movq(r15, rsi); >> 1550: __ movq(r11, rdi); >> 1551: __ cmpq(rsi, 0x20); > > Comparisons with 32 bit integer length can use cmpl instead of cmpq, this may save emitting REX encoding prefix if index is allocated a GPR from lower register bank (no need for setting REX.W). I fixed as many as I could find. Thanks. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1552: > >> 1550: __ movq(r11, rdi); >> 1551: __ cmpq(rsi, 0x20); >> 1552: __ jb(L_small_string); > > All the comparisons against needle length are signed integer comparisons, so jb should be replaced by jl I'm treating everything as unsigned except where intentional negative values are used. It never makes sense for needle length to be negative. 
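The `jb`/`jl` question comes down to how a length is compared against a constant such as 32. A small Java model makes the difference visible: the helper names are hypothetical, `a < b` stands in for a signed `jl`, and `Integer.compareUnsigned` plays the role of an unsigned `jb`. The two agree for every non-negative length and differ only for negative values, which is the case the reply argues cannot occur for a needle length.

```java
// Models 32-bit x86 branch conditions after "cmpl a, b".
final class SignedVsUnsigned {
    // jl: taken when a < b as signed integers.
    static boolean jl(int a, int b) {
        return a < b;
    }

    // jb: taken when a < b as unsigned integers (a negative value is huge).
    static boolean jb(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0;
    }

    public static void main(String[] args) {
        for (int len : new int[] {0, 5, 31, 32, 1000, -1}) {
            System.out.printf("len=%d jl=%b jb=%b%n", len, jl(len, 32), jb(len, 32));
        }
    }
}
```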
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610118449 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610110754 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610105405 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610111320 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610113343 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610116033 From sgibbons at openjdk.org Wed May 22 14:53:27 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 14:53:27 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v7] In-Reply-To: References: Message-ID: On Mon, 22 Jan 2024 07:08:31 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 505: >> >>> 503: __ cmpb(Address(rbx, r15, Address::times_1, -0xa), rax); >>> 504: __ jne(L_top_loop_1); >>> 505: __ jmp(L_0x406019); >> >> Instead of having special handling for each tail size (3 - 31 bytes), can we directly use 32 bytes VMASKMOVPS with appropriate mask for different tail sizes and only residual part (0 - 3 bytes) can fall over to scalar tail. > > Basically tail size can be rounded to nearest multiple of doubleword. I have since changed the algorithm due to request from @sviswa7 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610120366 From sgibbons at openjdk.org Wed May 22 14:53:28 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 14:53:28 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v5] In-Reply-To: References: Message-ID: <8FGB4fvnPGhSSdLgY5POXyGajpA-b-Ir31ee1WrG660=.0afedbf4-b717-4d1a-a3f0-c36b5e02a4d8@github.com> On Mon, 8 Jan 2024 10:32:51 GMT, Jatin Bhateja wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressing review comments. 
> > src/hotspot/share/opto/library_call.cpp line 1273: > >> 1271: Node* result = nullptr; >> 1272: >> 1273: if ((StubRoutines::string_indexof() != nullptr) && (ae == StrIntrinsicNode::LL)) { > > Why are we not calling stub for StrIntrinsicNode::UU Stub being called for LL, UL, and UU now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610089409 From sgibbons at openjdk.org Wed May 22 14:53:30 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 14:53:30 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v6] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 15:14:41 GMT, Emanuel Peter wrote: >> Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: >> >> - Merge branch 'openjdk:master' into indexof >> - Addressing review comments. >> - Fix for JDK-8321599 >> - Support UU IndexOf >> - Only use optimization when EnableX86ECoreOpts is true >> - Fix whitespace >> - Merge branch 'openjdk:master' into indexof >> - Comments; added exhaustive-ish test >> - Subtracting 0x10 twice. >> - Stomped on r13 in switch branch calculation >> - ... and 11 more: https://git.openjdk.org/jdk/compare/8a4dc79e...600377b0 > > test/jdk/java/lang/StringBuffer/IndexOf.java line 34: > >> 32: public class IndexOf { >> 33: >> 34: static Random generator = new Random(1999); > > Would it be an alternative to use the this: > > import jdk.test.lib.Utils; > ... > Random random = Utils.getRandomInstance(); > > This has a random seed, but it is always printed in the output and can be set via a test-flag. Changed. > test/jdk/java/lang/StringBuffer/IndexOf.java line 44: > >> 42: } >> 43: System.out.println(""); >> 44: generator.setSeed(1999); > > Is there a good reason for a fixed seed? Nope :-). Needed consistency during testing. 
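The pattern suggested here is to draw a fresh seed by default but always print it, so that a failing run can be replayed exactly. A minimal sketch without the jtreg test library follows; the class name and the `test.seed` system property are hypothetical and are not the names `jdk.test.lib.Utils` uses.

```java
import java.util.Random;

final class SeededRandom {
    // Hypothetical property name used to pin the seed when replaying a failure.
    static final String SEED_PROPERTY = "test.seed";

    static Random newRandom() {
        // Use the pinned seed if the property is set, otherwise pick a fresh one.
        long seed = Long.getLong(SEED_PROPERTY, new Random().nextLong());
        // Printing the seed is what makes a random failure reproducible.
        System.out.println("Using seed " + seed + " (replay with -D" + SEED_PROPERTY + "=" + seed + ")");
        return new Random(seed);
    }
}
```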
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610087089 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610088114 From sgibbons at openjdk.org Wed May 22 14:53:31 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 14:53:31 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 19:18:27 GMT, Volodymyr Paprotski wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Rearrange; add lambdas for clarity > > test/jdk/java/lang/StringBuffer/IndexOf.java line 54: > >> 52: // for (int i = 1; i < 128; i++) { >> 53: // haystack_16[i] = (char) (i); >> 54: // } > > dead code Removed. > test/jdk/java/lang/StringBuffer/IndexOf.java line 83: > >> 81: shs = "$&),,18+-!'8)+"; >> 82: endNeedle = "8)-"; >> 83: l_offset = 9; > > dead code Fixed. > test/jdk/java/lang/StringBuffer/IndexOf.java line 237: > >> 235: + sourceBuffer.toString() + " len Buffer = " + sourceBuffer.toString().length()); >> 236: System.err.println(" naive = " + naiveFind(sourceBuffer.toString(), targetString, 0) + ", IndexOf = " >> 237: + sourceBuffer.indexOf(targetString)); > > More tracing left behind here and rest of this function (original just recorded failure and moved along) I think it's more valuable for a test to print out what it can when a failure occurs rather than just saying "failed". > test/jdk/java/lang/StringBuffer/IndexOf.java line 284: > >> 282: >> 283: // Note: it is possible although highly improbable that failCount will >> 284: // be > 0 even if everthing is working ok > > This sounds like either a bug or a testcase bug? Same as line 301, `extremely remote possibility of > 1 match`? This was there from the original author. I think they were trying to infer that a match could occur in the rare case that the same random string was produced. 
They're random after all, and there's no reason the same sequence couldn't be generated. > test/micro/org/openjdk/bench/java/lang/StringIndexOfHuge.java line 81: > >> 79: lateMatchString16 = dataStringHuge16.substring(dataStringHuge16.length() - 31); >> 80: >> 81: searchString = "oscar"; > > Would had liked to see a few more small needles (i.e. to test/verify individual switch statement cases) I'm hoping we can incorporate your test to cover more cases :-). > test/micro/org/openjdk/bench/java/lang/StringIndexOfHuge.java line 132: > >> 130: @Benchmark >> 131: public int searchHugeLargeSubstring() { >> 132: return dataStringHuge.indexOf("B".repeat(30) + "X" + "A".repeat(30), 74); > > .repeat() call and string concatenation shouldn't be part of the benchmark (here and several other @Benchmark functions in this file) since it will detract from the measurement. > > (String concatenation gets converted (by javac) into StringBuilder().append().append()....append().toString()) Since we're only concerned with the delta of performance, does this really matter? Can you suggest an alternative? > test/micro/org/openjdk/bench/java/lang/StringIndexOfHuge.java line 242: > >> 240: @Benchmark >> 241: public int search16HugeLargeSubstring16() { >> 242: return dataStringHuge16.indexOf("B".repeat(30) + "X" + "A".repeat(30), 74); > > `search16HugeLargeSubstring16` implies UU, but with `"B".repeat(30) + "X" + "A".repeat(30)` is UL Fixed.
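The failure trace discussed above compares `indexOf` against a `naiveFind` reference. A minimal sketch of such a brute-force oracle and a seeded differential check follows. The body of `naiveFind` is an assumption written to match `String.indexOf(String, int)` semantics, not the code in the test, and `check` and `randomString` are hypothetical helpers. Printing the seed together with both results on a mismatch keeps a failure reproducible.

```java
import java.util.Random;

final class IndexOfOracle {
    // Brute-force reference with String.indexOf(String, int) semantics.
    static int naiveFind(String haystack, String needle, int fromIndex) {
        int start = Math.max(fromIndex, 0);
        if (needle.isEmpty()) {
            return Math.min(start, haystack.length());
        }
        for (int i = start; i + needle.length() <= haystack.length(); i++) {
            if (haystack.regionMatches(i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    // Returns how many random cases disagree between StringBuilder.indexOf and naiveFind.
    static int check(long seed, int iterations) {
        Random rnd = new Random(seed);
        int failures = 0;
        for (int it = 0; it < iterations; it++) {
            // A small alphabet makes partial matches (the tricky cases) frequent.
            String hay = randomString(rnd, rnd.nextInt(100), "ABX");
            String needle = randomString(rnd, 1 + rnd.nextInt(8), "ABX");
            int expected = naiveFind(hay, needle, 0);
            int actual = new StringBuilder(hay).indexOf(needle);
            if (expected != actual) {
                failures++;
                System.err.printf("seed=%d hay=%s needle=%s naive=%d indexOf=%d%n",
                        seed, hay, needle, expected, actual);
            }
        }
        return failures;
    }

    static String randomString(Random rnd, int len, String alphabet) {
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) {
            sb.append(alphabet.charAt(rnd.nextInt(alphabet.length())));
        }
        return sb.toString();
    }
}
```

On the benchmark question, the analogous remedy is to build inputs once: computing the needle in a JMH `@Setup` method or a `final` field keeps `repeat()` and concatenation out of the timed region.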
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610131285 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610134566 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610138116 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610142104 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610130140 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610126743 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610128630 From sgibbons at openjdk.org Wed May 22 14:53:31 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 14:53:31 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v7] In-Reply-To: <3m2_CQE-NHOCN20Z4LbosqwihcUCVopTgycXADInLEI=.25f797e8-e620-4f10-9da0-245a890c41de@github.com> References: <3m2_CQE-NHOCN20Z4LbosqwihcUCVopTgycXADInLEI=.25f797e8-e620-4f10-9da0-245a890c41de@github.com> Message-ID: On Mon, 15 Jan 2024 13:30:42 GMT, Andrey Turbanov wrote: >> Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: >> >> - Merge branch 'openjdk:master' into indexof >> - Merge branch 'openjdk:master' into indexof >> - Addressing review comments. >> - Fix for JDK-8321599 >> - Support UU IndexOf >> - Only use optimization when EnableX86ECoreOpts is true >> - Fix whitespace >> - Merge branch 'openjdk:master' into indexof >> - Comments; added exhaustive-ish test >> - Subtracting 0x10 twice. >> - ... and 12 more: https://git.openjdk.org/jdk/compare/8e12053e...3e58d0c2 > > test/jdk/java/lang/StringBuffer/IndexOf.java line 220: > >> 218: >> 219: for (int x = 0; x < 1000000; x++) { >> 220: if(make_new) { > > Suggestion: > > if (make_new) { Fixed. > test/jdk/java/lang/StringBuffer/IndexOf.java line 262: > >> 260: } >> 261: >> 262: if(make_new) > > Suggestion: > > if (make_new) Fixed. 
> test/jdk/java/lang/StringBuffer/IndexOf.java line 295: > >> 293: } >> 294: >> 295: if(make_new) testIndex = getRandomIndex(-100, 100); > > Suggestion: > > if (make_new) testIndex = getRandomIndex(-100, 100); Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610093771 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610094790 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610097958 From dfenacci at openjdk.org Wed May 22 14:58:07 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 22 May 2024 14:58:07 GMT Subject: RFR: 8325520: Vector loads and stores with indices and masks incorrectly compiled [v12] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 14:09:21 GMT, Damon Fenacci wrote: >> Looks good now! Thanks for all the updates, I think now the fix looks really concise. >> >>> we add a specific store_Opcode method to LoadVectorGatherNode, LoadVectorMaskedNode and LoadVectorGatherMaskedNode that doesn't return a store opcode but instead returns its own (to avoid ever being the same as a store node). >> >> This part in the PR description could be updated: now we return `-1` for those that we think are not "comparable". > >> This part in the PR description could be updated: now we return `-1` for those that we think are not "comparable". > > You're right. Fixed. > Thanks a lot for the review @eme64!! > @dafedafe I recommend that you update the Bug title. It names "offsets", but really you probably want to name "indices and mask", or just "masked and gather/scatter vector operations". And it also is not just about "loads", but also "stores", correct? Yep. I changed the title... and adapted the description as well (offsets -> indices).
Thanks @eme64 ------------- PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2125010649 From sgibbons at openjdk.org Wed May 22 15:05:11 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 15:05:11 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: <8Y-nIHc8vfB1X_hp3tpqqqgpCzu6dAt6BBIP_zc4Q70=.c9a48c68-8c14-4af9-8357-ab50e62a5fd3@github.com> Message-ID: On Fri, 17 May 2024 22:37:13 GMT, Sandhya Viswanathan wrote: >> Not sure what you mean here. I *think* you mean that hsLength is not the length of the remaining bytes in the haystack, but the actual length. There may be an issue if that is correct, right? I'll investigate. > > Yes, that is what I meant. Thanks for investigating. I've moved the code checking for (n-k)<32 to `big_case_loop_helper`, so there's no need for this in here any longer. Removing unneeded parameters from `compare_big_haystack_to_needle`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610166656 From rcastanedalo at openjdk.org Wed May 22 15:11:06 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 22 May 2024 15:11:06 GMT Subject: RFR: 8332527: ZGC: generalize object cloning logic [v2] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 09:17:16 GMT, Roberto Castañeda Lozano wrote: >> This changeset generalizes the logic to produce a runtime call to clone a class instance so that it can be shared by other collectors adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). The changeset moves the logic from `ZBarrierSetC2` to the GC-shared `BarrierSetC2` class and adds support for 32-bits platforms. >> >> #### Testing >> >> - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). >> - tier4-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only).
> - `compiler/arraycopy` tests (linux-x86-debug) with [an additional patch](https://github.com/openjdk/jdk/commit/ddcf777894e740b8e6ddbbf8821e82a173c23ef4) that implements cloning of large class instances with a runtime clone call rather than arraycopy when using G1 (to exercise the generalized logic on a 32-bits platform). > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Applied Axel's suggestions Thanks Tobias and Axel, as far as I understand the special handling of the case `ac->is_clone_array() && ary_ptr != nullptr` was added by [JDK-8270098](https://bugs.openjdk.java.net/browse/JDK-8270098) (in JDK 18 b10) to deal with reflective `clone()` invocations as in this example: public static Object testCloneObject(Method clone, Object obj) throws Exception { return clone.invoke(obj); } String[] array = new String[N]; (...) Method clone = Object.class.getDeclaredMethod("clone"); clone.setAccessible(true); ... = testCloneObject(clone, array); The code above led, in JDK 18 b10, to an ArrayCopy with `CloneArray` kind but source node of type `java/lang/Object:NotNull *`, which motivated the introduction of special handling for this case. Shortly after (JDK 18 b22), [JEP 416](https://bugs.openjdk.org/browse/JDK-8271820) was integrated, changing significantly the implementation of this reflective invocation. After JEP 416, the above example and the tests introduced by JDK-8270098 do not require special handling anymore, because the reflective invocation results in a native call to [jdk.internal.reflect.DirectMethodHandleAccessor$NativeAccessor::invoke0](https://github.com/openjdk/jdk/blob/9ca90ccd6bfec76e54e2e870bd706fad5abf233c/src/java.base/share/classes/jdk/internal/reflect/DirectMethodHandleAccessor.java#L268) and is not C2-compiled anymore. I have also run our CI tiers 1-8 asserting that `ary_ptr != nullptr` if `ac->is_clone_array()` holds, without encountering any failure.
So there might be an opportunity to simplify the ` clone_at_expansion` logic by reverting at least part of the special handling introduced by JDK-8270098. However, I think this is out of scope for this RFE, and should be addressed separately. I propose to remove in this RFE the assumption that `BarrierSetC2::clone_instance_in_runtime` can only be called for instance cloning and limit the RFE to simply moving logic from ZBarrierSetC2 into BarrierSetC2 and adding support for 32-bits platforms, without any other changes to the existing logic. After integrating this RFE, I will then create a RFE for investigating the feasibility of the JDK-8270098 clean-up. @TobiHartmann @xmas92 do you agree? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19311#issuecomment-2125041718 From yzheng at openjdk.org Wed May 22 15:24:05 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 22 May 2024 15:24:05 GMT Subject: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v6] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 14:47:43 GMT, Yudi Zheng wrote: >> Moving array construction within BigInteger.implMultiplyToLen intrinsic candidate to its caller simplifies the intrinsic implementation in JIT compiler. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > address comments. @theRealAph @TheRealMDoerr @RealFYang @offamitkumar could you please help review the aarch64/ppc/riscv/s390 changes? 
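Yudi's change moves allocation of the result array out of the intrinsic candidate and into its caller, so the compiled intrinsic only has to fill a buffer it is given. The sketch below illustrates that split under stated assumptions: magnitudes are big-endian `int[]` arrays as in `java.math.BigInteger`, the method names and signatures are hypothetical, and the loop is the textbook schoolbook multiplication rather than a copy of the JDK source.

```java
import java.math.BigInteger;

final class MultiplyToLenSketch {
    private static final long LONG_MASK = 0xffffffffL;

    // Caller side: allocation and sizing of z happen here, outside the kernel.
    static int[] multiply(int[] x, int[] y) {
        int[] z = new int[x.length + y.length];
        multiplyToLen(x, x.length, y, y.length, z);
        return z;
    }

    // Kernel side: fills z[0 .. xlen+ylen-1] with x * y (big-endian words).
    // Assumes xlen >= 1, ylen >= 1 and z.length >= xlen + ylen; allocates nothing.
    static void multiplyToLen(int[] x, int xlen, int[] y, int ylen, int[] z) {
        int xstart = xlen - 1;
        int ystart = ylen - 1;
        long carry = 0;
        // Least significant word of x times all of y.
        for (int j = ystart, k = ystart + 1 + xstart; j >= 0; j--, k--) {
            long product = (y[j] & LONG_MASK) * (x[xstart] & LONG_MASK) + carry;
            z[k] = (int) product;
            carry = product >>> 32;
        }
        z[xstart] = (int) carry;
        // Accumulate the remaining words of x into the partial result.
        for (int i = xstart - 1; i >= 0; i--) {
            carry = 0;
            for (int j = ystart, k = ystart + 1 + i; j >= 0; j--, k--) {
                long product = (y[j] & LONG_MASK) * (x[i] & LONG_MASK)
                        + (z[k] & LONG_MASK) + carry;
                z[k] = (int) product;
                carry = product >>> 32;
            }
            z[i] = (int) carry;
        }
    }

    // Interprets a big-endian magnitude as a non-negative BigInteger.
    static BigInteger toBigInteger(int[] mag) {
        BigInteger r = BigInteger.ZERO;
        for (int w : mag) {
            r = r.shiftLeft(32).or(BigInteger.valueOf(w & LONG_MASK));
        }
        return r;
    }

    public static void main(String[] args) {
        int[] x = {0x12345678, 0x9abcdef0};
        int[] y = {0x1, 0xffffffff};
        BigInteger expected = toBigInteger(x).multiply(toBigInteger(y));
        System.out.println(toBigInteger(multiply(x, y)).equals(expected)); // true
    }
}
```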
------------- PR Comment: https://git.openjdk.org/jdk/pull/18226#issuecomment-2125073511 From dfenacci at openjdk.org Wed May 22 15:58:09 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 22 May 2024 15:58:09 GMT Subject: RFR: 8325520: Vector loads and stores with indices and masks incorrectly compiled [v11] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 06:48:12 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectornode.hpp line 979: >> >>> 977: idx == MemNode::ValueIn || >>> 978: idx == MemNode::ValueIn + 1; } >>> 979: virtual Node* offsets() const { return in(Offsets); } >> >> Would be nice to add some `override` keywords here ;) > > And also below. On macosx we build with the flag `-Winconsistent-missing-override` enabled, which seems to allow either no `override` anywhere or in all places where a method actually overrides another one. So, in this case it seems that we would either need to add `override`s to `virtual int Opcode()`, `virtual uint match_edge(uint idx)`, `virtual Node* Ideal(PhaseGVN* phase, bool can_reshape)` etc. in all classes, or remove the 4 `override` keywords from the `indices` and `mask` methods. I might opt for the second choice. @eme64 what do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1610251737 From duke at openjdk.org Wed May 22 16:22:18 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Wed, 22 May 2024 16:22:18 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v28] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. 
> > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: add comment about is_map1 prefix function parameter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/5b6fdce0..47a0cd70 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=26-27 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From duke at openjdk.org Wed May 22 16:22:19 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Wed, 22 May 2024 16:22:19 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v27] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 00:27:42 GMT, Vladimir Kozlov wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> add comment to ::stmxcsr and ::ldmxcsr > > src/hotspot/cpu/x86/assembler_x86.cpp line 13030: > >> 13028: } >> 13029: } >> 13030: if (is_map1) emit_int8(0x0F); > > - First. What `is_map1` means? There is no explanation for this name. May be add comment somewhere in `assembler_x86.hpp` file or use more meaningful name. > > - Second. You added one more byte `0x0F` for instructions even when extended registers are not used and APX is not enabled. Why? 
You added it in several `prefix()` and `prefixq()` methods. It can lead to regression since code size will increase. The is_map1 bool indicates an x86 map1 instruction which, when legacy encoded, uses a 0x0F opcode prefix. By specification, the opcode prefix is omitted when using rex2 encoding in support of APX extended GPRs. I added this comment before the relevant prefix functions in assembler_x86.hpp. Is this sufficient? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1610286432 From sgibbons at openjdk.org Wed May 22 16:25:24 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 16:25:24 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v22] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 
14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Added comments; move n-k<32 code up a level ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/38868a35..f4ca4a5e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=20-21 Stats: 214 lines in 4 files changed: 100 ins; 72 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Wed May 22 16:36:41 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 16:36:41 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v23] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Adding exhaustive test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/f4ca4a5e..b6d77fe0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=21-22 Stats: 249 lines in 1 file changed: 249 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Wed May 22 16:39:11 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 16:39:11 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 
[v22] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 16:25:24 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Added comments; move n-k<32 code up a level By her request ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2125255793 From dnsimon at openjdk.org Wed May 22 16:50:24 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 22 May 2024 16:50:24 GMT Subject: RFR: 8332735: [JVMCI] Add extra JVMCI events for exception translation Message-ID: This PR adds a few extra JVMCI events that should help 
diagnose crashes when translating an exception between the HotSpot and libgraal heaps. ------------- Commit messages: - add JVMCI event for decoding step during exception translation Changes: https://git.openjdk.org/jdk/pull/19350/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19350&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332735 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19350.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19350/head:pull/19350 PR: https://git.openjdk.org/jdk/pull/19350 From sgibbons at openjdk.org Wed May 22 16:54:26 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 16:54:26 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v24] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x 
> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Added header file ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/b6d77fe0..f002fd54 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=22-23 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From kvn at openjdk.org Wed May 22 17:04:09 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 22 May 2024 17:04:09 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v28] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 16:22:18 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
> > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > add comment about is_map1 prefix function parameter Looks good. Please wait for the results of @eme64's testing before integration. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18476#pullrequestreview-2071852352 From kvn at openjdk.org Wed May 22 17:04:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 22 May 2024 17:04:10 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v27] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 16:19:04 GMT, Steve Dohrmann wrote: >> src/hotspot/cpu/x86/assembler_x86.cpp line 13030: >> >>> 13028: } >>> 13029: } >>> 13030: if (is_map1) emit_int8(0x0F); >> >> - First, what does `is_map1` mean? There is no explanation for this name. Maybe add a comment somewhere in the `assembler_x86.hpp` file or use a more meaningful name. >> >> - Second, you added one more byte `0x0F` for instructions even when extended registers are not used and APX is not enabled. Why? You added it in several `prefix()` and `prefixq()` methods. It can lead to a regression since code size will increase. > > The is_map1 bool indicates an x86 map1 instruction which, when > legacy encoded, uses a 0x0F opcode prefix. By specification, the > opcode prefix is omitted when using rex2 encoding in support > of APX extended GPRs. > > I added this comment before the relevant prefix functions in assembler_x86.hpp. Is this sufficient? Comment is good.
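To make the byte-level question concrete, here is a hypothetical C++ model of the prefix choice Steve describes. It is not HotSpot's `Assembler` code, and `emit_prefix`, `old_style`, and `new_style` are illustrative names. It assumes the REX2 layout from the APX specification (escape byte 0xD5 followed by a payload laid out as M0 R4 X4 B4 W R3 X3 B3, where M0 selects opcode map 1 in place of the 0x0F escape), and it only models a single GPR encoded through the B bits, such as a base register. Under those assumptions the legacy byte sequence is unchanged, and 0x0F disappears only on the REX2 path.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model: one GPR (0..31) encoded via the B bits.
// Returns the prefix bytes only; the caller appends the opcode byte.
std::vector<uint8_t> emit_prefix(int reg, bool is_map1) {
  std::vector<uint8_t> out;
  if (reg >= 16) {
    // Extended GPR: REX2 = 0xD5 + payload [M0 R4 X4 B4 W R3 X3 B3] (assumed layout).
    uint8_t payload = 0;
    if (is_map1) payload |= 0x80;          // M0: map 1 replaces the 0x0F escape byte
    payload |= ((reg >> 4) & 1) << 4;      // B4
    payload |= ((reg >> 3) & 1);           // B3
    out.push_back(0xD5);
    out.push_back(payload);
    return out;                            // no 0x0F on the REX2 path
  }
  if (reg >= 8) out.push_back(0x41);       // legacy REX with REX.B set
  if (is_map1) out.push_back(0x0F);        // legacy map-1 escape, same as before
  return out;
}

// Before the change: prefix(src) emitted only REX, then emit_int16(0x0F, opcode).
std::vector<uint8_t> old_style(int reg, uint8_t opcode) {
  std::vector<uint8_t> out;
  if (reg >= 8) out.push_back(0x41);
  out.push_back(0x0F);
  out.push_back(opcode);
  return out;
}

// After the change: prefix(src, true /* is_map1 */), then emit_int8(opcode).
std::vector<uint8_t> new_style(int reg, uint8_t opcode) {
  std::vector<uint8_t> out = emit_prefix(reg, /* is_map1 */ true);
  out.push_back(opcode);
  return out;
}
```

For registers 0 to 15 both styles produce identical bytes, so in this model the legacy code size does not grow; only r16 to r31 take the REX2 form.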
Actually, you should have pointed me to the code showing that code generation did not change; we generated `0x0F` before:
- prefix(src);
- emit_int16(0x0F, 0x18);
+ prefix(src, true /* is_map1 */);
+ emit_int8(0x18);
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1610346227 From sgibbons at openjdk.org Wed May 22 17:40:24 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 17:40:24 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v25] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: un-helper-ize
preload_needle_helper; try fix for macos build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/f002fd54..b0ef5e6f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=23-24 Stats: 102 lines in 1 file changed: 5 ins; 91 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From mli at openjdk.org Wed May 22 18:03:19 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 22 May 2024 18:03:19 GMT Subject: RFR: 8320999: RISC-V: C2 RotateLeftV [v2] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > More detailed description is inline in the code. > Thanks Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: - merge - Fix imm6 in vror.vi; misc - Merge branch 'master' into rotate-left-right-v - add comments - fix mask - fix imm & long - fixes - Merge branch 'master' into rotate-left-right-v - fixes - remove redundant code: UseZvbb - ... 
and 2 more: https://git.openjdk.org/jdk/compare/a0c5714d...edd0201d ------------- Changes: https://git.openjdk.org/jdk/pull/19325/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19325&range=01 Stats: 218 lines in 4 files changed: 210 ins; 3 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/19325.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19325/head:pull/19325 PR: https://git.openjdk.org/jdk/pull/19325 From mli at openjdk.org Wed May 22 18:07:02 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 22 May 2024 18:07:02 GMT Subject: RFR: 8320999: RISC-V: C2 RotateLeftV In-Reply-To: <72VYJDMwIFJBvkzmEQIaSNPuIfoPYmL5lyCmqiHIoJw=.666ecb52-9157-4534-8b55-4cf43133a5a0@github.com> References: <72VYJDMwIFJBvkzmEQIaSNPuIfoPYmL5lyCmqiHIoJw=.666ecb52-9157-4534-8b55-4cf43133a5a0@github.com> Message-ID: <2HhbgHTvSTmK4tP-VIZ7hm9ZeaGjBNdlrwCkpzXWKDM=.d1263431-7020-46c1-b6f2-33da420862c2@github.com> On Wed, 22 May 2024 08:59:59 GMT, Hamlin Li wrote: > I'll need to refine the patch a bit, seems imm in vror.vi is 6 bits rather than 5 bits which is the case in basic vector instructions. I have modified it to use vror.vi with 6 bits imm. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19325#issuecomment-2125442504 From mli at openjdk.org Wed May 22 18:07:05 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 22 May 2024 18:07:05 GMT Subject: RFR: 8320999: RISC-V: C2 RotateLeftV [v2] In-Reply-To: <7sd_3bkzxUEXjHNVpHhnjVLRDu1J_VlTPfC5ZQPjxAM=.d53dc43c-6e28-4b6b-8628-3ee050780885@github.com> References: <7sd_3bkzxUEXjHNVpHhnjVLRDu1J_VlTPfC5ZQPjxAM=.d53dc43c-6e28-4b6b-8628-3ee050780885@github.com> Message-ID: On Wed, 22 May 2024 14:15:39 GMT, Ludovic Henry wrote: >> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 12 commits: >> >> - merge >> - Fix imm6 in vror.vi; misc >> - Merge branch 'master' into rotate-left-right-v >> - add comments >> - fix mask >> - fix imm & long >> - fixes >> - Merge branch 'master' into rotate-left-right-v >> - fixes >> - remove redundant code: UseZvbb >> - ... and 2 more: https://git.openjdk.org/jdk/compare/a0c5714d...edd0201d > > src/hotspot/cpu/riscv/assembler_riscv.hpp line 1887: > >> 1885: >> 1886: // Vector Bit-manipulation used in Cryptography (Zvbb) Extension >> 1887: INSN(vrol_vx, 0b1010111, 0b100, 0b010101); > > we are not using `vrol_vx` anywhere. Thanks for catching; removed. > src/hotspot/cpu/riscv/assembler_riscv.hpp line 1899: > >> 1897: >> 1898: // Vector Bit-manipulation used in Cryptography (Zvbb) Extension >> 1899: INSN(vror_vi, 0b1010111, 0b011, 0b010100); > > I'm assuming there is no `vrol_vi`? It would be worth leaving a small comment here like > > // There is no `vrol_vi` instruction. Yes, it makes sense to do so. > src/hotspot/cpu/riscv/matcher_riscv.hpp line 132: > >> 130: // Does the CPU supports vector variable shift instructions? >> 131: static constexpr bool supports_vector_variable_shifts(void) { >> 132: return true; > > What's the path to checking for `UseZvbb` and `UseZvbc` respectively to the specific instruction? Here, the 3 checks are not independent; they depend on other checks in both vectorization and the Vector API, so returning true is fine. But to eliminate everyone's doubts in the future, I changed them to return UseZvbb.
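Because Zvbb provides `vror.vi` but no `vrol.vi`, a rotate-left by a constant can be emitted as a rotate-right by `(SEW - shift) mod SEW`. For 64-bit elements a rotate-left by 1 becomes a rotate-right by 63, which needs the 6-bit immediate rather than the 5-bit immediate of the basic vector instructions. The per-element C++ model below illustrates that identity; `rotr_imm6` and `rotl_via_rotr` are hypothetical helpers, not the RISC-V backend code.

```cpp
#include <cassert>
#include <cstdint>

// Per-element model of a rotate-right by immediate for SEW in {8, 16, 32, 64}.
// The immediate is masked to 6 bits (0..63), then reduced modulo SEW, which for
// a power-of-two SEW keeps only its low log2(SEW) bits.
uint64_t rotr_imm6(uint64_t x, unsigned imm6, unsigned sew) {
  unsigned s = (imm6 & 0x3F) % sew;
  uint64_t mask = (sew == 64) ? ~0ULL : ((1ULL << sew) - 1);
  x &= mask;
  if (s == 0) return x;                           // avoid shifting by SEW (UB at 64)
  return ((x >> s) | (x << (sew - s))) & mask;
}

// No vrol.vi: a constant rotate-left is expressed as rotate-right by (SEW - shift) mod SEW.
uint64_t rotl_via_rotr(uint64_t x, unsigned shift, unsigned sew) {
  unsigned s = shift % sew;
  return rotr_imm6(x, (sew - s) % sew, sew);
}
```

For example, `rotl_via_rotr(0x81, 4, 8)` rotates right by 4 and yields 0x18, the same as rotating left by 4.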
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19325#discussion_r1610429116 PR Review Comment: https://git.openjdk.org/jdk/pull/19325#discussion_r1610428809 PR Review Comment: https://git.openjdk.org/jdk/pull/19325#discussion_r1610430351 From never at openjdk.org Wed May 22 18:13:01 2024 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 22 May 2024 18:13:01 GMT Subject: RFR: 8332735: [JVMCI] Add extra JVMCI events for exception translation In-Reply-To: References: Message-ID: On Wed, 22 May 2024 16:44:46 GMT, Doug Simon wrote: > This PR adds a few extra JVMCI events that should help diagnose crashes when translating an exception between the HotSpot and libgraal heaps. Looks good ------------- Marked as reviewed by never (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19350#pullrequestreview-2071997436 From mseledtsov at openjdk.org Wed May 22 18:22:11 2024 From: mseledtsov at openjdk.org (Mikhailo Seledtsov) Date: Wed, 22 May 2024 18:22:11 GMT Subject: RFR: 8332739: Problemlist compiler/codecache/CheckLargePages until JDK-8332654 is fixed Message-ID: Please review this trivial problem listing change. ------------- Commit messages: - 8332739: problemlist compiler/codecache/CheckLargePages until JDK-8332654 is fixed Changes: https://git.openjdk.org/jdk/pull/19351/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19351&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332739 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19351.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19351/head:pull/19351 PR: https://git.openjdk.org/jdk/pull/19351 From sgibbons at openjdk.org Wed May 22 18:44:21 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 18:44:21 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v26] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. 
This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 58 commits: - Merge branch 'openjdk:master' into indexof - un-helper-ize preload_needle_helper; try fix for macos build - Added header file - Adding exhaustive test - Added comments; move n-k<32 code up a level - Fixed CI compiles; re-factor UL processing - Addressing lots of comments. Interim commit. - Rearrange; add lambdas for clarity - Merge remote-tracking branch 'origin/master' into indexof - Move arrays_equals back to c2_MacroAssembler - ... 
and 48 more: https://git.openjdk.org/jdk/compare/37c47785...f4eefe1a ------------- Changes: https://git.openjdk.org/jdk/pull/16753/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=25 Stats: 4303 lines in 16 files changed: 4140 ins; 26 del; 137 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Wed May 22 18:52:27 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 18:52:27 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v27] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 
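As a quick arithmetic check on the table above: the last column is Latest / Score, and the "on average 1.3x" figure matches the arithmetic mean of that column (about 1.32). The C++ sketch below recomputes it from the listed values. The geometric mean, which is often preferred for averaging ratios, comes out lower (about 1.22), so it may be worth stating which mean is meant. The `Row` type and the function names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Row { double score; double latest; };

// (Score, Latest) pairs copied from the benchmark table, in table order.
const std::vector<Row> kRows = {
  {343.573, 317.934}, {1039.081, 1053.96}, {55.828, 110.541}, {9.361, 11.906},
  {4.216, 4.218}, {3.133, 3.216}, {3.76, 3.761}, {9.186, 9.713},
  {14.341, 46.343}, {6220.918, 12154.52}, {5503.556, 5540.044}, {6978.854, 6818.689},
  {5657.499, 5474.624}, {7132.541, 6863.359}, {16013.389, 16162.437},
  {7386.123, 14771.622}, {9901.671, 9782.245},
};

// The table's ratio column: Latest / Score.
double ratio(const Row& r) { return r.latest / r.score; }

double arithmetic_mean_ratio(const std::vector<Row>& rows) {
  double sum = 0.0;
  for (const Row& r : rows) sum += ratio(r);
  return sum / rows.size();
}

// Geometric mean via logs, so the product of ratios is never formed directly.
double geometric_mean_ratio(const std::vector<Row>& rows) {
  double log_sum = 0.0;
  for (const Row& r : rows) log_sum += std::log(ratio(r));
  return std::exp(log_sum / rows.size());
}
```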
Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Revert last change to IndexOf.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/f4eefe1a..ed4451d1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=25-26 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From kvn at openjdk.org Wed May 22 19:02:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 22 May 2024 19:02:02 GMT Subject: RFR: 8332739: Problemlist compiler/codecache/CheckLargePages until JDK-8332654 is fixed In-Reply-To: References: Message-ID: On Wed, 22 May 2024 18:00:58 GMT, Mikhailo Seledtsov wrote: > Please review this trivial problem listing change. Good and trivial. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19351#pullrequestreview-2072127735 From dcubed at openjdk.org Wed May 22 19:37:01 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 22 May 2024 19:37:01 GMT Subject: RFR: 8332739: Problemlist compiler/codecache/CheckLargePages until JDK-8332654 is fixed In-Reply-To: References: Message-ID: On Wed, 22 May 2024 18:00:58 GMT, Mikhailo Seledtsov wrote: > Please review this trivial problem listing change. Thumbs up. I agree this is a trivial fix. ------------- Marked as reviewed by dcubed (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19351#pullrequestreview-2072193937 From mseledtsov at openjdk.org Wed May 22 20:08:06 2024 From: mseledtsov at openjdk.org (Mikhailo Seledtsov) Date: Wed, 22 May 2024 20:08:06 GMT Subject: RFR: 8332739: Problemlist compiler/codecache/CheckLargePages until JDK-8332654 is fixed In-Reply-To: References: Message-ID: On Wed, 22 May 2024 18:00:58 GMT, Mikhailo Seledtsov wrote: > Please review this trivial problem listing change. Thank you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19351#issuecomment-2125654753 From mseledtsov at openjdk.org Wed May 22 20:08:06 2024 From: mseledtsov at openjdk.org (Mikhailo Seledtsov) Date: Wed, 22 May 2024 20:08:06 GMT Subject: Integrated: 8332739: Problemlist compiler/codecache/CheckLargePages until JDK-8332654 is fixed In-Reply-To: References: Message-ID: On Wed, 22 May 2024 18:00:58 GMT, Mikhailo Seledtsov wrote: > Please review this trivial problem listing change. This pull request has now been integrated. 
Changeset: 3d4185a9 Author: Mikhailo Seledtsov URL: https://git.openjdk.org/jdk/commit/3d4185a9ce482cc655a4c67f39cb2682b02ae4fe Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8332739: Problemlist compiler/codecache/CheckLargePages until JDK-8332654 is fixed Reviewed-by: kvn, dcubed ------------- PR: https://git.openjdk.org/jdk/pull/19351 From dlong at openjdk.org Wed May 22 20:09:07 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 22 May 2024 20:09:07 GMT Subject: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v6] In-Reply-To: <1nU6OzVHKjN_v9tJD4vTnoQa6hTn5CgDF15PQsyr5YE=.ed74dc2b-33f8-4828-a730-43f03a9aa4ab@github.com> References: <1nU6OzVHKjN_v9tJD4vTnoQa6hTn5CgDF15PQsyr5YE=.ed74dc2b-33f8-4828-a730-43f03a9aa4ab@github.com> Message-ID: On Wed, 22 May 2024 14:28:41 GMT, Yudi Zheng wrote: >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4693: >> >>> 4691: const Register xlen = r1; >>> 4692: const Register z = r2; >>> 4693: const Register zlen = r3; >> >> LibraryCallKit::inline_squareToLen() is still computing zlen and passing it as the 4th arg, even though the value is unused. > > ppc x86 are not using `multiply_to_len` for `generate_squareToLen`. I think we still need to pass zlen for these platforms. OK. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1610580527 From eastig at amazon.co.uk Wed May 22 21:16:07 2024 From: eastig at amazon.co.uk (Astigeevich, Evgeny) Date: Wed, 22 May 2024 21:16:07 +0000 Subject: Preventing bugs with register usage and ABI compliance in hand-written assembly In-Reply-To: <13E4349A-31B0-4F8B-98ED-ABCB5360C673@amazon.co.uk> References: <13E4349A-31B0-4F8B-98ED-ABCB5360C673@amazon.co.uk> Message-ID: <98D267BB-FFE8-42BD-A48B-CE5CA0E9C3E8@amazon.co.uk> Hello, I?d like to discuss what options we have to prevent bugs like https://bugs.openjdk.org/browse/JDK-8324874 ?AArch64: crypto pmull based CRC32/CRC32C intrinsics clobber V8-V15 registers?. 
I am thinking about something like:
  ...
  {
    ABIVerifier abi_verifier(masm);
    ... some assembly code
  } // When we leave the block, abi_verifier will check ABI compliance
  ...
An alternative might be to enable ABI verification via StubCodeMark. Kind regards, Evgeny Astigeevich Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom. From sgibbons at openjdk.org Wed May 22 21:45:50 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 21:45:50 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v28] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >
StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Remove DO_EARLY_BAILOUT ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/ed4451d1..027daf73 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=26-27 Stats: 19 lines in 1 file changed: 0 ins; 19 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From mdoerr at openjdk.org Wed May 22 21:46:12 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 22 May 2024 21:46:12 GMT Subject: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v6] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 14:47:43 GMT, Yudi Zheng wrote: >> Moving array construction within BigInteger.implMultiplyToLen intrinsic candidate to its caller simplifies the intrinsic implementation in JIT compiler. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > address comments. PPC64 part and shared code looks correct. "java/math/BigInteger" tests have passed on PPC64. I didn't review the other platforms. ------------- Marked as reviewed by mdoerr (Reviewer). 
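The change under review moves allocation of the result array out of `BigInteger.implMultiplyToLen` and into its caller, so the intrinsic only fills a buffer it is handed. The C++ sketch below models that split. Words are stored most-significant first, as in `BigInteger`'s magnitude array, and the schoolbook loop follows the general shape of the Java fallback. The names `multiply_to_len` and `multiply` are illustrative, not JDK code, and both inputs are assumed non-empty.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Schoolbook multiply of magnitudes stored most-significant word first.
// z must already hold xlen + ylen words; allocating it is the caller's job.
void multiply_to_len(const uint32_t* x, int xlen, const uint32_t* y, int ylen, uint32_t* z) {
  const int xstart = xlen - 1;
  const int ystart = ylen - 1;
  uint64_t carry = 0;
  // First row: y times the least-significant word of x.
  for (int j = ystart, k = ystart + 1 + xstart; j >= 0; j--, k--) {
    uint64_t product = uint64_t(y[j]) * x[xstart] + carry;
    z[k] = uint32_t(product);
    carry = product >> 32;
  }
  z[xstart] = uint32_t(carry);
  // Remaining rows accumulate into z; y*x + z + carry always fits in 64 bits.
  for (int i = xstart - 1; i >= 0; i--) {
    carry = 0;
    for (int j = ystart, k = ystart + 1 + i; j >= 0; j--, k--) {
      uint64_t product = uint64_t(y[j]) * x[i] + z[k] + carry;
      z[k] = uint32_t(product);
      carry = product >> 32;
    }
    z[i] = uint32_t(carry);
  }
}

// Caller side: allocate the result, then hand it to the (intrinsifiable) worker.
std::vector<uint32_t> multiply(const std::vector<uint32_t>& x, const std::vector<uint32_t>& y) {
  std::vector<uint32_t> z(x.size() + y.size());
  multiply_to_len(x.data(), int(x.size()), y.data(), int(y.size()), z.data());
  return z;
}
```

With allocation hoisted out, the worker has a fixed contract (three input arrays, one output array of known size), which is what makes a compiler intrinsic simpler to write.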
PR Review: https://git.openjdk.org/jdk/pull/18226#pullrequestreview-2072434441 From sgibbons at openjdk.org Thu May 23 01:29:45 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 23 May 2024 01:29:45 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v29] In-Reply-To: References: Message-ID: <3Qow6_N97mxWzdMj2zmgj9MHmDWuIG4LYm_Lj4arxcg=.c8dba6ef-26bf-48e5-9a70-b010dcc8940b@github.com> > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Check macos build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: 
https://git.openjdk.org/jdk/pull/16753/files/027daf73..42af0b50 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=27-28 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Thu May 23 02:03:26 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 23 May 2024 02:03:26 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v30] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott 
Gibbons has updated the pull request incrementally with one additional commit since the last revision: Check macos build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/42af0b50..40a1e628 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=28-29 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From epeter at openjdk.org Thu May 23 05:56:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 23 May 2024 05:56:15 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v18] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 14:54:17 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens >> no optimization is performed for the method being compiled). Hoisting >> a `get()` call out of a loop for a loop invariant `scopedValue` should >> also be legal in most cases. >> >> `ScopedValue.get()` is implemented in Java code as a 2-step process. A >> cache is attached to the current thread object. If the `ScopedValue` >> object is in the cache then the result from `get()` is read from >> there. Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache. On a >> cache probe, both indexes are checked. As a consequence, the process
As a consequence, the process >> of probing the cache is a multi-step process (check if the cache is >> present, check first index, check second index if first index >> failed). If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, profile reports the slow path >> as never taken and only the read from the cache is compiled. >> >> To perform the optimizations, I added 3 new node types to C2: >> >> - the pair >> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for >> the cache probe >> >> - a cfg node ScopedValueGetResultNode to help locate the result of the >> `get()` call in the IR graph. >> >> In pseudocode, once the nodes are inserted, the code of a `get()` is: >> >> >> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) >> if (hits_in_the_cache) { >> res = ScopedValueGetLoadFromCache(hits_in_the_cache); >> } else { >> res = ..; //slow call possibly inlined. Subgraph can be arbitrarily complex >> } >> res = ScopedValueGetResult(res) >> >> >> In the snippet: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > whitespaces I know this is "parked" now, but there are some internal conversations happening. One question that @dean-long and @rose00 had: If there is more than one ScopedValue, what about profile pollution?
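The multi-step probe described in the quoted PR text (check that the cache exists, check the first index, then check the second index) can be illustrated with a small stand-alone model. This is a hypothetical sketch, not the JDK's `ScopedValue` code: the class name `TwoSlotCacheSketch`, the flat key/value array layout, the two index functions, the `MISS` sentinel and the always-insert-at-the-first-slot policy are all assumptions made for illustration.

```java
import java.util.function.Supplier;

/**
 * Hypothetical model of a lazily allocated cache in which each key may sit at
 * one of two slots. NOT the JDK's ScopedValue implementation: the layout,
 * hashing and insertion policy are illustrative assumptions.
 */
public class TwoSlotCacheSketch {
    /** Sentinel returned by probe() on a miss, so null values stay representable. */
    public static final Object MISS = new Object();

    private final int slots;   // number of key/value pairs; must be a power of two
    private Object[] cache;    // keys at even indices, values at odd indices

    public TwoSlotCacheSketch(int slots) {
        if (slots <= 0 || Integer.bitCount(slots) != 1) {
            throw new IllegalArgumentException("slots must be a power of two");
        }
        this.slots = slots;
    }

    // Two candidate slots derived from one hash code (assumed scheme).
    private int primary(int hash)   { return hash & (slots - 1); }
    private int secondary(int hash) { return (hash >>> 16) & (slots - 1); }

    /** Fast path only: returns the cached value or MISS. */
    public Object probe(Object key) {
        Object[] c = cache;
        if (c == null) {
            return MISS;                        // step 1: no cache allocated yet
        }
        int h = key.hashCode();
        int i = primary(h);
        if (c[2 * i] == key) {
            return c[2 * i + 1];                // step 2: hit at the first index
        }
        int j = secondary(h);
        if (c[2 * j] == key) {
            return c[2 * j + 1];                // step 3: hit at the second index
        }
        return MISS;
    }

    /** Probe first; on a miss, run the slow lookup and record the mapping. */
    public Object get(Object key, Supplier<?> slowLookup) {
        Object hit = probe(key);
        if (hit != MISS) {
            return hit;
        }
        Object value = slowLookup.get();
        if (cache == null) {
            cache = new Object[2 * slots];      // lazy allocation
        }
        int i = primary(key.hashCode());        // simplistic policy: always use the first candidate slot
        cache[2 * i] = key;
        cache[2 * i + 1] = value;
        return value;
    }
}
```

In this model, a second `get()` for the same key (with no competing keys) is served by the probe alone and never reaches the slow lookup, which loosely mirrors the situation the quoted text describes where only the cache read ends up being compiled.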
------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2126290108 From amitkumar at openjdk.org Thu May 23 07:24:10 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 23 May 2024 07:24:10 GMT Subject: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v6] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 14:47:43 GMT, Yudi Zheng wrote: >> Moving array construction within BigInteger.implMultiplyToLen intrinsic candidate to its caller simplifies the intrinsic implementation in JIT compiler. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > address comments. I have looked into the s390x code and do not see any issue there. I also ran the BigInteger tests separately and a round of tier1 tests. The results are clean there as well. ------------- Marked as reviewed by amitkumar (Committer). PR Review: https://git.openjdk.org/jdk/pull/18226#pullrequestreview-2073070871 From thartmann at openjdk.org Thu May 23 07:50:06 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 23 May 2024 07:50:06 GMT Subject: RFR: 8332527: ZGC: generalize object cloning logic [v2] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 09:17:16 GMT, Roberto Castañeda Lozano wrote: >> This changeset generalizes the logic to produce a runtime call to clone a class instance so that it can be shared by other collectors adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). The changeset moves the logic from `ZBarrierSetC2` to the GC-shared `BarrierSetC2` class and adds support for 32-bit platforms. >> >> #### Testing >> >> - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). >> - tier4-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only).
>> - `compiler/arraycopy` tests (linux-x86-debug) with [an additional patch](https://github.com/openjdk/jdk/commit/ddcf777894e740b8e6ddbbf8821e82a173c23ef4) that implements cloning of large class instances with a runtime clone call rather than arraycopy when using G1 (to exercise the generalized logic on a 32-bit platform). > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Applied Axel's suggestions Thanks for digging up the history, Roberto. Now I remember working on [JDK-8270098](https://bugs.openjdk.java.net/browse/JDK-8270098). As we discussed offline, we still need that fix because even though these bytecodes are no longer generated by the reflection API (and javac), they might still be generated "manually". We should add some corresponding .jasm tests. I agree with your suggestion to limit the scope of this PR and create a follow-up RFE/bug. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19311#issuecomment-2126451613 From dnsimon at openjdk.org Thu May 23 08:14:06 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 23 May 2024 08:14:06 GMT Subject: RFR: 8332735: [JVMCI] Add extra JVMCI events for exception translation In-Reply-To: References: Message-ID: On Wed, 22 May 2024 16:44:46 GMT, Doug Simon wrote: > This PR adds a few extra JVMCI events that should help diagnose crashes when translating an exception between the HotSpot and libgraal heaps. Thanks for the review.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19350#issuecomment-2126495167 From dnsimon at openjdk.org Thu May 23 08:14:07 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 23 May 2024 08:14:07 GMT Subject: Integrated: 8332735: [JVMCI] Add extra JVMCI events for exception translation In-Reply-To: References: Message-ID: On Wed, 22 May 2024 16:44:46 GMT, Doug Simon wrote: > This PR adds a few extra JVMCI events that should help diagnose crashes when translating an exception between the HotSpot and libgraal heaps. This pull request has now been integrated. Changeset: 612ae928 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/612ae9289a130b8701f74253fe5499358a2e2b5b Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8332735: [JVMCI] Add extra JVMCI events for exception translation Reviewed-by: never ------------- PR: https://git.openjdk.org/jdk/pull/19350 From dfenacci at openjdk.org Thu May 23 08:48:38 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 23 May 2024 08:48:38 GMT Subject: RFR: 8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v3] In-Reply-To: References: Message-ID: > # Issue > > The test `compiler/startup/StartupOutput.java` fails intermittently due to a crash after correctly printing the error `Initial size of CodeCache is too small` (the test limits the code cache using `-XX:InitialCodeCacheSize=1024K -XX:ReservedCodeCacheSize=1200k`). > The appearance of the issue is very dependent on thread scheduling. The original report happens during C1 initialization but C2 initialization is affected as well. > > # Causes > > There is one occurrence during C1 initialization and one during C2 initialization where a call to `RuntimeStub::new_runtime_stub` can fail fatally if there is not enough space left.
> For C1: `Compiler::init_c1_runtime` -> `Runtime1::initialize` -> `Runtime1::generate_blob_for` -> `Runtime1::generate_blob` -> `RuntimeStub::new_runtime_stub`. > For C2: `C2Compiler::initialize` -> `OptoRuntime::generate` -> `OptoRuntime::generate_stub` -> `Compile::Compile` -> `Compile::Code_Gen` -> `PhaseOutput::install` -> `PhaseOutput::install_stub` -> `RuntimeStub::new_runtime_stub`. > > # Solution > > https://github.com/openjdk/jdk/pull/15970 introduced an optional argument to `RuntimeStub::new_runtime_stub` to determine if it fails fatally or not. We can take advantage of it to avoid crashing and instead pass the information about the success or failure of the allocation up the (C1 and C2 initialization) call stack up to where we can set the compilations as failed. Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8326615: handle allocation failures in barrier set ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19280/files - new: https://git.openjdk.org/jdk/pull/19280/files/c505aac5..bd2a7adf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19280&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19280&range=01-02 Stats: 27 lines in 10 files changed: 13 ins; 1 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/19280.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19280/head:pull/19280 PR: https://git.openjdk.org/jdk/pull/19280 From thartmann at openjdk.org Thu May 23 08:52:07 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 23 May 2024 08:52:07 GMT Subject: RFR: 8325520: Vector loads and stores with indices and masks incorrectly compiled [v11] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 15:55:30 GMT, Damon Fenacci wrote: >> And also below. 
> > On macosx we build with the flag `-Winconsistent-missing-override` enabled, which seems to allow either no `override` anywhere or in all places where a method actually overrides another one. So, in this case it seems that we would either need to add `override`s to `virtual int Opcode()`, `virtual uint match_edge(uint idx)`, `virtual Node* Ideal(PhaseGVN* phase, bool can_reshape)` etc. in all classes, or remove the 4 `override` keywords from the `indices` and `mask` methods. I might opt for the second choice. @eme64 what do you think? The second choice makes sense to limit the scope of the changes in this PR. If we still want to add the `override` keywords, we should do that in a separate RFE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1611280596 From dfenacci at openjdk.org Thu May 23 08:57:14 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 23 May 2024 08:57:14 GMT Subject: RFR: 8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v4] In-Reply-To: References: Message-ID: > # Issue > > The test `compiler/startup/StartupOutput.java` fails intermittently due to a crash after correctly printing the error `Initial size of CodeCache is too small` (the test limits the code cache using `-XX:InitialCodeCacheSize=1024K -XX:ReservedCodeCacheSize=1200k`). > The appearance of the issue is very dependent on thread scheduling. The original report happens during C1 initialization but C2 initialization is affected as well. > > # Causes > > There is one occurrence during C1 initialization and one during C2 initialization where a call to `RuntimeStub::new_runtime_stub` can fail fatally if there is not enough space left. > For C1: `Compiler::init_c1_runtime` -> `Runtime1::initialize` -> `Runtime1::generate_blob_for` -> `Runtime1::generate_blob` -> `RuntimeStub::new_runtime_stub`.
> For C2: `C2Compiler::initialize` -> `OptoRuntime::generate` -> `OptoRuntime::generate_stub` -> `Compile::Compile` -> `Compile::Code_Gen` -> `PhaseOutput::install` -> `PhaseOutput::install_stub` -> `RuntimeStub::new_runtime_stub`. > > # Solution > > https://github.com/openjdk/jdk/pull/15970 introduced an optional argument to `RuntimeStub::new_runtime_stub` to determine if it fails fatally or not. We can take advantage of it to avoid crashing and instead pass the information about the success or failure of the allocation up the (C1 and C2 initialization) call stack up to where we can set the compilations as failed. Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8326615: update copyright year ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19280/files - new: https://git.openjdk.org/jdk/pull/19280/files/bd2a7adf..f16d9910 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19280&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19280&range=02-03 Stats: 9 lines in 9 files changed: 2 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/19280.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19280/head:pull/19280 PR: https://git.openjdk.org/jdk/pull/19280 From epeter at openjdk.org Thu May 23 08:59:08 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 23 May 2024 08:59:08 GMT Subject: RFR: 8325520: Vector loads and stores with indices and masks incorrectly compiled [v11] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 08:49:43 GMT, Tobias Hartmann wrote: >> On macosx we build with the flag `-Winconsistent-missing-override` enabled, which seems to allow either no `override` anywhere or in all places where a method actually overrides another one. So, in this case it seems that we would either need to add `override`s to `virtual int Opcode()`, `virtual uint match_edge(uint idx)`, `virtual Node* Ideal(PhaseGVN* phase, bool can_reshape)` etc. 
in all classes, or remove the 4 `override` keywords from the `indices` and `mask` methods. I might opt for the second choice. @eme64 what do you think? > > The second choice makes sense to limit the scope of the changes in this PR. If we still want to add the `override` keywords, we should do that in a separate RFE. Ah yes. I forgot about `-Winconsistent-missing-override`. No `override` keyword then. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1611290337 From dfenacci at openjdk.org Thu May 23 09:00:03 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 23 May 2024 09:00:03 GMT Subject: RFR: 8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v2] In-Reply-To: References: <6r39P_htGVom5FkgRtjngxwur1_uVG13JS33mBeghPk=.8ce1ed82-cda9-4dae-beb6-18c60024ec18@github.com> Message-ID: <_pDUuJRfxlVxWMwy-ANetqFMUXUE4k7ZnJtk-boFXaM=.81359a12-4077-4625-9e60-8cc276435560@github.com> On Tue, 21 May 2024 21:33:28 GMT, Dean Long wrote: >> Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update src/hotspot/share/c1/c1_Runtime1.cpp >> >> Co-authored-by: Tobias Hartmann >> - Update src/hotspot/share/c1/c1_Compiler.cpp >> >> Co-authored-by: Tobias Hartmann > > src/hotspot/share/c1/c1_Runtime1.cpp line 287: > >> 285: #endif >> 286: BarrierSetC1* bs = BarrierSet::barrier_set()->barrier_set_c1(); >> 287: bs->generate_c1_runtime_stubs(blob); > > Don't we need to handle failures in generate_c1_runtime_stubs? With the assert removed, I think we'll get a nullptr crash. Yep, you're right. Actually that call could potentially fail too. I've added code to handle that case as well. Thanks @dean-long!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19280#discussion_r1611292612 From dfenacci at openjdk.org Thu May 23 09:21:22 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 23 May 2024 09:21:22 GMT Subject: RFR: 8325520: Vector loads and stores with indices and masks incorrectly compiled [v15] In-Reply-To: References: Message-ID: <9sityqynkqN0fRXBlmIOcx6tlUfD8K7h25b99Tb3p1E=.10e1b727-74ec-4085-bf44-e28d4713d1d1@github.com> > # Issue > When loading multiple vectors using indices or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, indices, 0)` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. > > # Causes > On vector-capable platforms, vector loads with masks and indices (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or indices are mapped as `LoadVector` nodes instead. > The same is true for `StoreVector`s. > When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store of the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`). Here we call > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 > > where we do an extra check for types if we deal with vectors but we don't make sure that neither masks nor indices interfere. > Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > but we don't make sure that there are no masks or indices. > A few lines below, we check if there are 2 stores for the same value in a row.
We need to check for masks and indices here too but in this case we can include these cases if the masks and indices of the vector stores are equivalent. > > # Solution > To avoid folding `Load`- and `StoreVector`s with masks and indices we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns -1. In this way, the checks in `MemNode::can_see_stored_value` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 > > and `StoreNode::Identity` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > will fail if masks or indices are used. > For 2 stores of the same value we instead check for mask and indices equality. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been added too. Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8325520: remove override keywords ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18347/files - new: https://git.openjdk.org/jdk/pull/18347/files/50a75ffe..0c015fa1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=13-14 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18347/head:pull/18347 PR: https://git.openjdk.org/jdk/pull/18347 From dfenacci at openjdk.org Thu May 23 09:25:08 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 23 May 2024 09:25:08 GMT Subject: RFR: 8325520: Vector loads and stores with indices and masks incorrectly compiled [v11] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 08:56:18 GMT, Emanuel Peter wrote: >> The second choice makes sense
to limit the scope of the changes in this PR. If we still want to add the `override` keywords, we should do that in a separate RFE. > > Ah yes. I forgot about `-Winconsistent-missing-override`. No `override` keyword then. C'est la vie. Removed. Thanks guys. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1611335258 From thartmann at openjdk.org Thu May 23 09:32:05 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 23 May 2024 09:32:05 GMT Subject: RFR: 8325520: Vector loads and stores with indices and masks incorrectly compiled [v15] In-Reply-To: <9sityqynkqN0fRXBlmIOcx6tlUfD8K7h25b99Tb3p1E=.10e1b727-74ec-4085-bf44-e28d4713d1d1@github.com> References: <9sityqynkqN0fRXBlmIOcx6tlUfD8K7h25b99Tb3p1E=.10e1b727-74ec-4085-bf44-e28d4713d1d1@github.com> Message-ID: On Thu, 23 May 2024 09:21:22 GMT, Damon Fenacci wrote: >> # Issue >> When loading multiple vectors using indices or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, indices, 0)` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. >> >> # Causes >> On vector-capable platforms, vector loads with masks and indices (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or indices are mapped as `LoadVector` nodes instead. >> The same is true for `StoreVector`s. >> When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store of the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`). Here we call
Here we call >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 >> >> where we do an extra check for types if we deal with vectors but we don?t make sure that neither masks nor indices interfere. >> Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 >> >> but we don?t make sure that there are no masks or indices. >> A few lines below, we check if there are 2 stores for the same value in a row. We need to check for masks and indices here too but in this case we can include these cases if the masks and indices of the vector stores are equivalent. >> >> # Solution >> To avoid folding `Load`- and `StoreVector`s with masks and indices we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn?t return a store opcode but instead returns -1. In this way, the checks in `MemNode::can_see_stored_value` >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 >> >> and `StoreNode::Identity` >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 >> >> will fail if masks or indices are used. >> For 2 stores of the same value we instead check for mask and indices equality. >> >> Regression tests for all versions of `Load/StoreVectorGather/Masked` ha... > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8325520: remove override keywords This is good to go. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18347#pullrequestreview-2073411412 From epeter at openjdk.org Thu May 23 09:38:06 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 23 May 2024 09:38:06 GMT Subject: RFR: 8325520: Vector loads and stores with indices and masks incorrectly compiled [v15] In-Reply-To: <9sityqynkqN0fRXBlmIOcx6tlUfD8K7h25b99Tb3p1E=.10e1b727-74ec-4085-bf44-e28d4713d1d1@github.com> References: <9sityqynkqN0fRXBlmIOcx6tlUfD8K7h25b99Tb3p1E=.10e1b727-74ec-4085-bf44-e28d4713d1d1@github.com> Message-ID: On Thu, 23 May 2024 09:21:22 GMT, Damon Fenacci wrote: >> # Issue >> When loading multiple vectors using indices or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, indices, 0)` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. >> >> # Causes >> On vector-capable platforms, vector loads with masks and indices (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or indices are mapped as `LoadVector` nodes instead. >> The same is true for `StoreVector`s. >> When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store of the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`). Here we call >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 >> >> where we do an extra check for types if we deal with vectors but we don't make sure that neither masks nor indices interfere.
>> Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 >> >> but we don't make sure that there are no masks or indices. >> A few lines below, we check if there are 2 stores for the same value in a row. We need to check for masks and indices here too but in this case we can include these cases if the masks and indices of the vector stores are equivalent. >> >> # Solution >> To avoid folding `Load`- and `StoreVector`s with masks and indices we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns -1. In this way, the checks in `MemNode::can_see_stored_value` >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 >> >> and `StoreNode::Identity` >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 >> >> will fail if masks or indices are used. >> For 2 stores of the same value we instead check for mask and indices equality. >> >> Regression tests for all versions of `Load/StoreVectorGather/Masked` ha... > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8325520: remove override keywords Marked as reviewed by epeter (Reviewer).
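The fix summarized in the quoted description hinges on a sentinel: gather and masked loads report `-1` instead of a store opcode, so, per the description, the opcode comparison in checks such as `MemNode::can_see_stored_value` can never succeed for them, while two consecutive stores of the same value are only treated as redundant when their masks and indices match. The Java toy model below illustrates that idea under stated assumptions. It is not HotSpot code: `StoreForwardingSketch`, its opcode constants, the `Store`/`Load` records and the string-typed addresses, masks and indices are hypothetical stand-ins for C2 nodes.

```java
import java.util.Objects;

/**
 * Toy model (NOT HotSpot code) of the rule described in the PR: a load may be
 * replaced by a preceding store's value only if the load's matching store
 * opcode equals the store's opcode. Masked/gather loads report -1.
 */
public class StoreForwardingSketch {
    public static final int STORE_VECTOR = 1;          // hypothetical opcode values
    public static final int STORE_VECTOR_MASKED = 2;
    public static final int NO_MATCHING_STORE = -1;    // sentinel, never a real opcode

    /** mask and indices are null for plain (unmasked, contiguous) accesses. */
    public record Store(int opcode, String address, String value, String mask, String indices) {}

    public record Load(String kind, String address) {
        /** Analogue of store_Opcode(): only plain vector loads pair with a store opcode. */
        public int storeOpcode() {
            return kind.equals("LoadVector") ? STORE_VECTOR : NO_MATCHING_STORE;
        }
    }

    /** Returns the forwarded value, or null if the load cannot be folded. */
    public static String canSeeStoredValue(Load load, Store store) {
        if (store.opcode() != load.storeOpcode()) {
            return null;                    // always taken for masked/gather loads
        }
        return store.address().equals(load.address()) ? store.value() : null;
    }

    /** A second store of the same value is redundant only if masks and indices also match. */
    public static boolean secondStoreRedundant(Store first, Store second) {
        return first.opcode() == second.opcode()
            && first.address().equals(second.address())
            && first.value().equals(second.value())
            && Objects.equals(first.mask(), second.mask())
            && Objects.equals(first.indices(), second.indices());
    }
}
```

The appeal of the sentinel, as modeled here, is that the masked and gather cases drop out of the forwarding path without any change to the shared comparison itself.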
------------- PR Review: https://git.openjdk.org/jdk/pull/18347#pullrequestreview-2073425024 From bkilambi at openjdk.org Thu May 23 10:15:08 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 23 May 2024 10:15:08 GMT Subject: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v6] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 14:47:43 GMT, Yudi Zheng wrote: >> Moving array construction within BigInteger.implMultiplyToLen intrinsic candidate to its caller simplifies the intrinsic implementation in JIT compiler. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > address comments. src/hotspot/share/opto/library_call.cpp line 5925: > 5923: // Set the original stack and the reexecute bit for the interpreter to reexecute > 5924: // the bytecode that invokes BigInteger.multiplyToLen() if deoptimization happens > 5925: // on the return from z array allocation in runtime. Since we are not allocating z array during runtime anymore, do we still need these comments? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1611403873 From bkilambi at openjdk.org Thu May 23 10:30:08 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 23 May 2024 10:30:08 GMT Subject: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v6] In-Reply-To: References: Message-ID: <2K7bXCyATxLA0IoTcuyURmOJ7dlY1kH1-tvVadK6F6c=.db3668e6-4bb3-46ac-98aa-a8866c007708@github.com> On Wed, 22 May 2024 14:47:43 GMT, Yudi Zheng wrote: >> Moving array construction within BigInteger.implMultiplyToLen intrinsic candidate to its caller simplifies the intrinsic implementation in JIT compiler. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > address comments. 
src/java.base/share/classes/java/math/BigInteger.java line 1836: > 1834: > 1835: if (z == null || z.length < (xlen + ylen)) > 1836: z = new int[xlen + ylen]; Style: only 4 spaces indentation ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1611422191 From luhenry at openjdk.org Thu May 23 10:57:05 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 23 May 2024 10:57:05 GMT Subject: RFR: 8320999: RISC-V: C2 RotateLeftV [v2] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 18:03:19 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> More detailed description is inline in the code. >> Thanks > > Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - merge > - Fix imm6 in vror.vi; misc > - Merge branch 'master' into rotate-left-right-v > - add comments > - fix mask > - fix imm & long > - fixes > - Merge branch 'master' into rotate-left-right-v > - fixes > - remove redundant code: UseZvbb > - ... and 2 more: https://git.openjdk.org/jdk/compare/a0c5714d...edd0201d Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19325#pullrequestreview-2073602956 From sgibbons at openjdk.org Thu May 23 14:09:36 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 23 May 2024 14:09:36 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v31] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers:
>
> Benchmark                                Score      Latest     Ratio (Latest/Score)
> StringIndexOf.advancedWithMediumSub      343.573    317.934    0.925375393x
> StringIndexOf.advancedWithShortSub1      1039.081   1053.96    1.014319384x
> StringIndexOf.advancedWithShortSub2      55.828     110.541    1.980027943x
> StringIndexOf.constantPattern            9.361      11.906     1.271872663x
> StringIndexOf.searchCharLongSuccess      4.216      4.218      1.000474383x
> StringIndexOf.searchCharMediumSuccess    3.133      3.216      1.02649218x
> StringIndexOf.searchCharShortSuccess     3.76       3.761      1.000265957x
> StringIndexOf.success                    9.186      9.713      1.057369911x
> StringIndexOf.successBig                 14.341     46.343     3.231504079x
> StringIndexOfChar.latin1_AVX2_String     6220.918   12154.52   1.953814533x
> StringIndexOfChar.latin1_AVX2_char       5503.556   5540.044   1.006629895x
> StringIndexOfChar.latin1_SSE4_String     6978.854   6818.689   0.977049957x
> StringIndexOfChar.latin1_SSE4_char       5657.499   5474.624   0.967675646x
> StringIndexOfChar.latin1_Short_String    7132.541   6863.359   0.962260014x
> StringIndexOfChar.latin1_Short_char      16013.389  16162.437  1.009307711x
> StringIndexOfChar.latin1_mixed_String    7386.123   14771.622  1.999915517x
> StringIndexOfChar.latin1_mixed_char      9901.671   9782.245   0.987938803x
Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Check macos build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/40a1e628..87b1ebe8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=30 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=29-30 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From bkilambi at openjdk.org Thu May 23 14:54:16 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 23 May 2024 14:54:16 GMT Subject: RFR: 8327964: Simplify BigInteger.implMultiplyToLen
intrinsic [v6] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 14:47:43 GMT, Yudi Zheng wrote: >> Moving array construction within BigInteger.implMultiplyToLen intrinsic candidate to its caller simplifies the intrinsic implementation in JIT compiler. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > address comments. Tested tier1 on aarch64 and no failures. Also no regressions (or even gain) on aarch64 with the BigInteger testcase you mentioned. I think copyright year has not been updated for some of the files but I guess that's up to you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18226#issuecomment-2127335211 From roland at openjdk.org Thu May 23 15:02:22 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 23 May 2024 15:02:22 GMT Subject: RFR: 8332829: [BACKOUT] C2: crash in compiled code because of dependency on removed range check CastIIs Message-ID: I'm backing out that fix because it has caused several issues. Backout is clean. ------------- Commit messages: - Revert "8324517: C2: crash in compiled code because of dependency on removed range check CastIIs" Changes: https://git.openjdk.org/jdk/pull/19369/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19369&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332829 Stats: 562 lines in 6 files changed: 23 ins; 536 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19369.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19369/head:pull/19369 PR: https://git.openjdk.org/jdk/pull/19369 From thartmann at openjdk.org Thu May 23 15:25:02 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 23 May 2024 15:25:02 GMT Subject: RFR: 8332829: [BACKOUT] C2: crash in compiled code because of dependency on removed range check CastIIs In-Reply-To: References: Message-ID: On Thu, 23 May 2024 14:57:58 GMT, Roland Westrelin wrote: > I'm backing out that fix because it has caused several issues. 
Backout is clean. Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19369#pullrequestreview-2074319029 From duke at openjdk.org Thu May 23 16:26:11 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Thu, 23 May 2024 16:26:11 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v28] In-Reply-To: <6Ob1gJGLun-R2zHKWUvE6BJj1J01JYdC3VYn1Dt9bTw=.2bd102b8-7c18-4083-9be9-60bcce43c8a9@github.com> References: <6Ob1gJGLun-R2zHKWUvE6BJj1J01JYdC3VYn1Dt9bTw=.2bd102b8-7c18-4083-9be9-60bcce43c8a9@github.com> Message-ID: On Tue, 21 May 2024 06:12:16 GMT, Emanuel Peter wrote: >> @steveatgh I think it could make sense to add a simple "hello world" JTREG test that enables the `UseAPX` flag, just to test if it is handled correctly, even on platforms that do not have the feature enabled. > >> Thank you @eme64 for the comments. The functionality of the UseAPX flag is, as you point out, incomplete in this pull request. A subsequent PR (see JDK-8329030) will tie the logic of the flag in with a query of the hardware features. It was added in this PR thinking it could be useful for testing or debugging the encoding functionality. > > Wait. Does this mean that if I enable the `UseAPX` flag on my `AVX512` machine with `UseAVX=3`, that we will start encoding instructions using APX? Can that lead to wrong results? Hi @eme64, just wanted to check on the status of testing. Thanks. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2127572193 From roland at openjdk.org Thu May 23 16:39:14 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 23 May 2024 16:39:14 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v18] In-Reply-To: References: Message-ID: <575TsEWD5s5I2t291jDmU0o_ij6VD_a1qZRmTLixrJg=.02b58ebb-0904-432d-a770-c4a4ee2612ea@github.com> On Thu, 2 May 2024 14:54:17 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens >> no optimization is performed for the method being compiled). Hoisting >> a `get()` call out of loop for a loop invariant `scopedValue` should >> also be legal in most cases. >> >> `ScopedValue.get()` is implemented in java code as a 2 step process. A >> cache is attached to the current thread object. If the `ScopedValue` >> object is in the cache then the result from `get()` is read from >> there. Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache. On a >> cache probe, both indexes are checked. As a consequence, the process >> of probing the cache is a multi step process (check if the cache is >> present, check first index, check second index if first index >> failed). If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, profile reports the slow path >> as never taken and only the read from the cache is compiled.
>> >> To perform the optimizations, I added 3 new node types to C2: >> >> - the pair >> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for >> the cache probe >> >> - a cfg node ScopedValueGetResultNode to help locate the result of the >> `get()` call in the IR graph. >> >> In pseudo code, once the nodes are inserted, the code of a `get()` is: >> >> >> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) >> if (hits_in_the_cache) { >> res = ScopedValueGetLoadFromCache(hits_in_the_cache); >> } else { >> res = ..; //slow call possibly inlined. Subgraph can be arbitrarily complex >> } >> res = ScopedValueGetResult(res) >> >> >> In the snippet: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > whitespaces There are 3 optimizations that this patch performs: 1- replace a `ScopedValue.get()` by a dominating `ScopedValue.get()`; 2- move a `ScopedValue.get()` out of a loop; 3- streamline the code emitted for `ScopedValue.get()`. There are 2 `ScopedValue.get()` patterns that are handled: 1- when profile reports that the slow path that updates the cache is not taken: `ScopedValue.get()` always hits in the cache; 2- when profile reports that the slow path is taken and the code that updates the cache is included in the compiled code. Obviously, before `ScopedValue.get()` can use the cache, the cache has to be updated, so the slow path is taken at some point.
But because HotSpot doesn't profile the first invocations of a method, profile data can report the slow path as never taken. That's likely what happens with a simple microbenchmark. Also, there can be multiple `ScopedValue`s and profile can still report the slow path as not taken: it's more a matter of when the cache is updated than of how many `ScopedValue`s there are. The patch performs optimizations 1-, 2- and 3- for both patterns 1- and 2-, but it does them better for pattern 1- than for pattern 2-. If the slow path is included in the compiled code, then only a `ScopedValue.get` call that dominates the backedge of a loop is hoisted out of the loop. That's because hoisting is a 2-step process in the case of pattern 2-: peel one iteration of the loop, then replace the `ScopedValue.get` in the loop with the one from the peeled iteration. When the slow path is compiled into the method, a lot of extra code is also included, and that likely disrupts other optimizations that might be needed before `ScopedValue.get` can be optimized. For instance, the slow path likely comes with a non-inlined call that could get in the way of memory subgraph optimizations. That's why I made sure the patch handles both patterns. I thought about always speculating initially that the slow path is not taken when compiling `ScopedValue.get`, or trying to find some way to work around profile pollution. I also thought about cleverer ways of optimizing pattern 2-. But the patch felt complicated enough, and when @theRealAph experimented with it, he reported that it was doing OK the way it is.
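To make the two-step cache probe discussed in this thread concrete, here is a minimal Java sketch of a per-thread cache probed at two slots, with a slow path that fills the cache on a miss. It is a simplified model under stated assumptions, not the JDK's `java.lang.ScopedValue` code: the class `TwoSlotCache`, its fixed size of 16, and the deterministic way the secondary index is derived from the hash are all hypothetical. It also shows, at the source level, why two `get()` calls on the same key with no rebinding in between return the same object, which is the property that lets C2 replace `v2` by `v1`.

```java
import java.util.function.Supplier;

// Hypothetical model of a two-slot probe cache; NOT java.lang.ScopedValue's implementation.
final class TwoSlotCache {
    private static final int SLOTS = 16;             // assumed power-of-two cache size
    private final Object[] keys = new Object[SLOTS];
    private final Object[] values = new Object[SLOTS];
    int slowPathCalls;                                // number of misses that took the slow path

    private static int primary(int hash)   { return hash & (SLOTS - 1); }
    // Assumption: secondary index taken from higher hash bits (deterministic, not random).
    private static int secondary(int hash) { return (hash >>> 4) & (SLOTS - 1); }

    Object get(Object key, Supplier<Object> slowLookup) {
        int h = key.hashCode();
        int p = primary(h);
        if (keys[p] == key) return values[p];         // fast path: hit in the first slot
        int s = secondary(h);
        if (keys[s] == key) return values[s];         // fast path: hit in the second slot
        slowPathCalls++;                              // slow path: compute, then insert
        Object v = slowLookup.get();
        int victim = (keys[p] == null) ? p : s;       // prefer an empty first slot
        keys[victim] = key;
        values[victim] = v;
        return v;
    }

    public static void main(String[] args) {
        TwoSlotCache cache = new TwoSlotCache();
        Object key = new Object();
        Object v1 = cache.get(key, () -> new StringBuilder("bound"));
        Object v2 = cache.get(key, () -> new StringBuilder("bound")); // served from the cache
        System.out.println(v1 == v2);              // true: same object, so v2 could reuse v1
        System.out.println(cache.slowPathCalls);   // 1
    }
}
```

In the terms used above, the two `keys[...] == key` checks play the role of the cache probe that profiling usually sees hit, while `slowLookup` stands for the slow call that is left out of the compiled method when profile reports it as never taken.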
------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2127601116 From roland at openjdk.org Thu May 23 16:40:09 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 23 May 2024 16:40:09 GMT Subject: Integrated: 8332829: [BACKOUT] C2: crash in compiled code because of dependency on removed range check CastIIs In-Reply-To: References: Message-ID: On Thu, 23 May 2024 14:57:58 GMT, Roland Westrelin wrote: > I'm backing out that fix because it has caused several issues. Backout is clean. This pull request has now been integrated. Changeset: c9a7b977 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/c9a7b9772d96d9a4825d9da2aacc277534282860 Stats: 562 lines in 6 files changed: 23 ins; 536 del; 3 mod 8332829: [BACKOUT] C2: crash in compiled code because of dependency on removed range check CastIIs Reviewed-by: thartmann ------------- PR: https://git.openjdk.org/jdk/pull/19369 From sgibbons at openjdk.org Thu May 23 17:04:26 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 23 May 2024 17:04:26 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v32] In-Reply-To: References: Message-ID: <79fqpujoxeB-9xiWMWM9tTYQRsOqS6vHP4poomY0DSU=.7d52f61f-cafc-4a62-b27e-7ec9e35103ef@github.com> > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Check macos build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/87b1ebe8..23d2c511 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=31 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=30-31 Stats: 109 lines in 1 file changed: 42 ins; 4 del; 63 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Thu May 23 17:25:34 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 23 May 2024 17:25:34 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v33] 
In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Fix for IndexOf.java on mac ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/23d2c511..cba6ffbe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=31-32 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: 
https://git.openjdk.org/jdk/pull/16753 From djelinski at openjdk.org Thu May 23 19:12:12 2024 From: djelinski at openjdk.org (Daniel Jeliński) Date: Thu, 23 May 2024 19:12:12 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v33] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 17:25:34 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Fix for IndexOf.java on mac src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 268: > 266: __ cmpq(needle_len_p, 0); > 267: __ jg_b(L_nextCheck); > 268: __ xorq(rax, rax); out of
curiosity, is there any advantage to using `xorq` instead of `xorl` here? https://stackoverflow.com/a/33668295/7707617 suggests that `xorl` might be better, but it's a bit dated now. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 449: > 447: __ cmpq(r13, NUMBER_OF_CASES - 1); > 448: __ ja(L_smallCaseDefault); > 449: __ mov64(r15, (int64_t)small_jump_table); would it make sense to use `lea` here? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 803: > 801: __ movq(index, needle_len); > 802: __ andq(index, 0xf); // nLen % 16 > 803: __ movq(offset, 0x10); `movl` or `movptr` would produce a shorter encoding src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1544: > 1542: } > 1543: > 1544: __ align(8); why `8` and not `OptoLoopAlignment`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612178285 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612179069 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612180163 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612183311 From sgibbons at openjdk.org Thu May 23 19:49:11 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 23 May 2024 19:49:11 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v33] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 19:02:05 GMT, Daniel Jeliński wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix for IndexOf.java on mac > > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 268: > >> 266: __ cmpq(needle_len_p, 0); >> 267: __ jg_b(L_nextCheck); >> 268: __ xorq(rax, rax); > > out of curiosity, is there any advantage to using `xorq` instead of `xorl` here? > > https://stackoverflow.com/a/33668295/7707617 suggests that `xorl` might be better, but it's a bit dated now. Thanks for finding this.
It was ignorance on my part as I thought the xorq would have logic to not emit the REX prefix if not necessary, but it doesn't. Fixed. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 449: > >> 447: __ cmpq(r13, NUMBER_OF_CASES - 1); >> 448: __ ja(L_smallCaseDefault); >> 449: __ mov64(r15, (int64_t)small_jump_table); > > would it make sense to use `lea` here? It may, but I believe the movq is shorter (although maybe not to r15). I'll do some experimentation. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 803: > >> 801: __ movq(index, needle_len); >> 802: __ andq(index, 0xf); // nLen % 16 >> 803: __ movq(offset, 0x10); > > `movl` or `movptr` would produce a shorter encoding I tried to be consistent with the whole {q,l} syntax throughout when referring to each symbolic register. I feel that changing this would ripple through the code. @sviswa7 what do you think? > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1544: > >> 1542: } >> 1543: >> 1544: __ align(8); > > why `8` and not `OptoLoopAlignment` ? Short answer - because I didn't know there was such a thing as `OptoLoopAlignment`. I'll change that throughout at the top of my loops. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612201503 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612207461 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612216483 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612218363 From sgibbons at openjdk.org Thu May 23 19:54:39 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 23 May 2024 19:54:39 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v34] In-Reply-To: References: Message-ID: <13CORNysYmupJ3F2_7ekNqob8pz_xNmTg8gyKIt5vgs=.572e9f52-62ea-44cd-bac4-ab99a09a7510@github.com> > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. 
This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Addressing review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/cba6ffbe..2283f2bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=33 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=32-33 Stats: 7 lines in 2 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From kvn at openjdk.org Thu May 23 22:11:12 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 23 May 2024 22:11:12 
GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v34] In-Reply-To: <13CORNysYmupJ3F2_7ekNqob8pz_xNmTg8gyKIt5vgs=.572e9f52-62ea-44cd-bac4-ab99a09a7510@github.com> References: <13CORNysYmupJ3F2_7ekNqob8pz_xNmTg8gyKIt5vgs=.572e9f52-62ea-44cd-bac4-ab99a09a7510@github.com> Message-ID: On Thu, 23 May 2024 19:54:39 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Addressing review comments Few suggestions src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4250: > 4248: generate_chacha_stubs(); > 4249: > 4250: if ((UseAVX == 2) && 
EnableX86ECoreOpts && VM_Version::supports_avx2()) { `#ifdef COMPILER2` around this code to exclude the JVMCI-only case. src/hotspot/cpu/x86/stubGenerator_x86_64.hpp line 582: > 580: > 581: #ifdef COMPILER2 > 582: void generate_string_indexof_stubs(address *fnptrs, StrIntrinsicNode::ArgEncoding ae); Is it possible to make `generate_string_indexof_stubs()` a local static method in `stubGenerator_x86_64_string.cpp` and pass `StubGenerator*` as an argument? Then you don't need to include "opto/intrinsicnode.hpp" here. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 2: > 1: /* > 2: * Copyright (c) 2023, Intel Corporation. All rights reserved. The year should be 2024. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 27: > 25: > 26: #include "precompiled.hpp" > 27: #ifdef COMPILER2 You can exclude this file completely from compilation without this `#ifdef` if you prefix the name with `c2_`. There is code in the make files to exclude such files: [JvmFeatures.gmk#L38](https://github.com/openjdk/jdk/blob/master/make/hotspot/lib/JvmFeatures.gmk#L38) ------------- PR Review: https://git.openjdk.org/jdk/pull/16753#pullrequestreview-2075150606 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612352891 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612383969 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612365050 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612375730 From kvn at openjdk.org Thu May 23 22:11:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 23 May 2024 22:11:13 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v34] In-Reply-To: References: <13CORNysYmupJ3F2_7ekNqob8pz_xNmTg8gyKIt5vgs=.572e9f52-62ea-44cd-bac4-ab99a09a7510@github.com> Message-ID: On Thu, 23 May 2024 21:50:15 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressing review comments > >
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4250: > >> 4248: generate_chacha_stubs(); >> 4249: >> 4250: if ((UseAVX == 2) && EnableX86ECoreOpts && VM_Version::supports_avx2()) { > > `#ifdef COMPILER2` around this code to exclude JVMCI only case. You don't need to check `VM_Version::supports_avx2()` because we reset `UseAVX` if avx2 is not supported. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612361847 From kvn at openjdk.org Thu May 23 22:45:09 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 23 May 2024 22:45:09 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v28] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 16:22:18 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > add comment about is_map1 prefix function parameter I looked and the testing passed. No new failures. You also addressed his comments. I think you can push. 
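The encoding rule in the quoted PR description (instructions that use only the lower 16 GPRs are encoded as before, and a new prefix appears only when an extended GPR is referenced) can be illustrated for the simplest case. The Java sketch below chooses a prefix for a register-to-register `ADD r/m, reg` (opcode `0x01 /r`). It is not HotSpot's `Assembler` API: `ApxPrefixSketch` and `encodeAddRegReg` are hypothetical names, and the sketch assumes the REX2 payload layout M0, R4, X4, B4, W, R3, X3, B3 (bit 7 down to bit 0) from the APX specification, which should be checked against the spec. Memory operands, SIB, byte registers, and the extended-EVEX path are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not HotSpot code: prefix selection for "ADD r/m, reg" (0x01 /r),
// register-direct form only, registers numbered 0..31.
final class ApxPrefixSketch {
    static List<Integer> encodeAddRegReg(int rm, int reg, boolean w) {
        if (rm < 0 || rm > 31 || reg < 0 || reg > 31) {
            throw new IllegalArgumentException("register number out of range");
        }
        int r3 = (reg >> 3) & 1, r4 = (reg >> 4) & 1;   // extension bits of ModRM.reg
        int b3 = (rm >> 3) & 1, b4 = (rm >> 4) & 1;     // extension bits of ModRM.rm
        int wBit = w ? 1 : 0;
        List<Integer> bytes = new ArrayList<>();
        if (r4 == 1 || b4 == 1) {
            // Extended GPR (16..31): REX2 = 0xD5 + payload {M0,R4,X4,B4,W,R3,X3,B3} (assumed layout).
            // M0 = 0 keeps the legacy map 0 opcode; X3/X4 stay 0 because there is no index register.
            bytes.add(0xD5);
            bytes.add((r4 << 6) | (b4 << 4) | (wBit << 3) | (r3 << 2) | b3);
        } else if (wBit == 1 || r3 == 1 || b3 == 1) {
            bytes.add(0x40 | (wBit << 3) | (r3 << 2) | b3); // legacy REX: 0100WRXB
        }
        bytes.add(0x01);                                  // ADD r/m, reg
        bytes.add(0xC0 | ((reg & 7) << 3) | (rm & 7));    // ModRM with mod = 11 (register direct)
        return bytes;
    }

    public static void main(String[] args) {
        for (int b : encodeAddRegReg(16, 1, true)) System.out.printf("%02X ", b); // add r16, rcx
        System.out.println(); // D5 18 01 C8
    }
}
```

For example, `add rax, rcx` stays `48 01 C8` and `add r8, rcx` stays `49 01 C8`, while `add r16, rcx` needs the two-byte REX2 prefix, which gives `D5 18 01 C8` under the assumed bit layout.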
------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2128152525 From duke at openjdk.org Thu May 23 22:57:13 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Thu, 23 May 2024 22:57:13 GMT Subject: Integrated: 8328998: Encoding support for Intel APX extended general-purpose registers In-Reply-To: References: Message-ID: On Mon, 25 Mar 2024 19:01:17 GMT, Steve Dohrmann wrote: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. This pull request has now been integrated. 
Changeset: f8a3e4e4 Author: steveatgh Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/f8a3e4e428f7d3e62177bdf148fe25e22d3ee2bf Stats: 926 lines in 5 files changed: 471 ins; 38 del; 417 mod 8328998: Encoding support for Intel APX extended general-purpose registers Reviewed-by: kvn, sviswanathan, jbhateja ------------- PR: https://git.openjdk.org/jdk/pull/18476 From sgibbons at openjdk.org Thu May 23 23:00:10 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 23 May 2024 23:00:10 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v34] In-Reply-To: References: <13CORNysYmupJ3F2_7ekNqob8pz_xNmTg8gyKIt5vgs=.572e9f52-62ea-44cd-bac4-ab99a09a7510@github.com> Message-ID: <5L1PFeLmHP6Lfg1bKx_tRU-ESTFfpqUbP9vHVbiaqPo=.c3fa3b1b-5433-4a68-b639-ef82b4a388d1@github.com> On Thu, 23 May 2024 21:56:39 GMT, Vladimir Kozlov wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4250: >> >>> 4248: generate_chacha_stubs(); >>> 4249: >>> 4250: if ((UseAVX == 2) && EnableX86ECoreOpts && VM_Version::supports_avx2()) { >> >> `#ifdef COMPILER2` around this code to exclude JVMCI only case. > > You don't need to check `VM_Version::supports_avx2()` because we reset `UseAVX` if avx2 is not supported. Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612396114 From sgibbons at openjdk.org Thu May 23 23:00:12 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 23 May 2024 23:00:12 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v34] In-Reply-To: References: <13CORNysYmupJ3F2_7ekNqob8pz_xNmTg8gyKIt5vgs=.572e9f52-62ea-44cd-bac4-ab99a09a7510@github.com> Message-ID: On Thu, 23 May 2024 22:06:38 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressing review comments > > src/hotspot/cpu/x86/stubGenerator_x86_64.hpp line 582: > >> 580: >> 581: #ifdef COMPILER2 >> 582: void generate_string_indexof_stubs(address *fnptrs, StrIntrinsicNode::ArgEncoding ae); > > Is it possible to make `generate_string_indexof_stubs()` as local static method in `stubGenerator_x86_64_string.cpp` and pass `StubGenerator*` as argument? > Then you don't to include "opto/intrinsicnode.hpp" here. Done. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 2: > >> 1: /* >> 2: * Copyright (c) 2023, Intel Corporation. All rights reserved. > > 2024 year Fixed. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 27: > >> 25: >> 26: #include "precompiled.hpp" >> 27: #ifdef COMPILER2 > > You can exclude this file completely from compilation without this `#ifdef` if you prefix the name with `c2_`. > There is code in make files to exclude such files: [JvmFeatures.gmk#L38](https://github.com/openjdk/jdk/blob/master/make/hotspot/lib/JvmFeatures.gmk#L38) I will change the name and remove the #ifdef. Thanks for this. 
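One widely used way to search for a substring with plain AVX2 compares instead of `pcmpestri` is the first/last-character filter. The idea is to broadcast the needle's first and last characters, compare them against 32 consecutive haystack positions, AND the two masks, and fully verify only the surviving candidates. Whether the stub in this PR is organized exactly this way is an assumption. The scalar Java sketch below only models the idea for Latin-1 bytes. The class `FirstLastFilterSearch` and the 32-lane block are hypothetical, and each inner loop stands in for one pair of vector compares plus a `vpmovmskb`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical scalar model of a SIMD-friendly substring search (first/last-character filter).
// Not the HotSpot stub from this PR; each 32-lane block stands in for one AVX2 compare.
final class FirstLastFilterSearch {
    static final int LANES = 32; // bytes in a 256-bit AVX2 register

    static int indexOf(byte[] hay, byte[] needle) {
        int n = hay.length, m = needle.length;
        if (m == 0) return 0;                    // same convention as String.indexOf("")
        if (m > n) return -1;
        byte first = needle[0], last = needle[m - 1];
        for (int base = 0; base <= n - m; base += LANES) {
            int lanes = Math.min(LANES, n - m - base + 1); // valid start positions in this block
            int mask = 0;                        // models the AND of the two compare masks
            for (int lane = 0; lane < lanes; lane++) {
                int i = base + lane;
                if (hay[i] == first && hay[i + m - 1] == last) mask |= 1 << lane;
            }
            while (mask != 0) {                  // verify candidates, lowest position first
                int i = base + Integer.numberOfTrailingZeros(mask);
                if (Arrays.equals(hay, i, i + m, needle, 0, m)) return i;
                mask &= mask - 1;                // clear the lowest set bit
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] hay = ("a".repeat(70) + "needle!").getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(indexOf(hay, "needle".getBytes(StandardCharsets.ISO_8859_1))); // 70
    }
}
```

The design point is that the cheap two-character filter rejects most start positions, so the expensive full comparison runs only on the few candidates, which is the property a vectorized version exploits.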
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612401461 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612399243 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612400071 From kvn at openjdk.org Thu May 23 23:11:08 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 23 May 2024 23:11:08 GMT Subject: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v6] In-Reply-To: References: Message-ID: <9l85eJF5sWNRxwdjtcyKdGmDRm9Hp9ZRKhSJejy5-FM=.29881024-aa33-42a5-b85b-06d359acfc66@github.com> On Wed, 22 May 2024 14:47:43 GMT, Yudi Zheng wrote: >> Moving array construction within BigInteger.implMultiplyToLen intrinsic candidate to its caller simplifies the intrinsic implementation in JIT compiler. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > address comments. This is good. Please, merge latest mainline and rerun mach5 testing. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18226#pullrequestreview-2075281290 From sgibbons at openjdk.org Thu May 23 23:12:42 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 23 May 2024 23:12:42 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v35] In-Reply-To: References: Message-ID: <-vyOZzeMslZqgJpTsQnnOWi4abWiM8fNeWSVx5LEHm8=.d37011ee-102c-4874-aa26-d113949d25ea@github.com> > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Review comments - move stubGen*_string.cpp to c2_stubGen*_string.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/2283f2bf..c034d3f9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=34 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=33-34 Stats: 73 lines in 3 files changed: 6 ins; 59 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From dlong at openjdk.org Thu May 23 23:21:13 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 23 May 2024 23:21:13 GMT Subject: RFR: 
8320649: C2: Optimize scoped values [v18] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 14:54:17 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens >> no optimization is performed for the method being compiled). Hoisting >> a `get()` call out of a loop for a loop-invariant `scopedValue` should >> also be legal in most cases. >> >> `ScopedValue.get()` is implemented in Java code as a two-step process. A >> cache is attached to the current thread object. If the `ScopedValue` >> object is in the cache then the result from `get()` is read from >> there. Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache. On a >> cache probe, both indexes are checked. As a consequence, probing >> the cache is a multi-step process (check if the cache is >> present, check the first index, check the second index if the first index >> failed). If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, the profile reports the slow path >> as never taken and only the read from the cache is compiled. >> >> To perform the optimizations, I added 3 new node types to C2: >> >> - the pair >> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for >> the cache probe >> >> - a cfg node ScopedValueGetResultNode to help locate the result of the >> `get()` call in the IR graph.
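To make the multi-step cache probe described above concrete, here is a hypothetical Java model of a lazily allocated cache in which each key maps to two candidate indexes. The class name, the 16-entry size, and the hashing scheme are assumptions for illustration only. As a simplification it always inserts at the first index, whereas the real slot-replacement policy differs. This is not the JDK's `ScopedValue` implementation.

```java
import java.util.function.Supplier;

// Hypothetical model (not JDK code) of a two-index cache probe.
final class TwoIndexCache {
    private static final int SIZE = 16;                 // assumed power-of-two cache size
    private static final int MASK = SIZE - 1;

    private Object[] keys;                              // allocated lazily on the first miss
    private Object[] values;
    int slowPathCalls;                                  // counts slow-path invocations

    // Assumed hashing: two candidate indexes derived from one hash code.
    private static int firstIndex(int hash)  { return hash & MASK; }
    private static int secondIndex(int hash) { return (hash >>> 4) & MASK; }

    Object get(Object key, Supplier<?> slowPath) {
        int h = System.identityHashCode(key);
        if (keys != null) {                             // step 1: is the cache present?
            int i = firstIndex(h);
            if (keys[i] == key) return values[i];       // step 2: check the first index
            int j = secondIndex(h);
            if (keys[j] == key) return values[j];       // step 3: check the second index
        }
        slowPathCalls++;                                // miss: take the slow path
        Object result = slowPath.get();
        if (keys == null) {                             // lazy allocation of the cache
            keys = new Object[SIZE];
            values = new Object[SIZE];
        }
        int i = firstIndex(h);                          // simplification: insert at the first index
        keys[i] = key;
        values[i] = result;
        return result;
    }
}
```

A second `get()` for the same key is answered from the cache without running the slow path again. Once the profile shows the slow path as never taken, this is the only part C2 would need to compile.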
>> >> In pseudo code, once the nodes are inserted, the code of a `get()` is: >> >> >> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) >> if (hits_in_the_cache) { >> res = ScopedValueGetLoadFromCache(hits_in_the_cache); >> } else { >> res = ..; //slow call possibly inlined. Subgraph can be arbitrarily complex >> } >> res = ScopedValueGetResult(res) >> >> >> In the snippet: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > whitespaces What's a good benchmark to run to show the benefit of this change, or to show the effect of different cache sizes and/or Java implementation changes? I tried running micro:ScopedValue benchmarks with -Djava.lang.ScopedValue.cacheSize=2 and didn't see a difference. But the new compiler/scoped_value/TestScopedValue.java test fails in compiler.c2.irTests.TestScopedValue.testFastPath16 with the cache size set to 2. Given the right benchmark, there are some experiments I'd like to try, related to the ScopedValue Java implementation: 1. use only a primary slot probe, no secondary 2. use a deterministic secondary probe (based on the hash), not random 3. fix put() so it will reuse an existing slot. Currently it blindly sets both `victim` and `other` slots. It seems like it should check the `other` slot first and reuse it if already set. 4. separate cache bitmap from slow path bitmaps, which could be 64-bits with only 1 bit per SV, not 2. 5.
Use a per-SV MethodHandle getter using MethodHandles.guardWithTest() to avoid profile pollution ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2128182510 From kvn at openjdk.org Thu May 23 23:42:09 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 23 May 2024 23:42:09 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v35] In-Reply-To: <-vyOZzeMslZqgJpTsQnnOWi4abWiM8fNeWSVx5LEHm8=.d37011ee-102c-4874-aa26-d113949d25ea@github.com> References: <-vyOZzeMslZqgJpTsQnnOWi4abWiM8fNeWSVx5LEHm8=.d37011ee-102c-4874-aa26-d113949d25ea@github.com> Message-ID: On Thu, 23 May 2024 23:12:42 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 
0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Review comments - move stubGen*_string.cpp to c2_stubGen*_string.cpp I submitted our testing for latest v34 version of changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2128207810 From kvn at openjdk.org Fri May 24 00:50:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 24 May 2024 00:50:10 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v35] In-Reply-To: <-vyOZzeMslZqgJpTsQnnOWi4abWiM8fNeWSVx5LEHm8=.d37011ee-102c-4874-aa26-d113949d25ea@github.com> References: <-vyOZzeMslZqgJpTsQnnOWi4abWiM8fNeWSVx5LEHm8=.d37011ee-102c-4874-aa26-d113949d25ea@github.com> Message-ID: On Thu, 23 May 2024 23:12:42 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 
16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Review comments - move stubGen*_string.cpp to c2_stubGen*_string.cpp test/jdk/java/lang/StringBuffer/IndexOf.java line 2: > 1: /* > 2: * Copyright (c) 2000, 2024 Oracle and/or its affiliates. All rights reserved. This causes a copyright header validation failure: the comma `,` after 2024 is missing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612519675 From gcao at openjdk.org Fri May 24 03:24:05 2024 From: gcao at openjdk.org (Gui Cao) Date: Fri, 24 May 2024 03:24:05 GMT Subject: RFR: 8332615: RISC-V: Support vector unsigned comparison instructions for machines with RVV In-Reply-To: References: Message-ID: On Wed, 22 May 2024 08:24:23 GMT, Fei Yang wrote: >> Hi, I noticed the following warning in the Opto JIT Code for the Vector API in the `test/jdk/jdk/incubator/vector/Int256VectorTests.java: UNSIGNED_LTInt256VectorTests` test: >> >> ** not supported: unsigned comparison op=comp/1 vlen=8 etype=int ismask=usestore >> With this patch, we support vector unsigned comparison instructions; the test passes normally and generates Opto JIT Code such as: >> >> 23e B46: # out( B48 B47 ) <- in( B25 B45 ) Loop( B46-B45 ) Freq: 955.829 >> 23e addw R24, R29, zr #@convI2L_reg_reg >> 242 slli R30, R24, (#2 & 0x3f) #@lShiftL_reg_imm >> 246 add R17, R8, R30 # ptr, #@addP_reg_reg >> 24a add R19, R9, R30 # ptr, #@addP_reg_reg >> 24e addi R30, R17, #16 # ptr, #@addP_reg_imm >> 252 addi R31, R19, #16 # ptr, #@addP_reg_imm >> 256 loadV V1, [R30] # vector (rvv) >> 25e loadV V2, [R31] # vector (rvv) >> 266 vmaskcmp V0, V1, V2, #19 >> 272 vmask_tolong R20, V0 >> 280 vstoremask V1, V0 # elem size is #4 byte[s] >> 28c lw R31, [R17, #16] # int, #@loadI >> 290 lw R11,
[R19, #16] # int, #@loadI >> 294 andi R10, R20, #1 #@andL_reg_imm >> 298 bne R10, zr, B48 #@cmpL_reg_imm0_branch P=0.669978 C=14596.000000 >> >> ### Testing: >> qemu 8.1.50 with UseRVV: >> - [x] Run tier1-3 tests (release) >> - [x] Run test/jdk/jdk/incubator/vector (fastdebug) > > Looks good. Thanks! @RealFYang : Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19328#issuecomment-2128439169 From dlong at openjdk.org Fri May 24 05:16:11 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 24 May 2024 05:16:11 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v18] In-Reply-To: <575TsEWD5s5I2t291jDmU0o_ij6VD_a1qZRmTLixrJg=.02b58ebb-0904-432d-a770-c4a4ee2612ea@github.com> References: <575TsEWD5s5I2t291jDmU0o_ij6VD_a1qZRmTLixrJg=.02b58ebb-0904-432d-a770-c4a4ee2612ea@github.com> Message-ID: On Thu, 23 May 2024 16:36:11 GMT, Roland Westrelin wrote: > 1- replace a ScopedValue.get() by a dominating ScopedValue.get() > 2- move a ScopedValue.get() out of loop > 3- streamline code emitted for ScopedValue.get() I think it might make sense to split 1 and 2, which are independent of the details of get() and put(), from 3. Then we can consider if there are other optimizations we can do around opaque() get() and put(). For example, why can't we replace a get() with the value from a dominating put()? Why can't we eliminate both the put() and get() completely, as long as the value can't "escape" or we deoptimize? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2128537981 From djelinski at openjdk.org Fri May 24 06:34:10 2024 From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=) Date: Fri, 24 May 2024 06:34:10 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v33] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 19:26:10 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 268: >> >>> 266: __ cmpq(needle_len_p, 0); >>> 267: __ jg_b(L_nextCheck); >>> 268: __ xorq(rax, rax); >> >> out of curiosity, is there any advantage to using `xorq` instead of `xorl` here? >> >> https://stackoverflow.com/a/33668295/7707617 suggests that `xorl` might be better, but it's a bit dated now. > > Thanks for finding this. It was ignorance on my part as I thought the xorq would have logic to not emit the REX prefix if not necessary, but it doesn't. Fixed. Right, it seems to surprise people. There's a lot of preexisting uses of xorq / xorptr to zero a register. I think it would make sense to implement this logic in xorq. I can do this if others agree. >> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 449: >> >>> 447: __ cmpq(r13, NUMBER_OF_CASES - 1); >>> 448: __ ja(L_smallCaseDefault); >>> 449: __ mov64(r15, (int64_t)small_jump_table); >> >> would it make sense to use `lea` here? > > It may, but I believe the movq is shorter (although maybe not to r15). I'll do some experimentation. the RIP-relative lea should have a shorter encoding. I think something like `lea(r15, ExternalAddress(small_jump_table))` should produce it (untested) >> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 803: >> >>> 801: __ movq(index, needle_len); >>> 802: __ andq(index, 0xf); // nLen % 16 >>> 803: __ movq(offset, 0x10); >> >> `movl` or `movptr` would produce a shorter encoding > > I tried to be consistent with the whole {q,l} syntax throughout when referring to each symbolic register. 
I feel that changing this would ripple through the code. @sviswa7 what do you think? Right, that makes sense. I wonder if there's any reason why the logic to select the best mov variant is in movptr, and not in movq. Usually the `ptr` functions just select the `l` or `q` overload depending on the target system, `movptr` is an exception here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612907959 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612908115 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612908219 From fyang at openjdk.org Fri May 24 07:12:12 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 24 May 2024 07:12:12 GMT Subject: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v6] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 14:47:43 GMT, Yudi Zheng wrote: >> Moving array construction within BigInteger.implMultiplyToLen intrinsic candidate to its caller simplifies the intrinsic implementation in JIT compiler. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > address comments. Hi, RISC-V part of change seems fine. "java/math/BigInteger" test result is clean on linux-riscv64 platform. Thanks for the ping. ------------- Marked as reviewed by fyang (Reviewer). 
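For the `xorq` versus `xorl` question in the IndexOf review above, a small sketch of the x86-64 encoding shows where the extra byte comes from. It hand-encodes the register-to-register form of `xor` (opcode `0x31 /r`) and emits a REX prefix only when one is needed: REX.W for 64-bit operand size, REX.R/REX.B for registers r8 to r15. The class and method names are hypothetical, and real assemblers may pick the equivalent `0x33 /r` form instead. Because a write to a 32-bit register zero-extends into the full 64-bit register, `xorl reg, reg` clears the whole register and is never longer than `xorq`.

```java
// Hypothetical helper: hand-encodes "xor dst, src" (opcode 0x31 /r,
// register-direct form) to show when a REX prefix byte is required.
final class XorEncoding {
    // Register numbers 0..15: 0 = rax/eax ... 7 = rdi/edi, 8..15 = r8..r15.
    static byte[] xorRegReg(int dst, int src, boolean is64Bit) {
        int rex = 0x40
                | (is64Bit ? 0x08 : 0)   // REX.W: 64-bit operand size (xorq)
                | (src >= 8 ? 0x04 : 0)  // REX.R: extends ModRM.reg, which holds src
                | (dst >= 8 ? 0x01 : 0); // REX.B: extends ModRM.rm, which holds dst
        int modrm = 0xC0 | ((src & 7) << 3) | (dst & 7); // mod = 11: register-direct
        if (rex == 0x40) {
            // No REX bits needed: two-byte encoding.
            return new byte[] { 0x31, (byte) modrm };
        }
        return new byte[] { (byte) rex, 0x31, (byte) modrm };
    }
}
```

With this helper, `xorRegReg(0, 0, false)` yields `31 C0` (two bytes) and `xorRegReg(0, 0, true)` yields `48 31 C0` (three bytes). For r8 to r15 both widths need a REX byte, so they tie at three bytes.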
PR Review: https://git.openjdk.org/jdk/pull/18226#pullrequestreview-2076019194 From gcao at openjdk.org Fri May 24 07:15:06 2024 From: gcao at openjdk.org (Gui Cao) Date: Fri, 24 May 2024 07:15:06 GMT Subject: Integrated: 8332615: RISC-V: Support vector unsigned comparison instructions for machines with RVV In-Reply-To: References: Message-ID: On Tue, 21 May 2024 14:19:52 GMT, Gui Cao wrote: > Hi, I noticed the following warning in the Opto JIT Code for the Vector API in the `test/jdk/jdk/incubator/vector/Int256VectorTests.java: UNSIGNED_LTInt256VectorTests` test: > > ** not supported: unsigned comparison op=comp/1 vlen=8 etype=int ismask=usestore > With this patch, we support vector unsigned comparison instructions; the test passes normally and generates Opto JIT Code such as: > > 23e B46: # out( B48 B47 ) <- in( B25 B45 ) Loop( B46-B45 ) Freq: 955.829 > 23e addw R24, R29, zr #@convI2L_reg_reg > 242 slli R30, R24, (#2 & 0x3f) #@lShiftL_reg_imm > 246 add R17, R8, R30 # ptr, #@addP_reg_reg > 24a add R19, R9, R30 # ptr, #@addP_reg_reg > 24e addi R30, R17, #16 # ptr, #@addP_reg_imm > 252 addi R31, R19, #16 # ptr, #@addP_reg_imm > 256 loadV V1, [R30] # vector (rvv) > 25e loadV V2, [R31] # vector (rvv) > 266 vmaskcmp V0, V1, V2, #19 > 272 vmask_tolong R20, V0 > 280 vstoremask V1, V0 # elem size is #4 byte[s] > 28c lw R31, [R17, #16] # int, #@loadI > 290 lw R11, [R19, #16] # int, #@loadI > 294 andi R10, R20, #1 #@andL_reg_imm > 298 bne R10, zr, B48 #@cmpL_reg_imm0_branch P=0.669978 C=14596.000000 > > ### Testing: > qemu 8.1.50 with UseRVV: > - [x] Run tier1-3 tests (release) > - [x] Run test/jdk/jdk/incubator/vector (fastdebug) This pull request has now been integrated.
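As a scalar reference for what a per-lane unsigned less-than mask computes (the `UNSIGNED_LT` case exercised by the test above), here is a minimal Java sketch. The class and method names are hypothetical. `Integer.compareUnsigned` is a standard JDK method, and flipping the sign bit before a signed comparison is a common equivalent formulation.

```java
// Hypothetical scalar reference for an unsigned "less than" lane mask on int lanes.
final class UnsignedLaneCompare {
    // One mask bit per lane: true where a[i] < b[i] when both are read as unsigned.
    static boolean[] unsignedLessThan(int[] a, int[] b) {
        boolean[] mask = new boolean[a.length];
        for (int i = 0; i < a.length; i++) {
            mask[i] = Integer.compareUnsigned(a[i], b[i]) < 0;
        }
        return mask;
    }

    // Equivalent formulation: flipping the sign bit maps unsigned order onto signed order.
    static boolean ultViaSignFlip(int x, int y) {
        return (x ^ Integer.MIN_VALUE) < (y ^ Integer.MIN_VALUE);
    }
}
```

For example, `-1` read as unsigned is `0xFFFFFFFF`, so `-1 <u 1` is false while `1 <u -1` is true, the opposite of the signed result.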
Changeset: 9b61a760 Author: Gui Cao Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/9b61a7608efff13fc3685488f3f54a810ec0ac22 Stats: 6 lines in 2 files changed: 4 ins; 0 del; 2 mod 8332615: RISC-V: Support vector unsigned comparison instructions for machines with RVV Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/19328 From fyang at openjdk.org Fri May 24 07:44:08 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 24 May 2024 07:44:08 GMT Subject: RFR: 8320999: RISC-V: C2 RotateLeftV [v2] In-Reply-To: References: Message-ID: <-xClnQXcYdLw5tq_Kq4PtOSNEO13lOrR_vD-nIMCzGU=.66662885-a465-4f23-92df-8de5cc656309@github.com> On Wed, 22 May 2024 18:03:19 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> More detailed description is inline in the code. >> Thanks > > Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - merge > - Fix imm6 in vror.vi; misc > - Merge branch 'master' into rotate-left-right-v > - add comments > - fix mask > - fix imm & long > - fixes > - Merge branch 'master' into rotate-left-right-v > - fixes > - remove redundant code: UseZvbb > - ... and 2 more: https://git.openjdk.org/jdk/compare/a0c5714d...edd0201d Hi, I have two comments after a cursory look. Thanks. src/hotspot/cpu/riscv/riscv_v.ad line 3097: > 3095: %} > 3096: > 3097: instruct vrotate_right_imm(vReg dst, vReg src, immI shift) %{ Question: Could we make use of the vector-scalar rotate variants (vrol_vx / vror_vx) in case `shift` is not a constant? src/hotspot/cpu/riscv/riscv_v.ad line 3101: > 3099: Matcher::vector_element_basic_type(n) == T_SHORT || > 3100: Matcher::vector_element_basic_type(n) == T_INT || > 3101: Matcher::vector_element_basic_type(n) == T_LONG); I am not sure but do we really need this predicate? 
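The rotate discussion above rests on a standard identity: rotating left by `s` equals rotating right by `(width - s) mod width`, so a rotate-left by a constant can be expressed with a rotate-right immediate. For 64-bit lanes the amount ranges over 0..63 and needs 6 bits. The scalar Java sketch below checks the identity using the standard `Integer`/`Long` rotate methods. The class and method names are hypothetical, and it does not claim that the patch is implemented this way.

```java
// Hypothetical scalar sketch: rotate-left expressed through rotate-right.
final class RotateViaRor {
    // 32-bit lanes: rol(x, s) == ror(x, (32 - s) mod 32); the amount fits in 5 bits.
    static int rolViaRor(int x, int s) {
        return Integer.rotateRight(x, (32 - s) & 31);
    }

    // 64-bit lanes: rol(x, s) == ror(x, (64 - s) mod 64); the amount needs 6 bits (0..63).
    static long rolViaRor(long x, int s) {
        return Long.rotateRight(x, (64 - s) & 63);
    }
}
```

Masking with `& 31` or `& 63` also handles shift amounts at or above the lane width, matching how `Integer.rotateLeft` and `Long.rotateLeft` reduce their distance argument.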
------------- PR Review: https://git.openjdk.org/jdk/pull/19325#pullrequestreview-2076099453 PR Review Comment: https://git.openjdk.org/jdk/pull/19325#discussion_r1613007020 PR Review Comment: https://git.openjdk.org/jdk/pull/19325#discussion_r1613007940 From asotona at openjdk.org Fri May 24 07:44:21 2024 From: asotona at openjdk.org (Adam Sotona) Date: Fri, 24 May 2024 07:44:21 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v9] In-Reply-To: References: Message-ID: > Hi, > During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. > One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. > > I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. > > Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. 
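One common Java idiom for turning eagerly initialized static fields into lazily initialized static accessors, as proposed above, is the initialization-on-demand holder class. The JVM initializes each nested holder only when its field is first read, and class initialization is thread-safe. The names in this sketch (`LazyAttributes`, `Mapper`, `code()`, `signature()`) are hypothetical. It is not the actual `java.lang.classfile.Attributes` change, which may use a different mechanism.

```java
// Hypothetical sketch of lazily initialized static accessors (holder-class idiom).
final class LazyAttributes {
    static int initCount = 0;   // counts mapper constructions, for demonstration only

    static final class Mapper {
        final String name;
        Mapper(String name) { this.name = name; initCount++; }
    }

    // Each holder class is initialized by the JVM only on first access to its field,
    // so an unused mapper is never constructed.
    private static final class CodeHolder {
        static final Mapper CODE = new Mapper("Code");
    }
    private static final class SignatureHolder {
        static final Mapper SIGNATURE = new Mapper("Signature");
    }

    static Mapper code()      { return CodeHolder.CODE; }
    static Mapper signature() { return SignatureHolder.SIGNATURE; }
}
```

Loading `LazyAttributes` itself constructs no mappers. Calling `code()` constructs only the `Code` mapper, once, and every later call returns the same instance.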
> > Thank you, > Adam Adam Sotona has updated the pull request incrementally with two additional commits since the last revision: - addressed CSR review comments - fixed CompilationIDMapper does not allow multiple instances ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19006/files - new: https://git.openjdk.org/jdk/pull/19006/files/b4203cfd..21515ec2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=07-08 Stats: 62 lines in 2 files changed: 61 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19006.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19006/head:pull/19006 PR: https://git.openjdk.org/jdk/pull/19006 From asotona at openjdk.org Fri May 24 08:24:15 2024 From: asotona at openjdk.org (Adam Sotona) Date: Fri, 24 May 2024 08:24:15 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v10] In-Reply-To: References: Message-ID: > Hi, > During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. > One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. > > I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. > > Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. > > Thank you, > Adam Adam Sotona has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 16 commits: - fixed jdeps.Dependencies - Merge branch 'master' into JDK-8331291-attributes - addressed CSR review comments - fixed CompilationIDMapper does not allow multiple instances - fixed tests - fixed tests - fixed tests - updated LimitsTest - Merge branch 'master' into JDK-8331291-attributes # Conflicts: # test/jdk/jdk/classfile/SignaturesTest.java - Merge branch 'master' into JDK-8331291-attributes - ... and 6 more: https://git.openjdk.org/jdk/compare/239c1b33...37f7f63f ------------- Changes: https://git.openjdk.org/jdk/pull/19006/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=09 Stats: 2277 lines in 145 files changed: 960 ins; 613 del; 704 mod Patch: https://git.openjdk.org/jdk/pull/19006.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19006/head:pull/19006 PR: https://git.openjdk.org/jdk/pull/19006 From epeter at openjdk.org Fri May 24 08:29:04 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 24 May 2024 08:29:04 GMT Subject: RFR: 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store [v4] In-Reply-To: References: Message-ID: On Thu, 16 May 2024 01:39:26 GMT, Richard Reingruber wrote: >> This PR adds a few tweaks to [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) which allow enabling it on big-endian platforms as well (e.g. AIX, S390). JDK-8318446 introduced a C2 optimization to replace consecutive stores to a primitive array with just one store.
>> >> By example (from `TestMergeStores.java`): >> >> >> static Object[] test2a(byte[] a, int offset, long v) { >> if (IS_BIG_ENDIAN) { >> a[offset + 0] = (byte)(v >> 56); >> a[offset + 1] = (byte)(v >> 48); >> a[offset + 2] = (byte)(v >> 40); >> a[offset + 3] = (byte)(v >> 32); >> a[offset + 4] = (byte)(v >> 24); >> a[offset + 5] = (byte)(v >> 16); >> a[offset + 6] = (byte)(v >> 8); >> a[offset + 7] = (byte)(v >> 0); >> } else { >> a[offset + 0] = (byte)(v >> 0); >> a[offset + 1] = (byte)(v >> 8); >> a[offset + 2] = (byte)(v >> 16); >> a[offset + 3] = (byte)(v >> 24); >> a[offset + 4] = (byte)(v >> 32); >> a[offset + 5] = (byte)(v >> 40); >> a[offset + 6] = (byte)(v >> 48); >> a[offset + 7] = (byte)(v >> 56); >> } >> return new Object[]{ a }; >> } >> >> >> Depending on the endianness, 8 bytes are stored into an array. The order of the stores is the same as the order of an 8-byte store; therefore the 8 1-byte stores can be replaced with just one 8-byte store (if there aren't too many range checks). >> >> Additionally I've fixed a few comments and a test bug. >> >> The optimization seems to be a little bit more effective on big-endian platforms. >> >> Again by example: >> >> >> static Object[] test800a(byte[] a, int offset, long v) { >> if (IS_BIG_ENDIAN) { >> a[offset + 0] = (byte)(v >> 40); // Removed from candidate list >> a[offset + 1] = (byte)(v >> 32); // Removed from candidate list >> a[offset + 2] = (byte)(v >> 24); // Merged >> a[offset + 3] = (byte)(v >> 16); // Merged >> a[offset + 4] = (byte)(v >> 8); // Merged >> a[offset + 5] = (byte)(v >> 0); // Merged >> } else { >> a[offset + 0] = (byte)(v >> 0); // Removed from candidate list >> a[offset + 1] = (byte)(v >> 8); // Removed from candidate list >> a[offset + 2] = (byte)(v >> 16); // Not merged >> a[offset + 3] = (byte)(v >> 24); // Not merged >> a[offset + 4] = (byte)(v >> 32); // Not merge...
> > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Eliminate IS_BIG_ENDIAN and always execute both variants I'm running testing again, but the code looks good now! I just had another idea: Could we use some sort of "byte reverse / shuffle" operation to do these use cases for both big/little-endian? storeBytes(bytes, offset, (byte)(value >> 8), (byte)(value >> 0)); storeBytes(bytes, offset, (byte)(value >> 0), (byte)(value >> 8)); Not sure if that would be profitable or even available on all platforms. Could be a future RFE someone can work on after this. What do you think? It might make performance more predictable across platforms. src/hotspot/share/opto/memnode.cpp line 3310: > 3308: Node* hi = first->in(MemNode::ValueIn); > 3309: Node* lo = _store->in(MemNode::ValueIn); > 3310: #endif // VM_LITTLE_ENDIAN A `swap` could be more concise. But I leave that up to you ;) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19218#pullrequestreview-2076196539 PR Review Comment: https://git.openjdk.org/jdk/pull/19218#discussion_r1613067944 From epeter at openjdk.org Fri May 24 08:29:05 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 24 May 2024 08:29:05 GMT Subject: RFR: 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store [v4] In-Reply-To: <7W-vxm7KC8qwd-GJAPh4TCtDhOzw7X5-gXanLudP27Y=.807f809f-92ce-498f-94c4-49b0405bbb6f@github.com> References: <7W-vxm7KC8qwd-GJAPh4TCtDhOzw7X5-gXanLudP27Y=.807f809f-92ce-498f-94c4-49b0405bbb6f@github.com> Message-ID: On Thu, 16 May 2024 05:29:58 GMT, Richard Reingruber wrote: >> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: >> >> Eliminate IS_BIG_ENDIAN and always execute both variants > > Test error is unrelated to the changes. 
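The byte-reverse idea raised in the review above can be checked in plain Java. Storing the bytes of `v` most-significant-first is the same as one big-endian 8-byte store, and storing them least-significant-first is one little-endian 8-byte store. The big-endian byte sequence of `v` also equals the little-endian byte sequence of `Long.reverseBytes(v)`. The class below is a hypothetical illustration of the store patterns in `test2a`, written with loops instead of unrolled statements. It says nothing about how C2 actually matches them.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration of the store patterns discussed above (not C2 code).
final class MergeStoresDemo {
    // Eight 1-byte stores, most significant byte first (the IS_BIG_ENDIAN branch of test2a).
    static void storeBytesBigEndian(byte[] a, int offset, long v) {
        for (int i = 0; i < 8; i++) {
            a[offset + i] = (byte) (v >> (56 - 8 * i));
        }
    }

    // Eight 1-byte stores, least significant byte first (the little-endian branch).
    static void storeBytesLittleEndian(byte[] a, int offset, long v) {
        for (int i = 0; i < 8; i++) {
            a[offset + i] = (byte) (v >> (8 * i));
        }
    }

    // A single 8-byte store with an explicit byte order.
    static void storeLong(byte[] a, int offset, long v, ByteOrder order) {
        ByteBuffer.wrap(a).order(order).putLong(offset, v);
    }
}
```

Calling `storeBytesLittleEndian(a, off, Long.reverseBytes(v))` writes exactly the bytes that `storeBytesBigEndian(a, off, v)` writes. This is why a single wide store plus a byte swap could, in principle, serve either pattern on either platform.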
Upload of test results failed: > `Error: Failed to CreateArtifact: Failed to make request after 5 attempts: Request timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact` @reinrich please ping me again to ask if testing is ok before you integrate ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19218#issuecomment-2128902274 From epeter at openjdk.org Fri May 24 08:53:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 24 May 2024 08:53:11 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v10] In-Reply-To: <26UiEE_uEKUU0lg_T91K-b4Or3mtGluJYybbJOpETOU=.a74004d6-590f-49e7-8880-4ab6627926dd@github.com> References: <26UiEE_uEKUU0lg_T91K-b4Or3mtGluJYybbJOpETOU=.a74004d6-590f-49e7-8880-4ab6627926dd@github.com> Message-ID: On Tue, 21 May 2024 13:11:22 GMT, Bhavana Kilambi wrote: >> Floating-point addition is non-associative; that is, adding floating-point elements in a different order may produce a different result. In particular, the Vector API intentionally does not define the order of reduction, which allows platforms to generate more efficient code [1]. C2 therefore needs a node to represent a non-strictly-ordered add-reduction for floating-point types. >> >> To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` have a fixed value. >> >> With this patch, the Vector API would always generate non-strictly-ordered `AddReductionVF/D` on SVE machines with vector length <= 16B, as it is more beneficial to generate non-strictly-ordered instructions on such machines compared to strictly ordered ones. >> >> [AArch64] >> On Neon, non-strictly-ordered `AddReductionVF/D` cannot be generated.
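The non-associativity described above is easy to demonstrate with four floats. Near 1e8, adjacent `float` values are 8 apart, so adding `1.0f` to `±1e8f` is absorbed by rounding. A strictly ordered (sequential) reduction therefore keeps the final `+1`, while a pairwise combination loses both. The sketch below is illustrative: class and method names are hypothetical, and the pairwise tree only resembles in spirit the `faddp` sequence quoted below, without claiming the same lane order.

```java
// Sketch: the same four floats summed in two different association orders.
final class FpReductionOrder {
    // Strictly ordered reduction: (((0 + a0) + a1) + a2) + a3 ...
    static float sequentialSum(float[] a) {
        float sum = 0.0f;
        for (float x : a) {
            sum += x;
        }
        return sum;
    }

    // Tree (pairwise) reduction over a[lo..hi): (a0 + a1) + (a2 + a3) for four elements.
    static float pairwiseSum(float[] a, int lo, int hi) {
        if (hi - lo == 0) return 0.0f;
        if (hi - lo == 1) return a[lo];
        int mid = (lo + hi) >>> 1;
        return pairwiseSum(a, lo, mid) + pairwiseSum(a, mid, hi);
    }
}
```

For `{1e8f, 1.0f, -1e8f, 1.0f}`, `sequentialSum` returns `1.0f` and `pairwiseSum(v, 0, 4)` returns `0.0f`. This is why a strictly ordered reduction cannot in general be replaced by a tree-shaped one.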
Auto-vectorization has already banned these nodes in JDK-8275275 [2]. >> >> This patch adds matching rules for non strictly-ordered `AddReductionVF/D`. >> >> No effects on other platforms. >> >> [Performance] >> FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point type. With this patch, it improves ~3x on my SVE machine (128-bit). >> >> ADDLanes >> >> Benchmark Before After Unit >> FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms >> >> >> Final code is as below: >> >> Before: >> ` fadda z17.s, p7/m, z17.s, z16.s >> ` >> After: >> >> faddp v17.4s, v21.4s, v21.4s >> faddp s18, v17.2s >> fadd s18, s18, s19 >> >> >> >> >> [Test] >> Full jtreg passed on AArch64 and x86. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529 >> [2] https://bugs.openjdk.org/browse/JDK-8275275 >> [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316 > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Modify JTREG IR rules and some style/format changes Thanks for adding the tests! I have a few more comments/requests. test/hotspot/jtreg/compiler/c2/irTests/TestVectorFPReduction.java line 24: > 22: */ > 23: > 24: package compiler.c2.irTests; Would you mind moving the tests away from `c2/irTests`, and into the `loopopts/superword` and `vectorapi` directories, please? `c2/irTests` is kind of a disorganized "catch-all". We only had this directory at the beginning of the IR-framework before it was widely adopted.
Now it makes more sense to sort the tests by "what" they test, rather than "how" ;) test/hotspot/jtreg/compiler/c2/irTests/TestVectorFPReduction.java line 32: > 30: * @bug 8320725 > 31: * @summary Ensure strictly ordered AddReductionVF/VD nodes are generated on SVE machines > 32: * while being disabled on Neon Suggestion: * while being disabled on Neon test/hotspot/jtreg/compiler/c2/irTests/TestVectorFPReduction.java line 55: > 53: @IR(applyIfCPUFeatureAnd = {"asimd", "true", "sve", "false"}, failOn = {IRNode.ADD_REDUCTION_VF}) > 54: @IR(applyIfCPUFeature = {"sve", "true"}, counts = {"requires_strict_order", ">=1", IRNode.ADD_REDUCTION_VF, ">=1"}, > 55: failOn = {"no_strict_order"}, phase = CompilePhase.PRINT_IDEAL) Can you please re-format the rules? We usually have this order, each on a new line: counts failOn applyIf... phase Having a consistent format just makes it easier to read quickly ;) test/hotspot/jtreg/compiler/c2/irTests/TestVectorFPReduction.java line 67: > 65: @IR(applyIfCPUFeatureAnd = {"asimd", "true", "sve", "false"}, failOn = {IRNode.ADD_REDUCTION_VD}) > 66: @IR(applyIfCPUFeature = {"sve", "true"}, counts = {"requires_strict_order", ">=1", IRNode.ADD_REDUCTION_VD, ">=1"}, > 67: failOn = {"no_strict_order"}, phase = CompilePhase.PRINT_IDEAL) Also: I realize that you only check for `asimd / sve` features. Can you also apply it for avx features? test/hotspot/jtreg/compiler/vectorapi/TestVectorAddMulReduction.java line 42: > 40: * @bug 8320725 > 41: * @library /test/lib / > 42: * @requires os.arch == "aarch64" I think there is no reason to only run the test on aarch64. We can run the test anywhere, but the applyIf specifies on what platforms the IR rules are executed. 
test/hotspot/jtreg/compiler/vectorapi/TestVectorAddMulReduction.java line 44: > 42: * @requires os.arch == "aarch64" > 43: * @summary Verify non-strictly ordered AddReductionVF/VD and MulReductionVF/VD > 44: * nodes are generated for float and double types in VectorAPI indentation test/hotspot/jtreg/compiler/vectorapi/TestVectorAddMulReduction.java line 181: > 179: > 180: public static void main(String[] args) { > 181: TestFramework.runWithFlags("-XX:-TieredCompilation", Why `-XX:-TieredCompilation`? ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18034#pullrequestreview-2076223588 PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1613095032 PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1613084279 PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1613097871 PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1613103237 PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1613104822 PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1613106458 PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1613107304 From epeter at openjdk.org Fri May 24 08:53:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 24 May 2024 08:53:11 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v10] In-Reply-To: References: <26UiEE_uEKUU0lg_T91K-b4Or3mtGluJYybbJOpETOU=.a74004d6-590f-49e7-8880-4ab6627926dd@github.com> Message-ID: <6M8hC17XmxLvDhhtGKgKxTAwfT8NV8_ameppOeyI9jQ=.f942d480-efbb-49db-9d7c-5ec93fb8f1c4@github.com> On Fri, 24 May 2024 08:36:51 GMT, Emanuel Peter wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Modify JTREG IR rules and some style/format changes > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorFPReduction.java line 24: > >> 22: */ >> 23: >> 24: 
package compiler.c2.irTests; > > Would you mind moving the tests away from `c2/irTests`, and into the `loopopts/superword` and `vectorapi` directories, please? `c2/irTests` is kind of a disorganized "catch-all". We only had this directory at the beginning of the IR-framework before it was widely adopted. Now it makes more sense to sort the tests by "what" they test, rather than "how" ;) Ah, I see one is already in the `vectorapi` directory, great! > test/hotspot/jtreg/compiler/c2/irTests/TestVectorFPReduction.java line 32: > >> 30: * @bug 8320725 >> 31: * @summary Ensure strictly ordered AddReductionVF/VD nodes are generated on SVE machines >> 32: * while being disabled on Neon > > Suggestion: > > * while being disabled on Neon nit: we usually indent the second line with the start of the summary-text of the first line. > test/hotspot/jtreg/compiler/vectorapi/TestVectorAddMulReduction.java line 42: > >> 40: * @bug 8320725 >> 41: * @library /test/lib / >> 42: * @requires os.arch == "aarch64" > > I think there is no reason to only run the test on aarch64. We can run the test anywhere, but the applyIf specifies on what platforms the IR rules are executed. So you can use the `asimd` or `avx...` features for that.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1613112462 PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1613085012 PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1613106185 From epeter at openjdk.org Fri May 24 08:53:12 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 24 May 2024 08:53:12 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v10] In-Reply-To: <6M8hC17XmxLvDhhtGKgKxTAwfT8NV8_ameppOeyI9jQ=.f942d480-efbb-49db-9d7c-5ec93fb8f1c4@github.com> References: <26UiEE_uEKUU0lg_T91K-b4Or3mtGluJYybbJOpETOU=.a74004d6-590f-49e7-8880-4ab6627926dd@github.com> <6M8hC17XmxLvDhhtGKgKxTAwfT8NV8_ameppOeyI9jQ=.f942d480-efbb-49db-9d7c-5ec93fb8f1c4@github.com> Message-ID: On Fri, 24 May 2024 08:30:38 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/c2/irTests/TestVectorFPReduction.java line 32: >> >>> 30: * @bug 8320725 >>> 31: * @summary Ensure strictly ordered AddReductionVF/VD nodes are generated on SVE machines >>> 32: * while being disabled on Neon >> >> Suggestion: >> >> * while being disabled on Neon > > nit: we usually indent the second line with the start of the summary-text of the first line. I would also like a more general summary here, that is less ARM specific. Talk more about `requires_strict_order` and `no_strict_order`. 
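To illustrate why the `requires_strict_order` / `no_strict_order` distinction matters: floating-point addition is not associative, so a reduction that accumulates into several lanes and combines the partial sums at the end can give a different result than the left-to-right order that Java source semantics require for an auto-vectorized loop. The following is a plain-Java model, not C2 code; the class and method names are hypothetical, and `laneWiseSum` only imitates an unordered vector reduction.

```java
public final class FpReductionOrder {
    // Strictly ordered reduction: ((((0 + a0) + a1) + a2) + a3), the order
    // a scalar Java loop defines.
    static float sequentialSum(float[] a) {
        float sum = 0.0f;
        for (float v : a) {
            sum += v;
        }
        return sum;
    }

    // Unordered model: element i goes into partial sum (i % lanes), as a
    // vector loop with `lanes` accumulators would do; partials are combined last.
    static float laneWiseSum(float[] a, int lanes) {
        float[] partial = new float[lanes];
        for (int i = 0; i < a.length; i++) {
            partial[i % lanes] += a[i];
        }
        float sum = 0.0f;
        for (float p : partial) {
            sum += p;
        }
        return sum;
    }

    public static void main(String[] args) {
        // 1e8f has an ulp of 8, so 1e8f + 1.0f rounds back to 1e8f.
        float[] data = {1e8f, 1.0f, -1e8f, 1.0f};
        System.out.println(sequentialSum(data));  // 1.0
        System.out.println(laneWiseSum(data, 2)); // 2.0
    }
}
```

With `lanes = 1` both methods agree; with two lanes the same input yields 2.0f instead of 1.0f. A loop reduction must therefore keep the strict order, while a Vector API `reduceLanes(ADD)` on floats, which (as the tests in this PR assume) does not fix the order, may use the cheaper unordered form.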
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1613102744 From epeter at openjdk.org Fri May 24 08:57:09 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 24 May 2024 08:57:09 GMT Subject: RFR: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" [v2] In-Reply-To: References: <16J-lJ2AceGTVcRWBcP15yKcwO-1IA1XsngyOuNjf7k=.0776f081-ae2c-4279-87cf-d909806c2bc4@github.com> Message-ID: <9QZn_Rgc_Vk9x-c5w3MX-IIX7hHICjnsm_tFLvLtL4M=.b6177857-af0a-4070-8860-7e2d395f9ed7@github.com> On Thu, 7 Dec 2023 06:42:49 GMT, Fei Gao wrote: >> On LP64 systems, if the heap can be moved into low virtual address space (below 4GB) and the heap size is smaller than the interesting threshold of 4 GB, we can use unscaled decoding pattern for narrow klass decoding. It means that a generic field reference can be decoded by: >> >> cast<64> (32-bit compressed reference) + field_offset >> >> >> When the `field_offset` is an immediate, on aarch64 platform, the unscaled decoding pattern can match perfectly with a direct addressing mode, i.e., `base_plus_offset`, supported by `LDR/STR` instructions. But for certain data width, not all immediates can be encoded in the instruction field of `LDR/STR` [[1]](https://github.com/openjdk/jdk/blob/8db7bad992a0f31de9c7e00c2657c18670539102/src/hotspot/cpu/aarch64/assembler_aarch64.inline.hpp#L33). The ranges are different as data widths vary. >> >> For example, when we try to load a value of long type at offset of `1030`, the address expression is `(AddP (DecodeN base) 1030)`. Before the patch, the expression was matching with `operand indOffIN()`. But, for 64-bit `LDR/STR`, signed immediate byte offset must be in the range -256 to 255 or positive immediate byte offset must be a multiple of 8 in the range 0 to 32760 [[2]](https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/LDR--immediate---Load-Register--immediate--?lang=en). 
`1030` can't be encoded in the instruction field. So, after matching, when we do checking for instruction encoding, the assertion would fail. >> >> In this patch, we're going to filter out invalid immediates when deciding if current addressing mode can be matched as `base_plus_offset`. We introduce `indOffIN4/indOffLN4` and `indOffIN8/indOffLN8` for 32-bit data type and 64-bit data type separately in the patch. E.g., for `memory4`, we remove the generic `indOffIN/indOffLN`, which matches wrong unscaled immediate range, and replace them with `indOffIN4/indOffLN4` instead. >> >> Since 8-bit and 16-bit `LDR/STR` instructions also support the unscaled decoding pattern, we add the addressing mode in the lists of `memory1` and `memory2` by introducing `indOffIN1/indOffLN1` and `indOffIN2/indOffLN2`. >> >> We also remove unused operands `indOffI/indOffl/indOffIN/indOffLN` to avoid misuse. >> >> Tier 1-3 passed on aarch64. > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Remove unused immIOffset/immLOffset > - Merge branch 'master' into fg8319690 > - 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" > > On LP64 systems, if the heap can be moved into low virtual > address space (below 4GB) and the heap size is smaller than the > interesting threshold of 4 GB, we can use unscaled decoding > pattern for narrow klass decoding. It means that a generic field > reference can be decoded by: > ``` > cast<64> (32-bit compressed reference) + field_offset > ``` > > When the `field_offset` is an immediate, on aarch64 platform, the > unscaled decoding pattern can match perfectly with a direct > addressing mode, i.e., `base_plus_offset`, supported by LDR/STR > instructions. 
But for certain data width, not all immediates can > be encoded in the instruction field of LDR/STR[1]. The ranges are > different as data widths vary. > > For example, when we try to load a value of long type at offset of > `1030`, the address expression is `(AddP (DecodeN base) 1030)`. > Before the patch, the expression was matching with > `operand indOffIN()`. But, for 64-bit LDR/STR, signed immediate > byte offset must be in the range -256 to 255 or positive immediate > byte offset must be a multiple of 8 in the range 0 to 32760[2]. > `1030` can't be encoded in the instruction field. So, after > matching, when we do checking for instruction encoding, the > assertion would fail. > > In this patch, we're going to filter out invalid immediates > when deciding if current addressing mode can be matched as > `base_plus_offset`. We introduce `indOffIN4/indOffLN4` and > `indOffIN8/indOffLN8` for 32-bit data type and 64-bit data > type separately in the patch. E.g., for `memory4`, we remove > the generic `indOffIN/indOffLN`, which matches wrong unscaled > immediate range, and replace them with `indOffIN4/indOffLN4` > instead. > > Since 8-bit and 16-bit LDR/STR instructions also support the > unscaled decoding pattern, we add the addressing mode in the > lists of `memory1` and `memory2` by introducing > `indOffIN1/indOffLN1` and `indOffIN2/indOffLN2`. > > We also remove unused operands `indOffI/indOffl/indOffIN/indOffLN` > to avoid misuse. > > ... test/hotspot/jtreg/compiler/c2/aarch64/TestUnalignedAccessCompressedOops.java line 35: > 33: * @library /test/lib > 34: * @modules java.base/jdk.internal.misc > 35: * @requires os.arch=="aarch64" & vm.compiler2.enabled I would remove these two lines. Because who knows, maybe some other platform has similar issues down the road. Or maybe graalVM has a bug that we could catch with this. 
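The immediate-offset rule quoted in the 8319690 description (for a 64-bit LDR/STR: a signed offset in -256..255, or a non-negative multiple of 8 up to 32760) can be written as a small predicate. This is a plain-Java sketch generalized to an access of `1 << shift` bytes, assuming the scaled form is a 12-bit unsigned field (4095 times the access size, which gives 32760 for 8 bytes). It is not HotSpot's `offset_ok_for_immed`, and the class name is hypothetical.

```java
public final class LdrImmediateOffset {
    // Returns true if `offset` fits an LDR/STR immediate for an access of
    // (1 << shift) bytes: either the unscaled signed 9-bit form (-256..255),
    // or a non-negative, size-aligned offset whose scaled value fits 0..4095.
    static boolean fitsImmediate(long offset, int shift) {
        if (offset >= -256 && offset <= 255) {
            return true;                        // unscaled signed imm9
        }
        long sizeMask = (1L << shift) - 1;
        return offset >= 0
                && (offset & sizeMask) == 0     // must be aligned to the access size
                && (offset >> shift) <= 4095;   // scaled unsigned imm12
    }

    public static void main(String[] args) {
        // 64-bit access (shift = 3): the offset 1030 from the PR is rejected.
        System.out.println(fitsImmediate(1030, 3));  // false
        System.out.println(fitsImmediate(1032, 3));  // true
        System.out.println(fitsImmediate(32760, 3)); // true
        System.out.println(fitsImmediate(32768, 3)); // false
        // 16-bit access (shift = 1): 1030 is 2-aligned and 1030 >> 1 = 515.
        System.out.println(fitsImmediate(1030, 1));  // true
    }
}
```

The same offset is valid for one access width and invalid for another, which is why one generic `indOffIN`/`indOffLN` operand cannot serve every `memoryN` class and the patch introduces per-size variants.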
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16991#discussion_r1613119786 From epeter at openjdk.org Fri May 24 09:12:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 24 May 2024 09:12:11 GMT Subject: RFR: 8327381: Refactor type-improving transformations in BoolNode::Ideal to BoolNode::Value [v9] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 17:33:29 GMT, Kangcheng Xu wrote: >> This PR resolves [JDK-8327381](https://bugs.openjdk.org/browse/JDK-8327381) >> >> Currently the transformations for expressions with patterns `((x & m) u<= m)` or `((m & x) u<= m)` to `true` is in `BoolNode::Ideal` function with a new constant node of value `1` created. However, this is technically a type-improving (reduction in range) transformation that's better suited in `BoolNode::Value` function. >> >> New unit test `test/hotspot/jtreg/compiler/c2/TestBoolNodeGvn.java` asserting on IR nodes and correctness of this transformation is added and passing. > > Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: > > - Merge branch 'master' into boolnode-refactor > - refactor BoolNode::Value() and extract code to ::Value_cmpu_and_mask > - update comments > - fix indentation again > - apply test only on x64, aarch64 and riscv64 > - also renames the class name in @run > - update test @run annotation > - improve formatting, correct annotation and rename test class > - Merge branch 'master' into boolnode-refactor > - update the package name for tests > - ... and 6 more: https://git.openjdk.org/jdk/compare/6a68956e...278c436a VM code looks good, just a few comments about the test. Thanks for adding that one :) test/hotspot/jtreg/compiler/c2/irTests/TestBoolNodeGVN.java line 2: > 1: /* > 2: * Copyright (c) 2024 Red Hat and/or its affiliates. 
All rights reserved. Can you move this to a different directory, please? the `irTests` directory was a good idea when we were just starting with the IR framework, but now it makes more sense to sort tests by "what" is tested, rather than "how". Feel free to put it in the `c2` directory, or even create a new subdirectory like `c2/gvn`. test/hotspot/jtreg/compiler/c2/irTests/TestBoolNodeGVN.java line 32: > 30: import compiler.lib.ir_framework.IRNode; > 31: import compiler.lib.ir_framework.Test; > 32: import compiler.lib.ir_framework.TestFramework; Suggestion: import compiler.lib.ir_framework.*; I think that would be more concise. Up to you. test/hotspot/jtreg/compiler/c2/irTests/TestBoolNodeGVN.java line 62: > 60: Integer.compareUnsigned((x & m), m + 1) < 0 & > 61: Integer.compareUnsigned((m & x), m + 1) < 0; > 62: } I just had an idea: Can you create a few test-cases like `Integer.compareUnsigned((m & x), m + 2) < 0` etc, with IR rules where a `IRNode.CMP_U` is expected? This could just be a sanity check, and see that we have no "off-by-one" errors here. What do you think? ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18198#pullrequestreview-2076307485 PR Review Comment: https://git.openjdk.org/jdk/pull/18198#discussion_r1613135415 PR Review Comment: https://git.openjdk.org/jdk/pull/18198#discussion_r1613140304 PR Review Comment: https://git.openjdk.org/jdk/pull/18198#discussion_r1613138930 From redestad at openjdk.org Fri May 24 10:29:05 2024 From: redestad at openjdk.org (Claes Redestad) Date: Fri, 24 May 2024 10:29:05 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v10] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 08:24:15 GMT, Adam Sotona wrote: >> Hi, >> During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. 
>> One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. >> >> I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. >> >> Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. >> >> Thank you, >> Adam > > Adam Sotona has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: > > - fixed jdeps.Dependencies > - Merge branch 'master' into JDK-8331291-attributes > - addressed CSR review comments > - fixed CompilationIDMapper does not allow multiple instances > - fixed tests > - fixed tests > - fixed tests > - updated LimitsTest > - Merge branch 'master' into JDK-8331291-attributes > > # Conflicts: > # test/jdk/jdk/classfile/SignaturesTest.java > - Merge branch 'master' into JDK-8331291-attributes > - ... and 6 more: https://git.openjdk.org/jdk/compare/239c1b33...37f7f63f Looks good after revisions. ------------- Marked as reviewed by redestad (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19006#pullrequestreview-2076508421 From mli at openjdk.org Fri May 24 11:43:05 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 24 May 2024 11:43:05 GMT Subject: RFR: 8320999: RISC-V: C2 RotateLeftV [v2] In-Reply-To: <-xClnQXcYdLw5tq_Kq4PtOSNEO13lOrR_vD-nIMCzGU=.66662885-a465-4f23-92df-8de5cc656309@github.com> References: <-xClnQXcYdLw5tq_Kq4PtOSNEO13lOrR_vD-nIMCzGU=.66662885-a465-4f23-92df-8de5cc656309@github.com> Message-ID: On Fri, 24 May 2024 07:39:08 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 12 commits: >> >> - merge >> - Fix imm6 in vror.vi; misc >> - Merge branch 'master' into rotate-left-right-v >> - add comments >> - fix mask >> - fix imm & long >> - fixes >> - Merge branch 'master' into rotate-left-right-v >> - fixes >> - remove redundant code: UseZvbb >> - ... and 2 more: https://git.openjdk.org/jdk/compare/a0c5714d...edd0201d > > src/hotspot/cpu/riscv/riscv_v.ad line 3097: > >> 3095: %} >> 3096: >> 3097: instruct vrotate_right_imm(vReg dst, vReg src, immI shift) %{ > > Question: Could we make use of the vector-scalar rotate variants (vrol_vx / vror_vx) in case `shift` is not a constant? Do you mean have another instruct like `instruct vrotate_right_imm(vReg dst, vReg src, Reg shift)`? Seems not, as in both vectorization or Vector API implementation, when it's not const, it will be put into a vector first, then match `vrotate_right(vReg dst, vReg src, vReg shift)` > src/hotspot/cpu/riscv/riscv_v.ad line 3101: > >> 3099: Matcher::vector_element_basic_type(n) == T_SHORT || >> 3100: Matcher::vector_element_basic_type(n) == T_INT || >> 3101: Matcher::vector_element_basic_type(n) == T_LONG); > > I am not sure but do we really need this predicate? That's a piece of code need to be cleaned. Thanks for catching. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19325#discussion_r1613333361 PR Review Comment: https://git.openjdk.org/jdk/pull/19325#discussion_r1613334409 From mli at openjdk.org Fri May 24 11:49:31 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 24 May 2024 11:49:31 GMT Subject: RFR: 8320999: RISC-V: C2 RotateLeftV [v3] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > More detailed description is inline in the code. 
> Thanks Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: clean up ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19325/files - new: https://git.openjdk.org/jdk/pull/19325/files/edd0201d..2b295d6e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19325&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19325&range=01-02 Stats: 16 lines in 1 file changed: 0 ins; 16 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19325.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19325/head:pull/19325 PR: https://git.openjdk.org/jdk/pull/19325 From rcastanedalo at openjdk.org Fri May 24 12:01:14 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 24 May 2024 12:01:14 GMT Subject: RFR: 8332527: ZGC: generalize object cloning logic [v3] In-Reply-To: References: Message-ID: > This changeset generalize the logic to produce a runtime call to clone a class instance so that it can be shared by other collectors adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). The changeset moves the logic from `ZBarrierSetC2` to the GC-shared `BarrierSetC2` class and adds support for 32-bits platforms. > > #### Testing > > - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). > - tier4-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only). > - `compiler/arraycopy` tests (linux-x86-debug) with [an additional patch](https://github.com/openjdk/jdk/commit/ddcf777894e740b8e6ddbbf8821e82a173c23ef4) that implements cloning of large class instances with a runtime clone call rather than arraycopy when using G1 (to exercise the generalized logic on a 32-bits platform).
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Generalize 'clone_instance_in_runtime' to also handle reflective array clones (as the original code) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19311/files - new: https://git.openjdk.org/jdk/pull/19311/files/cf85edec..6f16bd75 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19311&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19311&range=01-02 Stats: 14 lines in 3 files changed: 3 ins; 3 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/19311.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19311/head:pull/19311 PR: https://git.openjdk.org/jdk/pull/19311 From rcastanedalo at openjdk.org Fri May 24 12:05:02 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 24 May 2024 12:05:02 GMT Subject: RFR: 8332527: ZGC: generalize object cloning logic [v2] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 15:08:07 GMT, Roberto Castañeda Lozano wrote: > I propose to remove in this RFE the assumption that BarrierSetC2::clone_instance_in_runtime can only be called for instance cloning and limit the RFE to simply moving logic from ZBarrierSetC2 into BarrierSetC2 and adding support for 32-bits platforms, without any other changes to the existing logic. Done, @TobiHartmann and @xmas92 please re-review.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19311#issuecomment-2129363332 From thartmann at openjdk.org Fri May 24 12:11:02 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 24 May 2024 12:11:02 GMT Subject: RFR: 8332527: ZGC: generalize object cloning logic [v3] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 12:01:14 GMT, Roberto Castañeda Lozano wrote: >> This changeset generalize the logic to produce a runtime call to clone a class instance so that it can be shared by other collectors adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). The changeset moves the logic from `ZBarrierSetC2` to the GC-shared `BarrierSetC2` class and adds support for 32-bits platforms. >> >> #### Testing >> >> - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). >> - tier4-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only). >> - `compiler/arraycopy` tests (linux-x86-debug) with [an additional patch](https://github.com/openjdk/jdk/commit/ddcf777894e740b8e6ddbbf8821e82a173c23ef4) that implements cloning of large class instances with a runtime clone call rather than arraycopy when using G1 (to exercise the generalized logic on a 32-bits platform). > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Generalize 'clone_instance_in_runtime' to also handle reflective array clones (as the original code) Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19311#pullrequestreview-2076743063 From aboldtch at openjdk.org Fri May 24 12:11:02 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 24 May 2024 12:11:02 GMT Subject: RFR: 8332527: ZGC: generalize object cloning logic [v3] In-Reply-To: References: Message-ID: <6cQKa_rKpwbv0Sw8HNYLOthKxk3pvESTbdfC8zyz2cM=.251f3feb-f049-4065-bb37-cf628f500b08@github.com> On Fri, 24 May 2024 12:01:14 GMT, Roberto Castañeda Lozano wrote: >> This changeset generalize the logic to produce a runtime call to clone a class instance so that it can be shared by other collectors adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). The changeset moves the logic from `ZBarrierSetC2` to the GC-shared `BarrierSetC2` class and adds support for 32-bits platforms. >> >> #### Testing >> >> - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). >> - tier4-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only). >> - `compiler/arraycopy` tests (linux-x86-debug) with [an additional patch](https://github.com/openjdk/jdk/commit/ddcf777894e740b8e6ddbbf8821e82a173c23ef4) that implements cloning of large class instances with a runtime clone call rather than arraycopy when using G1 (to exercise the generalized logic on a 32-bits platform). > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Generalize 'clone_instance_in_runtime' to also handle reflective array clones (as the original code) Marked as reviewed by aboldtch (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/19311#pullrequestreview-2076738346 From mli at openjdk.org Fri May 24 12:14:11 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 24 May 2024 12:14:11 GMT Subject: RFR: 8332883: Some simple cleanup in vectornode.cpp Message-ID: <_OntRXQMobbozvu5_QPLpEny6Wsfv5pFQGYhWw8aSCE=.7389a53d-b139-4825-8fc6-e22e7220fe9e@github.com> Hi, Can you review this simple cleanup in vectornode.cpp? Thanks! ------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/19392/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19392&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332883 Stats: 17 lines in 1 file changed: 0 ins; 13 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19392.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19392/head:pull/19392 PR: https://git.openjdk.org/jdk/pull/19392 From fyang at openjdk.org Fri May 24 12:17:10 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 24 May 2024 12:17:10 GMT Subject: RFR: 8320999: RISC-V: C2 RotateLeftV [v2] In-Reply-To: References: <-xClnQXcYdLw5tq_Kq4PtOSNEO13lOrR_vD-nIMCzGU=.66662885-a465-4f23-92df-8de5cc656309@github.com> Message-ID: On Fri, 24 May 2024 11:39:20 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/riscv_v.ad line 3097: >> >>> 3095: %} >>> 3096: >>> 3097: instruct vrotate_right_imm(vReg dst, vReg src, immI shift) %{ >> >> Question: Could we make use of the vector-scalar rotate variants (vrol_vx / vror_vx) in case `shift` is not a constant? > > Do you mean have another instruct like `instruct vrotate_right_imm(vReg dst, vReg src, Reg shift)`? > Seems not, as in both vectorization or Vector API implementation, when it's not const, it will be put into a vector first, then match `vrotate_right(vReg dst, vReg src, vReg shift)` Yeah. If that is the case, maybe we can save one vector register then? 
I mean let `instruct vrotate_right_reg(vReg dst, vReg src, Reg shift)` match something like this: `match(Set dst (RotateRightV src (Replicate shift)))`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19325#discussion_r1613386212 From rcastanedalo at openjdk.org Fri May 24 12:22:03 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 24 May 2024 12:22:03 GMT Subject: RFR: 8332527: ZGC: generalize object cloning logic [v3] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 12:01:14 GMT, Roberto Castañeda Lozano wrote: >> This changeset generalize the logic to produce a runtime call to clone a class instance so that it can be shared by other collectors adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). The changeset moves the logic from `ZBarrierSetC2` to the GC-shared `BarrierSetC2` class and adds support for 32-bits platforms. >> >> #### Testing >> >> - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). >> - tier4-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only). >> - `compiler/arraycopy` tests (linux-x86-debug) with [an additional patch](https://github.com/openjdk/jdk/commit/ddcf777894e740b8e6ddbbf8821e82a173c23ef4) that implements cloning of large class instances with a runtime clone call rather than arraycopy when using G1 (to exercise the generalized logic on a 32-bits platform). > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Generalize 'clone_instance_in_runtime' to also handle reflective array clones (as the original code) Thanks again for reviewing, Tobias and Axel! I will integrate on Monday.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19311#issuecomment-2129399586 From asotona at openjdk.org Fri May 24 12:35:21 2024 From: asotona at openjdk.org (Adam Sotona) Date: Fri, 24 May 2024 12:35:21 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v11] In-Reply-To: References: Message-ID: <6bLm0FINPSxkp-WhBPd8tJfDDKEie5fPBe45oNU5qWU=.8a9b299a-5b2f-49af-8ea1-7045f5532e6e@github.com> > Hi, > During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. > One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. > > I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. > > Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. 
> > Thank you, > Adam Adam Sotona has updated the pull request incrementally with one additional commit since the last revision: fixed tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19006/files - new: https://git.openjdk.org/jdk/pull/19006/files/37f7f63f..db73c2dd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=09-10 Stats: 8 lines in 4 files changed: 1 ins; 2 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/19006.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19006/head:pull/19006 PR: https://git.openjdk.org/jdk/pull/19006 From mli at openjdk.org Fri May 24 13:20:02 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 24 May 2024 13:20:02 GMT Subject: RFR: 8320999: RISC-V: C2 RotateLeftV [v2] In-Reply-To: References: <-xClnQXcYdLw5tq_Kq4PtOSNEO13lOrR_vD-nIMCzGU=.66662885-a465-4f23-92df-8de5cc656309@github.com> Message-ID: <1guRcC9sUUBUCRGW1ku3r8dZerahN2V8Eig5lodyUH4=.7e8607c9-6c97-4224-8db7-c063a359ea52@github.com> On Fri, 24 May 2024 12:14:44 GMT, Fei Yang wrote: >> Do you mean have another instruct like `instruct vrotate_right_imm(vReg dst, vReg src, Reg shift)`? >> Seems not, as in both vectorization or Vector API implementation, when it's not const, it will be put into a vector first, then match `vrotate_right(vReg dst, vReg src, vReg shift)` > > Yeah. If that is the case, maybe we can save one vector register then? > I mean let `instruct vrotate_right_reg(vReg dst, vReg src, Reg shift)` match something like this: > `match(Set dst (RotateRightV src (Replicate shift)))`. Not sure, could be. If this is the case, then the vector shift should be optimized too? I checked the generated code; it seems we're fine?
0x00002aaac560c55a: vmv.v.x v1,a3
...
...
0x00002aaac560c594: vle32.v v2,(a4)
0x00002aaac560c598: vsetivli t0,8,e32,m1,tu,mu
0x00002aaac560c59c: vror.vv v2,v2,v1
In any way, we need 2 v registers?
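For reference, the lane semantics being matched can be modeled in plain Java: each lane of a rotate by a broadcast scalar shift (the `(RotateRightV src (Replicate shift))` shape Fei Yang suggests) is `Integer.rotateRight` with the same amount in every lane, and a left rotate of a 32-bit lane equals a right rotate by `(32 - (shift & 31)) & 31`. That identity is what a backend would rely on if, as with the RVV Zvbb extension, only the right-rotate instruction (`vror.vi`) has an immediate form. This is a semantics-only sketch with hypothetical names; it says nothing about register usage.

```java
import java.util.Arrays;

public final class LaneRotate {
    // Lane-wise model of a left rotate of int lanes by a broadcast scalar shift.
    static int[] rotateLeftLanes(int[] src, int shift) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = Integer.rotateLeft(src[i], shift);
        }
        return dst;
    }

    // Same result using right rotates only: rotating left by s equals
    // rotating right by (32 - (s & 31)) & 31 for 32-bit lanes.
    static int[] rotateLeftViaRight(int[] src, int shift) {
        int rightAmount = (32 - (shift & 31)) & 31;
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = Integer.rotateRight(src[i], rightAmount);
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] src = {0x80000001, 0x12345678, -1, 0};
        for (int s : new int[] {0, 1, 7, 31, 32, 33, -3}) {
            boolean same = Arrays.equals(rotateLeftLanes(src, s), rotateLeftViaRight(src, s));
            System.out.println("shift " + s + " -> " + same);
        }
    }
}
```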
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19325#discussion_r1613469134 From dfenacci at openjdk.org Fri May 24 13:42:13 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 24 May 2024 13:42:13 GMT Subject: Integrated: 8325520: Vector loads and stores with indices and masks incorrectly compiled In-Reply-To: References: Message-ID: On Mon, 18 Mar 2024 12:20:34 GMT, Damon Fenacci wrote: > # Issue > When loading multiple vectors using indices or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, indices, 0` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. > > # Causes > On vector-capable platforms, vector loads with masks and indices (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or indices are mapped as `LoadVector` nodes instead. > The same is true for `StoreVector`s. > When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store of the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`). Here we call > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 > > where we do an extra check for types if we deal with vectors but we don't make sure that neither masks nor indices interfere. > Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > but we don't make sure that there are no masks or indices. > A few lines below, we check if there are 2 stores for the same value in a row.
We need to check for masks and indices here too but in this case we can include these cases if the masks and indices of the vector stores are equivalent. > > # Solution > To avoid folding `Load`- and `StoreVector`s with masks and indices we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns -1. In this way, the checks in `MemNode::can_see_stored_value` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 > > and `StoreNode::Identity` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > will fail if masks or indices are used. > For 2 stores of the same value we instead check for mask and indices equality. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been added too. This pull request has now been integrated. Changeset: 0c934ff4 Author: Damon Fenacci URL: https://git.openjdk.org/jdk/commit/0c934ff4e2fb53a72ad25a080d956745a5649f9b Stats: 1465 lines in 5 files changed: 1463 ins; 0 del; 2 mod 8325520: Vector loads and stores with indices and masks incorrectly compiled Reviewed-by: epeter, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/18347 From sgibbons at openjdk.org Fri May 24 13:44:28 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 13:44:28 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v36] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Missing comma ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/c034d3f9..1a71eb10 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=35 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=34-35 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Fri May 24 13:44:28 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 13:44:28 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v33] 
In-Reply-To: References: Message-ID: On Fri, 24 May 2024 06:31:36 GMT, Daniel Jeliński wrote: >> Thanks for finding this. It was ignorance on my part as I thought the xorq would have logic to not emit the REX prefix if not necessary, but it doesn't. Fixed. > > Right, it seems to surprise people. There are a lot of preexisting uses of xorq / xorptr to zero a register. I think it would make sense to implement this logic in xorq. I can do this if others agree. Good idea. I vote yes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613506958 From sgibbons at openjdk.org Fri May 24 13:44:29 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 13:44:29 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v35] In-Reply-To: References: <-vyOZzeMslZqgJpTsQnnOWi4abWiM8fNeWSVx5LEHm8=.d37011ee-102c-4874-aa26-d113949d25ea@github.com> Message-ID: On Fri, 24 May 2024 00:47:04 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments - move stubGen*_string.cpp to c2_stubGen*_string.cpp > > test/jdk/java/lang/StringBuffer/IndexOf.java line 2: > >> 1: /* >> 2: * Copyright (c) 2000, 2024 Oracle and/or its affiliates. All rights reserved. > > This causes a copyright header validation failure. Missing comma `,` after 2024. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613504949 From sgibbons at openjdk.org Fri May 24 14:22:11 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 14:22:11 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v33] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 06:31:40 GMT, Daniel Jeliński wrote: >> It may, but I believe the movq is shorter (although maybe not to r15). I'll do some experimentation. > > the RIP-relative lea should have a shorter encoding.
I think something like `lea(r15, ExternalAddress(small_jump_table))` should produce it (untested). Just did the experiment and it turns out that `mov64(r15, (int64_t)small_jump_table)` and `lea(r15, ExternalAddress(small_jump_table))` produce exactly the same code: `0x00007fffe463d68b: 49 bf a0 d5 63 e4 ff 7f 00 00 movabs r15,0x7fffe463d5a0` The code in `MacroAssembler` for `lea` calls `mov_literal64` with no check for whether it can be ip-relative. I tried doing it myself via `leaq(r15, Address(rip, (int64_t)small_jump_table - (int64_t)(__ pc())))` but there is no definition in `register_x86.hpp` for register `rip`. So I'm not sure exactly how to produce RIP-relative addressing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613560044 From fyang at openjdk.org Fri May 24 14:51:01 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 24 May 2024 14:51:01 GMT Subject: RFR: 8320999: RISC-V: C2 RotateLeftV [v2] In-Reply-To: <1guRcC9sUUBUCRGW1ku3r8dZerahN2V8Eig5lodyUH4=.7e8607c9-6c97-4224-8db7-c063a359ea52@github.com> References: <-xClnQXcYdLw5tq_Kq4PtOSNEO13lOrR_vD-nIMCzGU=.66662885-a465-4f23-92df-8de5cc656309@github.com> <1guRcC9sUUBUCRGW1ku3r8dZerahN2V8Eig5lodyUH4=.7e8607c9-6c97-4224-8db7-c063a359ea52@github.com> Message-ID: On Fri, 24 May 2024 13:15:37 GMT, Hamlin Li wrote: >> Yeah. If that is the case, maybe we can save one vector register then? >> I mean let `instruct vrotate_right_reg(vReg dst, vReg src, Reg shift)` match something like this: >> `match(Set dst (RotateRightV src (Replicate shift)))`. > > Not sure, could be. If this is the case, then the vector shift should be optimized too? > > I checked the generated code, seems we're fine? > > 0x00002aaac560c55a: vmv.v.x v1,a3 > ... ... > 0x00002aaac560c594: vle32.v v2,(a4) > 0x00002aaac560c598: vsetivli t0,8,e32,m1,tu,mu > 0x00002aaac560c59c: vror.vv v2,v2,v1 > > > Either way, we need 2 v registers?
Yes, I think there should be quite a few places where we could make use of vector-scalar variants, which would save us one vector register. @zifeihan has already handled some cases in vector logic instructions: https://github.com/openjdk/jdk/pull/18999. And he is currently working on handling more vector arithmetic instructions. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19325#discussion_r1613602729 From djelinski at openjdk.org Fri May 24 14:52:12 2024 From: djelinski at openjdk.org (Daniel Jeliński) Date: Fri, 24 May 2024 14:52:12 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v33] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 14:19:13 GMT, Scott Gibbons wrote: >> the RIP-relative lea should have a shorter encoding. I think something like `lea(r15, ExternalAddress(small_jump_table))` should produce it (untested) > > Just did the experiment and it turns out that `mov64(r15, (int64_t)small_jump_table)` and `lea(r15, ExternalAddress(small_jump_table))` produce exactly the same code: > > `0x00007fffe463d68b: 49 bf a0 d5 63 e4 ff 7f 00 00 movabs r15,0x7fffe463d5a0` > > The code in `MacroAssembler` for `lea` calls `mov_literal64` with no check for whether it can be ip-relative. > > I tried doing it myself via `leaq(r15, Address(rip, (int64_t)small_jump_table - (int64_t)(__ pc())))` but there is no definition in `register_x86.hpp` for register `rip`. So I'm not sure exactly how to produce RIP-relative addressing. Thanks for checking. Well I know that the `MacroAssembler::movdqu(XMMRegister dst, AddressLiteral src, Register rscratch)` method actually generates rip-relative addresses. Maybe we could copy some of that code.
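The size difference behind the suggestion to use a RIP-relative `lea` can be checked by hand-encoding both forms. The following is a minimal C++ sketch, not HotSpot's `Assembler` API: the helper names `encode_movabs_r15` and `encode_lea_r15_rip` are hypothetical, and the encodings assume the standard x86-64 REX/ModRM rules. For `0x7fffe463d5a0` it reproduces the 10-byte `movabs` listing quoted above. The RIP-relative form needs only 7 bytes.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helpers (not HotSpot code): hand-encode two ways of putting an
// address into r15. r15 is register 15: low 3 bits 111, high bit in REX.

// movabs r15, imm64 -> REX.W+B (0x49), opcode B8+7 (0xBF), 8-byte immediate.
std::vector<uint8_t> encode_movabs_r15(uint64_t imm) {
  std::vector<uint8_t> out = {0x49, 0xBF};
  for (int i = 0; i < 8; ++i) {
    out.push_back(static_cast<uint8_t>(imm >> (8 * i)));  // little-endian
  }
  return out;
}

// lea r15, [rip + disp32] -> REX.W+R (0x4C), opcode 0x8D,
// ModRM 0x3D (mod=00, reg=111 for r15, rm=101 selects RIP-relative),
// then a 4-byte displacement measured from the end of this instruction.
std::vector<uint8_t> encode_lea_r15_rip(int32_t disp) {
  std::vector<uint8_t> out = {0x4C, 0x8D, 0x3D};
  const uint32_t u = static_cast<uint32_t>(disp);
  for (int i = 0; i < 4; ++i) {
    out.push_back(static_cast<uint8_t>(u >> (8 * i)));  // little-endian
  }
  return out;
}
```

For example, with the `lea` placed at `0x7fffe463d68b`, the displacement is taken from the end of the 7-byte instruction: `0x7fffe463d5a0 - 0x7fffe463d692 = -242`.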
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613603833 From yzheng at openjdk.org Fri May 24 14:55:08 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Fri, 24 May 2024 14:55:08 GMT Subject: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v6] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 10:12:17 GMT, Bhavana Kilambi wrote: >> Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: >> >> address comments. > > src/hotspot/share/opto/library_call.cpp line 5925: > >> 5923: // Set the original stack and the reexecute bit for the interpreter to reexecute >> 5924: // the bytecode that invokes BigInteger.multiplyToLen() if deoptimization happens >> 5925: // on the return from z array allocation in runtime. > > Since we are not allocating z array during runtime anymore, do we still need these comments? Thanks for pointing it out! Removed. > src/java.base/share/classes/java/math/BigInteger.java line 1836: > >> 1834: >> 1835: if (z == null || z.length < (xlen + ylen)) >> 1836: z = new int[xlen + ylen]; > > Style: only 4 spaces indentation Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1613608426 PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1613608653 From eastigeevich at openjdk.org Fri May 24 14:56:10 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Fri, 24 May 2024 14:56:10 GMT Subject: RFR: 8332632: Redundant assert "compiler should always document failure: %s" with possible UB Message-ID: [JDK-8303951](https://bugs.openjdk.org/browse/JDK-8303951) added the following code: https://github.com/openjdk/jdk/pull/13038/files#diff-2e74481e557cbe87170a56a6e592eea33bb59019926e1c32bebcfaf5b571bb53R2280 if (!ci_env.failing() && !task->is_success()) { + assert(ci_env.failure_reason() != nullptr, "expect failure reason"); + assert(false, "compiler should always document failure: %s", 
ci_env.failure_reason()); The second assert is redundant because `ci_env.failure_reason() != nullptr` is always `false`. It also has possible UB. A compiler sees an if-statement checking `!ci_env.failing()` which, if it is true, implies `ci_env.failure_reason()` is `nullptr`. Based on this information the compiler can optimize `assert(ci_env.failure_reason() != nullptr, "expect failure reason");` to `assert(false, "expect failure reason");`. The compiler can optimize `assert(false, "compiler should always document failure: %s", ci_env.failure_reason());` to `assert(false, "compiler should always document failure: %s", nullptr);`. So the compiler may effectively treat the original code as: if (!ci_env.failing() && !task->is_success()) { assert(false, "expect failure reason"); assert(false, "compiler should always document failure: %s", nullptr); } We have an expression where a format string is used. Format functions usually have undefined behavior if `nullptr` is passed for the `%s` (character string) conversion specifier. See `std::printf` for example. Even if the second assert is never executed, it makes the if-block have UB. The C++ standard says: correct C++ programs are free of undefined behavior. See https://en.cppreference.com/w/cpp/language/ub and https://en.cppreference.com/w/cpp/language/as_if Choosing between readability and correctness, I choose correctness. I think the one assert `assert(ci_env.failure_reason() != nullptr, "compiler should always document failure");` is both self-documenting and correct.
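The invariant the single remaining assert expresses, and the `%s`/`nullptr` hazard described above, can be modeled outside HotSpot. This is a minimal C++ sketch. It assumes, as the description states, that `failing()` is true exactly when a failure reason has been recorded. `FakeCiEnv`, `failure_is_documented` and `failure_message` are hypothetical names, not `ciEnv` code.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for ciEnv, reduced to the two accessors discussed.
// Assumption: failing() is true exactly when a failure reason was recorded.
struct FakeCiEnv {
  const char* reason = nullptr;
  bool failing() const { return reason != nullptr; }
  const char* failure_reason() const { return reason; }
};

// The invariant behind the single assert: an unsuccessful task must have
// documented why. This is the negation of `!failing() && !is_success()`.
bool failure_is_documented(const FakeCiEnv& env, bool task_success) {
  return task_success || env.failing();
}

// If a message must interpolate the reason, substitute a placeholder first:
// printf-family functions leave %s with a null pointer undefined.
std::string failure_message(const FakeCiEnv& env) {
  const char* r = env.failure_reason();
  char buf[128];
  std::snprintf(buf, sizeof buf, "compiler should always document failure: %s",
                r != nullptr ? r : "<none>");
  return std::string(buf);
}
```

Guarding the argument this way keeps the format call well defined even on the path where no reason was recorded.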
------------- Commit messages: - 8332632: Redundant assert "compiler should always document failure: %s" with possible UB Changes: https://git.openjdk.org/jdk/pull/19395/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19395&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332632 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19395.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19395/head:pull/19395 PR: https://git.openjdk.org/jdk/pull/19395 From luhenry at openjdk.org Fri May 24 14:57:04 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 24 May 2024 14:57:04 GMT Subject: RFR: 8320999: RISC-V: C2 RotateLeftV [v2] In-Reply-To: References: <-xClnQXcYdLw5tq_Kq4PtOSNEO13lOrR_vD-nIMCzGU=.66662885-a465-4f23-92df-8de5cc656309@github.com> <1guRcC9sUUBUCRGW1ku3r8dZerahN2V8Eig5lodyUH4=.7e8607c9-6c97-4224-8db7-c063a359ea52@github.com> Message-ID: On Fri, 24 May 2024 14:48:24 GMT, Fei Yang wrote: >> Not sure, could be. If this is the case, then the vector shift should be optimized too? >> >> I checked the generated code, seems we're fine? >> >> 0x00002aaac560c55a: vmv.v.x v1,a3 >> ... ... >> 0x00002aaac560c594: vle32.v v2,(a4) >> 0x00002aaac560c598: vsetivli t0,8,e32,m1,tu,mu >> 0x00002aaac560c59c: vror.vv v2,v2,v1 >> >> >> Either way, we need 2 v registers? > > Yes, I think there should be quite a few places where we could make use of vector-scalar variants, which would save us one vector register. @zifeihan has already handled some cases in vector logic instructions: https://github.com/openjdk/jdk/pull/18999. And he is currently working on handling more vector arithmetic instructions. > > (One example: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/riscv_v.ad#L523) I would also favor using `.vi` or `.vx` variants over `.vv` variants where possible. This would reduce the vector register pressure and remove an unnecessary instruction.
@Hamlin-Li in your example, we could instead have: ... ... 0x00002aaac560c594: vle32.v v2,(a4) 0x00002aaac560c598: vsetivli t0,8,e32,m1,tu,mu 0x00002aaac560c59c: vror.vx v2,v2,a3 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19325#discussion_r1613609262 From fyang at openjdk.org Fri May 24 15:00:03 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 24 May 2024 15:00:03 GMT Subject: RFR: 8320999: RISC-V: C2 RotateLeftV [v2] In-Reply-To: References: <-xClnQXcYdLw5tq_Kq4PtOSNEO13lOrR_vD-nIMCzGU=.66662885-a465-4f23-92df-8de5cc656309@github.com> <1guRcC9sUUBUCRGW1ku3r8dZerahN2V8Eig5lodyUH4=.7e8607c9-6c97-4224-8db7-c063a359ea52@github.com> Message-ID: On Fri, 24 May 2024 14:53:15 GMT, Ludovic Henry wrote: >> Yes, I think there should be quite a few places where we could make use of vector-scalar variants, which would save us one vector register and one vmv.v.x instruction. @zifeihan has already handled some cases in vector logic instructions: https://github.com/openjdk/jdk/pull/18999. And he is currently working on handling more vector arithmetic instructions. >> >> (One example: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/riscv_v.ad#L523) > > I would also favor using `.vi` or `.vx` variants over `.vv` variants where possible. This would reduce the vector register pressure and remove an unnecessary instruction. > > @Hamlin-Li in your example, we could instead have: > > ... ... > 0x00002aaac560c594: vle32.v v2,(a4) > 0x00002aaac560c598: vsetivli t0,8,e32,m1,tu,mu > 0x00002aaac560c59c: vror.vx v2,v2,a3 And for your case, this would help save the `vmv.v.x v1,a3` instruction if you do `vror.vx v2,v2,a3` instead of `vror.vv v2,v2,v1`. Right?
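The reason the `vmv.v.x` splat becomes redundant can be seen from a scalar model of the three instructions. Rotating every lane by a broadcast scalar gives the same result as rotating by a vector whose lanes all hold that scalar. This is a C++ sketch, not RVV intrinsics. It assumes SEW=32 with 8 lanes (as in the `vsetivli t0,8,e32` listing). It also assumes that, as with the RVV shifts, only the low log2(SEW) bits of the rotate amount are used.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model only: SEW=32 and 8 lanes are assumptions for illustration.
constexpr std::size_t kLanes = 8;
using Vec = std::array<uint32_t, kLanes>;

// Rotate right; only the low log2(SEW) = 5 bits of the amount are used.
uint32_t rotr32(uint32_t x, uint32_t amount) {
  const uint32_t s = amount & 31u;
  return s == 0 ? x : (x >> s) | (x << (32u - s));
}

// Model of vmv.v.x vd, rs1: splat a scalar register into every lane.
Vec vmv_v_x(uint32_t rs1) {
  Vec out{};
  out.fill(rs1);
  return out;
}

// Model of vror.vv vd, vs2, vs1: per-lane amounts come from a vector register.
Vec vror_vv(const Vec& vs2, const Vec& vs1) {
  Vec out{};
  for (std::size_t i = 0; i < kLanes; ++i) out[i] = rotr32(vs2[i], vs1[i]);
  return out;
}

// Model of vror.vx vd, vs2, rs1: every lane uses the same scalar amount,
// so no vector register is needed to hold the amount.
Vec vror_vx(const Vec& vs2, uint32_t rs1) {
  Vec out{};
  for (std::size_t i = 0; i < kLanes; ++i) out[i] = rotr32(vs2[i], rs1);
  return out;
}
```

In this model, `vror_vx(v, s)` equals `vror_vv(v, vmv_v_x(s))` for every `s`. This is why matching `RotateRightV src (Replicate shift)` directly to the `.vx` form can drop both the splat and the extra vector register.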
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19325#discussion_r1613614979 From rrich at openjdk.org Fri May 24 15:08:03 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 24 May 2024 15:08:03 GMT Subject: RFR: 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store [v4] In-Reply-To: <7W-vxm7KC8qwd-GJAPh4TCtDhOzw7X5-gXanLudP27Y=.807f809f-92ce-498f-94c4-49b0405bbb6f@github.com> References: <7W-vxm7KC8qwd-GJAPh4TCtDhOzw7X5-gXanLudP27Y=.807f809f-92ce-498f-94c4-49b0405bbb6f@github.com> Message-ID: On Thu, 16 May 2024 05:29:58 GMT, Richard Reingruber wrote: >> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: >> >> Eliminate IS_BIG_ENDIAN and always execute both variants > > Test error is unrelated to the changes. Upload of test results failed: > `Error: Failed to CreateArtifact: Failed to make request after 5 attempts: Request timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact` > @reinrich please ping me again to ask if testing is ok before you integrate ;) Thanks for picking this up again. I quickly wanted to let you know that I'm out of office. I will be back in a week. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19218#issuecomment-2129767944 From yzheng at openjdk.org Fri May 24 15:12:28 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Fri, 24 May 2024 15:12:28 GMT Subject: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v7] In-Reply-To: References: Message-ID: <9FFOmfnJsAIg1KJN0RcpDmAzpn68k4QBvFifeazLjmc=.dc821eea-3423-4a34-bcfc-217183169352@github.com> > Moving array construction within BigInteger.implMultiplyToLen intrinsic candidate to its caller simplifies the intrinsic implementation in JIT compiler. Yudi Zheng has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into JDK-8327964 - address comments. - address comments. - address comment. - address comment. - address comment. - address comment. - Simplify BigInteger.implMultiplyToLen intrinsic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18226/files - new: https://git.openjdk.org/jdk/pull/18226/files/7c6023f8..c719e0a9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18226&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18226&range=05-06 Stats: 560567 lines in 6784 files changed: 132593 ins; 81763 del; 346211 mod Patch: https://git.openjdk.org/jdk/pull/18226.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18226/head:pull/18226 PR: https://git.openjdk.org/jdk/pull/18226 From kvn at openjdk.org Fri May 24 15:25:11 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 24 May 2024 15:25:11 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v33] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 14:49:05 GMT, Daniel Jeliński wrote: >> Just did the experiment and it turns out that `mov64(r15, (int64_t)small_jump_table)` and `lea(r15, ExternalAddress(small_jump_table))` produce exactly the same code: >> >> `0x00007fffe463d68b: 49 bf a0 d5 63 e4 ff 7f 00 00 movabs r15,0x7fffe463d5a0` >> >> The code in `MacroAssembler` for `lea` calls `mov_literal64` with no check for whether it can be ip-relative. >> >> I tried doing it myself via `leaq(r15, Address(rip, (int64_t)small_jump_table - (int64_t)(__ pc())))` but there is no definition in `register_x86.hpp` for register `rip`. So I'm not sure exactly how to produce RIP-relative addressing. > > Thanks for checking. Well I know that the `MacroAssembler::movdqu(XMMRegister dst, AddressLiteral src, Register rscratch)` method actually generates rip-relative addresses.
Maybe we could copy some of that code. Use `lea` and `InternalAddress()` for referencing jump tables since the addresses are in the same code section. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613648648 From sgibbons at openjdk.org Fri May 24 15:32:26 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 15:32:26 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v37] In-Reply-To: References: Message-ID: <4xYUBsOJ_eDSuj6w9AjUo_6gFN_9piWR-ChLrHQoXl4=.88756684-8e9c-48e3-8b59-f5f684b81cde@github.com> > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the 
last revision: mov64 => lea(InternalAddress) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/1a71eb10..5d10a20b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=36 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=35-36 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Fri May 24 15:36:12 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 15:36:12 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v33] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 14:49:05 GMT, Daniel Jeliński wrote: >> Just did the experiment and it turns out that `mov64(r15, (int64_t)small_jump_table)` and `lea(r15, ExternalAddress(small_jump_table))` produce exactly the same code: >> >> `0x00007fffe463d68b: 49 bf a0 d5 63 e4 ff 7f 00 00 movabs r15,0x7fffe463d5a0` >> >> The code in `MacroAssembler` for `lea` calls `mov_literal64` with no check for whether it can be ip-relative. >> >> I tried doing it myself via `leaq(r15, Address(rip, (int64_t)small_jump_table - (int64_t)(__ pc())))` but there is no definition in `register_x86.hpp` for register `rip`. So I'm not sure exactly how to produce RIP-relative addressing. > > Thanks for checking. Well I know that the `MacroAssembler::movdqu(XMMRegister dst, AddressLiteral src, Register rscratch)` method actually generates rip-relative addresses. Maybe we could copy some of that code. Changed to `lea` with `InternalAddress()`. It generates exactly the same code, but makes more sense. I looked at `movdqu` and see no code that generates RIP-relative loads. It merely checks `reachable()` and adds an intermediate `lea` if not reachable. @djelinski can you clarify please?
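Whether RIP-relative addressing is possible at all depends on a simple condition. The target must lie within a signed 32-bit displacement of the end of the instruction. A minimal C++ sketch of that condition follows. `rip_disp32_fits` is a hypothetical helper and is deliberately simpler than HotSpot's `reachable()`, whose real logic may consider more than a single displacement.

```cpp
#include <cstdint>

// Hypothetical helper, not HotSpot's reachable(): a RIP-relative operand can
// address `target` only if (target - next_pc) fits in a signed 32-bit
// displacement. next_pc is the address just past the instruction.
// On success, the displacement is stored through disp_out (if non-null).
bool rip_disp32_fits(uint64_t target, uint64_t next_pc, int32_t* disp_out) {
  const int64_t delta = static_cast<int64_t>(target - next_pc);
  if (delta < INT32_MIN || delta > INT32_MAX) {
    return false;  // would need a 64-bit immediate (movabs) instead
  }
  if (disp_out != nullptr) {
    *disp_out = static_cast<int32_t>(delta);
  }
  return true;
}
```

With the values from the `movabs` listing (a 7-byte `lea` at `0x7fffe463d68b` referencing a table at `0x7fffe463d5a0`), the displacement is -242, well within range.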
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613665756 From duke at openjdk.org Fri May 24 15:56:14 2024 From: duke at openjdk.org (Tobias Hotz) Date: Fri, 24 May 2024 15:56:14 GMT Subject: RFR: 8332856: C2: Add new transform for bool eq/ne (cmp (and (urshift X const1) const2) 0) Message-ID: This PR adds a new ideal optimization for the following pattern: public boolean testFunc(int a) { int mask = 0b101; int shift = 12; return ((a >> shift) & mask) == 0; } where the mask and shift are constant values and a is a variable. For this optimization to work, the right shift has to be idealized to an unsigned right shift earlier in the pipeline, which happens here: https://github.com/openjdk/jdk/blob/b92bd671835c37cff58e2cdcecd0fe4277557d7f/src/hotspot/share/opto/mulnode.cpp#L731 If the shift is already an unsigned bit shift, it works as well. On AMD64 CPUs, this means that this whole computation can be reduced to a simple `test` instruction. ------------- Commit messages: - Add summary and bugid to test - remove trailing whitespace in test - force LF line endings for test - Add ideal for bool eq/ne (cmp (and (urshift X const1) const2) 0) -> bool eq/ne (cmp (and X newconst) 0) Changes: https://git.openjdk.org/jdk/pull/19310/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19310&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332856 Stats: 197 lines in 2 files changed: 197 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19310.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19310/head:pull/19310 PR: https://git.openjdk.org/jdk/pull/19310 From asotona at openjdk.org Fri May 24 16:01:09 2024 From: asotona at openjdk.org (Adam Sotona) Date: Fri, 24 May 2024 16:01:09 GMT Subject: Integrated: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 18:48:53 GMT, Adam Sotona wrote: > Hi, > During performance optimization work on
Class-File API as the JDK lambda generator we found some static initialization killers. > One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, a common set of all mappers, and a static map from attribute names to the mappers. > > I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and the `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. > > Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. > > Thank you, > Adam This pull request has now been integrated. Changeset: cfdc64fc Author: Adam Sotona URL: https://git.openjdk.org/jdk/commit/cfdc64fcb43e3b261dddc6cc6947235a9e76154e Stats: 2285 lines in 149 files changed: 961 ins; 615 del; 709 mod 8331291: java.lang.classfile.Attributes class performs a lot of static initializations Reviewed-by: liach, redestad, vromero ------------- PR: https://git.openjdk.org/jdk/pull/19006 From scott.gibbons at intel.com Fri May 24 16:01:13 2024 From: scott.gibbons at intel.com (Gibbons, Scott) Date: Fri, 24 May 2024 16:01:13 +0000 Subject: Help with intrinsic testing for String.indexOf() Message-ID: Hi. I wrote a stub for implementing the indexOf method and am looking for a way to thoroughly test it. I have good tests for both positive and negative functionality that I'm pretty confident in. What I'm looking for is a good way to write a testcase to validate that I am not accessing memory outside the range of the strings passed to the stub. What I'd like to do is to allocate an isolated page of memory such that accesses outside the page would cause a SIGSEGV.
I would like to allocate the string within the page such that the last character of the string is at the end of the page, and also allocate a string at the beginning of the page. That way I could get clear indications of reading either past the end of the string or before the beginning (I know there's a header, so the header would be allocated at the beginning). Is there any method at all I can use within a Java testcase to effect such behavior? I've carefully performed code inspection and am relatively confident that I'm staying within the bounds of the strings but would like HW verification that I haven't missed anything. Ideas? Thanks, --Scott Gibbons Software Development Engineer, Runtime Engineering DEVELOPER SOFTWARE ENGINEERING Ph: 1-503-456-7756 Cell: 1-469-450-8390 Intel JF1, 2111 NE 25th Ave Hillsboro, OR 97124 Intel Corporation | www.intel.com From cslucas at openjdk.org Fri May 24 16:43:06 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 24 May 2024 16:43:06 GMT Subject: RFR: 8332883: Some simple cleanup in vectornode.cpp In-Reply-To: <_OntRXQMobbozvu5_QPLpEny6Wsfv5pFQGYhWw8aSCE=.7389a53d-b139-4825-8fc6-e22e7220fe9e@github.com> References: <_OntRXQMobbozvu5_QPLpEny6Wsfv5pFQGYhWw8aSCE=.7389a53d-b139-4825-8fc6-e22e7220fe9e@github.com> Message-ID: On Fri, 24 May 2024 11:58:18 GMT, Hamlin Li wrote: > Hi, > Can you review this simple cleanup in vectornode.cpp? > Thanks!
LGTM ------------- PR Comment: https://git.openjdk.org/jdk/pull/19392#issuecomment-2129970247 From kvn at openjdk.org Fri May 24 18:02:14 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 24 May 2024 18:02:14 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v37] In-Reply-To: <4xYUBsOJ_eDSuj6w9AjUo_6gFN_9piWR-ChLrHQoXl4=.88756684-8e9c-48e3-8b59-f5f684b81cde@github.com> References: <4xYUBsOJ_eDSuj6w9AjUo_6gFN_9piWR-ChLrHQoXl4=.88756684-8e9c-48e3-8b59-f5f684b81cde@github.com> Message-ID: On Fri, 24 May 2024 15:32:26 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional 
commit since the last revision: > > mov64 => lea(InternalAddress) My testing for v34 passed without new failures. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2130096346 From kvn at openjdk.org Fri May 24 18:16:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 24 May 2024 18:16:13 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v37] In-Reply-To: <4xYUBsOJ_eDSuj6w9AjUo_6gFN_9piWR-ChLrHQoXl4=.88756684-8e9c-48e3-8b59-f5f684b81cde@github.com> References: <4xYUBsOJ_eDSuj6w9AjUo_6gFN_9piWR-ChLrHQoXl4=.88756684-8e9c-48e3-8b59-f5f684b81cde@github.com> Message-ID: On Fri, 24 May 2024 15:32:26 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 
9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > mov64 => lea(InternalAddress) I am fine with current version. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16753#pullrequestreview-2077568604 From sgibbons at openjdk.org Fri May 24 18:16:14 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 18:16:14 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v37] In-Reply-To: References: <4xYUBsOJ_eDSuj6w9AjUo_6gFN_9piWR-ChLrHQoXl4=.88756684-8e9c-48e3-8b59-f5f684b81cde@github.com> Message-ID: On Fri, 24 May 2024 17:59:49 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> mov64 => lea(InternalAddress) > > My testing for v34 passed without new failures. Thank you @vnkozlov . Waiting for review from @sviswa7 and @jatin-bhateja, then I'll integrate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2130114623 From duke at openjdk.org Fri May 24 18:30:14 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 24 May 2024 18:30:14 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 14:41:36 GMT, Scott Gibbons wrote: >> test/micro/org/openjdk/bench/java/lang/StringIndexOfHuge.java line 132: >> >>> 130: @Benchmark >>> 131: public int searchHugeLargeSubstring() { >>> 132: return dataStringHuge.indexOf("B".repeat(30) + "X" + "A".repeat(30), 74); >> >> .repeat() call and string concatenation shouldn't be part of the benchmark (here and several other @Benchmark functions in this file) since it will detract from the measurement. 
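To illustrate keeping needle construction out of the measured code, here is a minimal sketch. It assumes the needle can be built once per benchmark state. The class and method names are hypothetical, and in an actual JMH benchmark these fields would live in a `@State` class initialized by a `@Setup` method. A `String.repeat` call is a method invocation, not a constant expression under the JLS, so javac does not fold `"B".repeat(30) + "X" + "A".repeat(30)` into a constant.

```java
// Hypothetical stand-in for a JMH @State class: the needle is built once,
// outside the measured method, instead of on every invocation.
final class HugeNeedleState {
    final String haystack;
    final String needle;

    HugeNeedleState(String haystack) {
        this.haystack = haystack;
        // Same value the benchmark previously rebuilt on every call.
        this.needle = "B".repeat(30) + "X" + "A".repeat(30);
    }

    // The measured operation now contains only the indexOf call.
    int searchHugeLargeSubstring() {
        return haystack.indexOf(needle, 74);
    }
}
```

With the setup cost removed, the measured time reflects only `indexOf`, so the speedup of the intrinsic itself becomes visible.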
>> >> (String concatenation gets converted (by javac) into StringBuilder().append().append()....append().toString()) > > Since we're only concerned with the delta of performance, does this really matter? Can you suggest an alternative? The needle really should be like all the other strings, e.g. `dataStringHuge` itself, generated by the setup. As to whether it really matters, the answer is Amdahl's law. You can indeed measure the delta, but you can't measure the speedup of just the indexOf; not with repeat and concatenation obscuring the numbers. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613864094 From duke at openjdk.org Fri May 24 18:35:11 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 24 May 2024 18:35:11 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 23:59:05 GMT, Scott Gibbons wrote: >> test/jdk/java/lang/StringBuffer/IndexOf.java line 40: >> >>> 38: private static boolean failure = false; >>> 39: public static void main(String[] args) throws Exception { >>> 40: String testName = "IndexOf"; >> >> indentation > > Fixed (missed a `git add`?
don't see the updates for this file) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613870558 From kvn at openjdk.org Fri May 24 18:36:06 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 24 May 2024 18:36:06 GMT Subject: RFR: 8332632: Redundant assert "compiler should always document failure: %s" with possible UB In-Reply-To: References: Message-ID: On Fri, 24 May 2024 14:27:13 GMT, Evgeny Astigeevich wrote: > [JDK-8303951](https://bugs.openjdk.org/browse/JDK-8303951) added the following code: https://github.com/openjdk/jdk/pull/13038/files#diff-2e74481e557cbe87170a56a6e592eea33bb59019926e1c32bebcfaf5b571bb53R2280 > > > if (!ci_env.failing() && !task->is_success()) { > + assert(ci_env.failure_reason() != nullptr, "expect failure reason"); > + assert(false, "compiler should always document failure: %s", ci_env.failure_reason()); > > > The second assert is redundant because `ci_env.failure_reason() != nullptr` is always `false`. It also has possible UB. > > A compiler sees the if-statement checking `!ci_env.failing() `, which, if it is true, implies `ci_env.failure_reason()` is `nullptr`. > > Based on this information the compiler can optimize `assert(ci_env.failure_reason() != nullptr, "expect failure reason"); ` to > `assert(false, "expect failure reason"); `. > The compiler can optimize `assert(false, "compiler should always document failure: %s", ci_env.failure_reason()); ` to `assert(false, "compiler should always document failure: %s", nullptr); `. > > So the original code would effectively become the following: > > > if (!ci_env.failing() && !task->is_success()) { > assert(false, "expect failure reason"); > assert(false, "compiler should always document failure: %s", nullptr); > } > > > We have an expression where a format string is used. Formatting functions usually have undefined behavior if `nullptr` is passed for a `%s` (character string) specifier. See `std::printf` for example.
> > Even though the second assert is never executed, it makes the IF-block have UB. The C++ standard says: correct C++ programs are free of undefined behavior. See https://en.cppreference.com/w/cpp/language/ub and https://en.cppreference.com/w/cpp/language/as_if > > Choosing between readability and correctness, I choose correctness. > > I think the one assert `assert(ci_env.failure_reason() != nullptr, "compiler should always document failure"); ` is both self-documenting and correct. I think it is a typo. This code wants to check and print in these asserts `task->_failure_reason`, which should be set when `!task->is_success()`. We are missing public accessors to `CompileTask::_failure_reason`; they would have to be added. And maybe we should pass the failure to ciEnv too in this code instead of "compile failed". ------------- Changes requested by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19395#pullrequestreview-2077603878 From kvn at openjdk.org Fri May 24 18:40:12 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 24 May 2024 18:40:12 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v33] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 15:33:46 GMT, Scott Gibbons wrote: >> Thanks for checking. Well, I know that the `MacroAssembler::movdqu(XMMRegister dst, AddressLiteral src, Register rscratch)` method actually generates rip-relative addresses. Maybe we could copy some of that code. > > Changed to `lea` with `InternalAddress()`. Generates the exact same code, but makes more sense. I looked at `movdqu` and see no code that generates RIP-relative loads. It merely checks `reachable()` and adds an intermediate `lea` if not reachable. @djelinski can you clarify please? I think HotSpot prefers to have full addresses in `lea` for possible patching.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613874603 From kvn at openjdk.org Fri May 24 19:30:04 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 24 May 2024 19:30:04 GMT Subject: RFR: 8332883: Some simple cleanup in vectornode.cpp In-Reply-To: <_OntRXQMobbozvu5_QPLpEny6Wsfv5pFQGYhWw8aSCE=.7389a53d-b139-4825-8fc6-e22e7220fe9e@github.com> References: <_OntRXQMobbozvu5_QPLpEny6Wsfv5pFQGYhWw8aSCE=.7389a53d-b139-4825-8fc6-e22e7220fe9e@github.com> Message-ID: On Fri, 24 May 2024 11:58:18 GMT, Hamlin Li wrote: > Hi, > Can you review this simple cleanup in vectornode.cpp? > Thanks! Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19392#pullrequestreview-2077717114 From sgibbons at openjdk.org Fri May 24 19:55:40 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 19:55:40 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v38] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Test clarifications ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/5d10a20b..485d02fd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=37 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=36-37 Stats: 69 lines in 2 files changed: 16 ins; 10 del; 43 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Fri May 24 19:55:40 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 19:55:40 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 
[v19] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 18:32:53 GMT, Volodymyr Paprotski wrote: >> Fixed > > (missed a `git add`? don't see the updates for this file) Hmmm... Not sure what happened. >> Since we're only concerned with the delta of performance, does this really matter? Can you suggest an alternative? > > The needle really should be like all the other strings, e.g. `dataStringHuge` itself, generated by the setup. > > As to whether it really matters, the answer is Amdahl's law. You can indeed measure the delta, but you can't measure the speedup of just the indexOf; not with repeat and concatenation obscuring the numbers. I have to believe that any relatively smart compiler would recognize that as a compile-time constant and make the change irrelevant. I've yielded to your desire and changed the code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613956309 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613955264 From sgibbons at openjdk.org Fri May 24 19:55:43 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 19:55:43 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v18] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 19:41:58 GMT, Volodymyr Paprotski wrote: >> Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 50 commits: >> >> - Merge remote-tracking branch 'origin/master' into indexof >> - Move arrays_equals back to c2_MacroAssembler >> - Merge branch 'openjdk:master' into indexof >> - Remove infinite loop (used for debugging) >> - Merge branch 'openjdk:master' into indexof >> - Cleaned up, ready for review >> - Pre-cleanup code >> - Add JMH. Add 16-byte compares to arrays_equals >> - Better method for mask creation >> - Merge branch 'openjdk:master' into indexof >> - ...
and 40 more: https://git.openjdk.org/jdk/compare/b20fa7b4...f52d281d > > test/jdk/java/lang/StringBuffer/IndexOf.java line 81: > >> 79: String shs = (new String((hs_charset == StandardCharsets.UTF_16) ? haystack_16 : haystack)).substring(0, haystackSize); >> 80: >> 81: shs = "$&),,18+-!'8)+"; > > Should really keep the original test unmodified and add new tests as needed The test functionality was not changed. I just added printing of information when a failure occurs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613914184 From sgibbons at openjdk.org Fri May 24 19:55:43 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 19:55:43 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 19:34:40 GMT, Volodymyr Paprotski wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Rearrange; add lambdas for clarity > > test/jdk/java/lang/StringBuffer/IndexOf.java line 90: > >> 88: >> 89: // printStringBytes(shs.getBytes(hs_charset)); >> 90: for (int i = 0; i < 200000; i++) { > > This won't be a deterministic way to reach the intrinsic. I would suggest copying the idea from test/jdk/com/sun/crypto/provider/Cipher/ChaCha20/unittest/Poly1305UnitTestDriver.java > > i.e. Have two `@run main` invocations at the top of this file, one with default parameters, one with `-Xcomp -XX:-TieredCompilation`. You don't need a 'driver' program, that was to handle something else. > > > /* > * @test > * @modules java.base/com.sun.crypto.provider > * @run main java.base/com.sun.crypto.provider.Poly1305KAT > * @summary Unit test for com.sun.crypto.provider.Poly1305. > */ > > /* > * @test > * @modules java.base/com.sun.crypto.provider > * @summary Unit test for IntrinsicCandidate in com.sun.crypto.provider.Poly1305.
> * @run main/othervm -Xcomp -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:+ForceUnreachable java.base/com.sun.crypto.provider.Poly1305KAT > */ Done. > test/jdk/java/lang/StringBuffer/IndexOf.java line 126: > >> 124: int aNewLength = getRandomIndex(min, max); >> 125: for (int y = 0; y < aNewLength; y++) { >> 126: int achar = generator.nextInt(30) + 30; > > This will only ever generate LL cases, i.e. chars from [30,59]. Could be parametrized to also produce UTF-16 if, instead of 30, the offset were in a Unicode range above Latin-1. Original code. > test/jdk/java/lang/StringBuffer/IndexOf.java line 199: > >> 197: System.out.println("Source="+sourceString.substring(hsBegin, hsBegin + haystackLen)); >> 198: System.out.println("Target="+targetString.substring(nBegin, nBegin + needleLen)); >> 199: System.out.println("haystackLen="+haystackLen+" neeldeLen="+needleLen+" hsBegin="+hsBegin+" nBegin="+nBegin+ > > This looks like 'development scaffolding' (i.e. printf debugging) that was meant to be removed This is additional information printed upon failure instead of just saying "failed" > test/jdk/java/lang/StringBuffer/IndexOf.java line 295: > >> 293: sourceString = generateTestString(99, 100); >> 294: sourceBuffer = new StringBuffer(sourceString); >> 295: targetString = generateTestString(10, 11); > > Generate a random int [0,1,2] for LL, UU, UL, pass that as parameter to generateTestString() to test the other paths. Same for other tests in this file using this pattern. > > This test is specific to haystacklen=100, needlelen=10. What about other haystack/needle sizes to exercise all the paths in the intrinsic assembler (i.e. haystack >=, <=32, needlelen ={1,2,3,4,5..32..})? Elsewhere already? Original code.
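The parametrization suggested above could look roughly like the sketch below. The helper name and the choice of the Cyrillic block (U+0410 to U+042D) are assumptions for illustration, not the test's actual `generateTestString()`. With compact strings enabled, a `String` whose chars all fit in Latin-1 is stored one byte per char, while any char above U+00FF forces the UTF-16 representation, so combining the two modes yields LL, UU and UL haystack/needle pairs.

```java
import java.util.Random;

// Hypothetical helper, not the test's actual generateTestString().
final class TestStringGen {
    private TestStringGen() {}

    // Returns a random string whose length is in [min, max).
    // utf16 == false: every char is in [30, 59], so the String can use the
    //                 Latin-1 (one byte per char) representation.
    // utf16 == true:  every char is in [U+0410, U+042D] (Cyrillic), which
    //                 forces the UTF-16 representation.
    static String generate(Random rnd, int min, int max, boolean utf16) {
        int len = min + rnd.nextInt(max - min);
        int base = utf16 ? 0x0410 : 30;
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) {
            sb.append((char) (base + rnd.nextInt(30)));
        }
        return sb.toString();
    }
}
```

For the UL case, a Latin-1 needle can be spliced into a UTF-16 haystack at a known offset, so the expected index is known in advance.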
> test/jdk/java/lang/StringBuffer/IndexOf.java line 360: > >> 358: System.err.println(" sAnswer = " + sAnswer + ", sbAnswer = " + sbAnswer); >> 359: System.err.println(" testString = '" + testString + "'"); >> 360: System.err.println(" testBuffer = '" + testBuffer + "'"); > > tracing left here and further down Adding more information on failure. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613915508 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613919180 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613920449 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613922554 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613923075 From kvn at openjdk.org Fri May 24 20:15:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 24 May 2024 20:15:13 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v38] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 19:55:40 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Test clarifications test/jdk/java/lang/StringBuffer/IndexOf.java line 28: > 26: * @summary Test indexOf and lastIndexOf > 27: * @run main/othervm IndexOf > 28: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -Xcomp -XX:-TieredCompilation -XX:UseAVX=2 -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts IndexOf I suggest splitting it into 2 subtest jobs and using `@requires vm.cpu.features ~= ".*avx2.*"` for the second one, which specifies `-XX:UseAVX=2`. See `compiler/loopopts/superword/TestDependencyOffsets.java` for example.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613972734 From sgibbons at openjdk.org Fri May 24 20:26:40 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 20:26:40 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v39] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Split into two subtest jobs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/485d02fd..69ca8d13 Webrevs: - full: 
https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=38 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=37-38 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From duke at openjdk.org Fri May 24 20:26:40 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 24 May 2024 20:26:40 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v37] In-Reply-To: <4xYUBsOJ_eDSuj6w9AjUo_6gFN_9piWR-ChLrHQoXl4=.88756684-8e9c-48e3-8b59-f5f684b81cde@github.com> References: <4xYUBsOJ_eDSuj6w9AjUo_6gFN_9piWR-ChLrHQoXl4=.88756684-8e9c-48e3-8b59-f5f684b81cde@github.com> Message-ID: On Fri, 24 May 2024 15:32:26 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 
16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > mov64 => lea(InternalAddress) src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4633: > 4631: andl(result, 0x0000000f); // tail count (in bytes) > 4632: andl(limit, 0xfffffff0); // vector count (in bytes) > 4633: jcc(Assembler::zero, COMPARE_TAIL); In the `expand_ary2` case, this is the same andl/compare as line 4549; i.e. I think you can just put `jcc(Assembler::zero, COMPARE_TAIL);` on line 4549, inside the if (and move the other jcc into the else branch)? src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4639: > 4637: negptr(limit); > 4638: > 4639: bind(COMPARE_WIDE_VECTORS_16); Understanding-check.. this loop will execute at most 2 times, right? i.e. process as many 32-byte chunks as possible, then 1-or-2 16-byte chunks then byte-by-byte? (Still a good optimization, just trying to understand the scope) src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4718: > 4716: jmp(TRUE_LABEL); > 4717: } else { > 4718: movl(chr, Address(ary1, limit, scaleFactor)); scaleFactor is always Address::times_1 here (expand_ary2==false), might be clearer to change it back test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 57: > 55: > 56: generator = new Random(); > 57: long seed = generator.nextLong();//-5291521104060046276L; dead code test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 63: > 61: /////////////////////////// WARM-UP ////////////////////////// > 62: > 63: for (int i = 0; i < 20000; i++) { -Xcomp should be more deterministic (and quicker) way to reach the intrinsic (i.e. like the other tests) On other hand, perhaps this doesn't matter? @vnkozlov Understanding-check please.. 
these tests will run as part of every build from this point till infinity; therefore, a long test will affect every OpenJDK developer. But if this test is not run on every build, then the build time does not matter and this test can run for as long as it 'wants'. test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 160: > 158: } > 159: > 160: private static String generateTestString(int min, int max) { I see you have various `Charset[] charSets` above, but this function still only generates LL. Are those separate tests? Or am I missing some concatenation somewhere that will convert the generated string to the correct encoding? You could have implemented my suggestion from IndexOf.generateTestString here instead, so that the tests that do call this function end up with multiple encodings; i.e. similar to what you already do in the next function. I suppose, with the addition of String/IndexOf.java, that is a moot point. test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 185: > 183: } > 184: > 185: private static int indexOfKernel(String haystack, String needle) { Is the intention of kernels not to be inlined so that it would be part of separate compilation? If so, you probably want to annotate it with `@CompilerControl(CompilerControl.Mode.DONT_INLINE)` i.e. https://github.com/openjdk/jmh/blob/master/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_16_CompilerControl.java test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 539: > 537: failCount = indexOfKernel("", ""); > 538: > 539: for (int x = 0; x < 1000000; x++) { Should we be concerned about the increased run-time?
Or does this execute 'quickly enough'? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613940896 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613943518 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613946470 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613955620 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613955354 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613970971 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613967681 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613983597 From sgibbons at openjdk.org Fri May 24 20:26:41 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 20:26:41 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v38] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 20:12:07 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Test clarifications > > test/jdk/java/lang/StringBuffer/IndexOf.java line 28: > >> 26: * @summary Test indexOf and lastIndexOf >> 27: * @run main/othervm IndexOf >> 28: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -Xcomp -XX:-TieredCompilation -XX:UseAVX=2 -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts IndexOf > > I suggest splitting it into 2 subtest jobs and using `@requires vm.cpu.features ~= ".*avx2.*"` for the second one, which specifies `-XX:UseAVX=2`. > > See `compiler/loopopts/superword/TestDependencyOffsets.java` for example. Right. Done. Also added `@requires vm.compiler2.enabled` since my stub is only valid with C2.
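For reference, the split suggested above might look like the following abbreviated jtreg header. The `id=` names are hypothetical and the other tags of the real file (such as `@bug`) are omitted; the `@run` flags and the two `@requires` conditions are the ones quoted in this thread.

```java
/*
 * @test id=default
 * @summary Test indexOf and lastIndexOf
 * @run main/othervm IndexOf
 */

/*
 * @test id=avx2
 * @summary Test indexOf and lastIndexOf with the AVX2 IndexOf intrinsic
 * @requires vm.cpu.features ~= ".*avx2.*"
 * @requires vm.compiler2.enabled
 * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -Xcomp -XX:-TieredCompilation -XX:UseAVX=2 -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts IndexOf
 */
```

With this layout, jtreg skips the second job on hosts without AVX2 or without C2, rather than running it with flags that cannot reach the intrinsic.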
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613985672 From duke at openjdk.org Fri May 24 20:26:41 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 24 May 2024 20:26:41 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 14:50:40 GMT, Scott Gibbons wrote: >> test/jdk/java/lang/StringBuffer/IndexOf.java line 284: >> >>> 282: >>> 283: // Note: it is possible although highly improbable that failCount will >>> 284: // be > 0 even if everthing is working ok >> >> This sounds like either a bug or a testcase bug? Same as line 301, `extremely remote possibility of > 1 match`? > > This was there from the original author. I think they were trying to infer that a match could occur in the rare case that the same random string was produced. They're random after all, and there's no reason the same sequence could be generated. Makes sense ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613872215 From sgibbons at openjdk.org Fri May 24 20:47:23 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 20:47:23 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v40] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Review comments. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/69ca8d13..be001e2c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=39 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=38-39 Stats: 13 lines in 2 files changed: 10 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Fri May 24 20:47:24 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 20:47:24 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v37] In-Reply-To: References: <4xYUBsOJ_eDSuj6w9AjUo_6gFN_9piWR-ChLrHQoXl4=.88756684-8e9c-48e3-8b59-f5f684b81cde@github.com> Message-ID: On Fri, 24 May 2024 19:30:54 GMT, Volodymyr Paprotski wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> mov64 => lea(InternalAddress) > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4633: > >> 4631: andl(result, 0x0000000f); // tail count (in bytes) >> 4632: andl(limit, 0xfffffff0); // vector count (in bytes) >> 4633: jcc(Assembler::zero, COMPARE_TAIL); > > In the `expand_ary2` case, this is the same andl/compare as line 4549; i.e. I think you can just put `jcc(Assembler::zero, COMPARE_TAIL);` on line 4549, inside the if (and move the other jcc into the else branch)? OK. Shortens pathlength by 4 instructions. > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4639: > >> 4637: negptr(limit); >> 4638: >> 4639: bind(COMPARE_WIDE_VECTORS_16); > > Understanding-check.. this loop will execute at most 2 times, right? > > i.e. process as many 32-byte chunks as possible, then 1-or-2 16-byte chunks then byte-by-byte? > > (Still a good optimization, just trying to understand the scope) Yes. 
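The two masks in the quoted code split the length into a part handled by 16-byte vector compares and a tail of 0 to 15 bytes. The sketch below is a scalar Java model of that split only, assuming plain byte arrays. It stands in two 64-bit loads for one 128-bit compare and deliberately omits the 32-byte stage and the negated-index loop (`negptr(limit)`) of the real assembler; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Illustrative scalar model of "16-byte body + byte tail" comparison;
// not HotSpot's arrays_equals implementation.
final class ChunkedCompare {
    private ChunkedCompare() {}

    static boolean bytesEqual(byte[] a, byte[] b) {
        if (a.length != b.length) return false;
        int n = a.length;
        int vectorBytes = n & ~0x0f;  // like andl(limit, 0xfffffff0): bytes covered by 16-byte chunks
        ByteBuffer ba = ByteBuffer.wrap(a);
        ByteBuffer bb = ByteBuffer.wrap(b);
        // Body: each iteration compares 16 bytes as two 64-bit loads,
        // standing in for one 128-bit vector compare.
        for (int i = 0; i < vectorBytes; i += 16) {
            if (ba.getLong(i) != bb.getLong(i) || ba.getLong(i + 8) != bb.getLong(i + 8)) {
                return false;
            }
        }
        // Tail: the remaining n & 0x0f bytes (0..15), one at a time,
        // like andl(result, 0x0000000f).
        for (int i = vectorBytes; i < n; i++) {
            if (a[i] != b[i]) return false;
        }
        return true;
    }
}
```

For a 37-byte array, for example, the body covers bytes 0 to 31 in two chunks and the tail loop checks the last 5 bytes.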
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4718:
>
>> 4716: jmp(TRUE_LABEL);
>> 4717: } else {
>> 4718: movl(chr, Address(ary1, limit, scaleFactor));
>
> scaleFactor is always Address::times_1 here (expand_ary2==false), might be clearer to change it back

*Sigh*. Changing it back.

> test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 57:
>
>> 55:
>> 56: generator = new Random();
>> 57: long seed = generator.nextLong();//-5291521104060046276L;
>
> dead code

Fixed

> test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 63:
>
>> 61: /////////////////////////// WARM-UP //////////////////////////
>> 62:
>> 63: for (int i = 0; i < 20000; i++) {
>
> -Xcomp should be a more deterministic (and quicker) way to reach the intrinsic (i.e. like the other tests)
>
> On the other hand, perhaps this doesn't matter? @vnkozlov Understanding-check please.. these tests will run as part of every build from this point-till-infinity; therefore, a long test will affect every openjdk developer. But if this test is not run on every build, then the build-time does not matter, then this test can run for as long as it 'wants'.

This test runs in well under 2 minutes. I'm not sure what is trying to be accomplished?

> test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 160:
>
>> 158: }
>> 159:
>> 160: private static String generateTestString(int min, int max) {
>
> I see you have various `Charset[] charSets` above, but this function still only generates LL. Are those separate tests? Or am I missing some concatenation somewhere that will convert the generated string to the correct encoding?
>
> You could have implemented my suggestion from IndexOf.generateTestString here instead, so that the tests that do call this function end up with multiple encodings; i.e. similar to what you already do in the next function.
>
> I suppose, with addition of String/IndexOf.java that is a moot point.

Yes, I think it's a moot point. Thanks.
> test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 185:
>
>> 183: }
>> 184:
>> 185: private static int indexOfKernel(String haystack, String needle) {
>
> Is the intention of kernels not to be inlined so that it would be part of separate compilation?
>
> If so, you probably want to annotate it with `@CompilerControl(CompilerControl.Mode.DONT_INLINE)`
>
> i.e. https://github.com/openjdk/jmh/blob/master/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_16_CompilerControl.java

Fixed.

> test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 539:
>
>> 537: failCount = indexOfKernel("", "");
>> 538:
>> 539: for (int x = 0; x < 1000000; x++) {
>
> Should we be concerned about the increased run-time? Or does this execute 'quickly enough'?

Runs in well under 2 minutes.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613997645
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613993657
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613998432
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614000081
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614000885
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614001480
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614002801
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614003072

From sviswanathan at openjdk.org Fri May 24 22:33:16 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Fri, 24 May 2024 22:33:16 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v35]
In-Reply-To: <-vyOZzeMslZqgJpTsQnnOWi4abWiM8fNeWSVx5LEHm8=.d37011ee-102c-4874-aa26-d113949d25ea@github.com>
References: <-vyOZzeMslZqgJpTsQnnOWi4abWiM8fNeWSVx5LEHm8=.d37011ee-102c-4874-aa26-d113949d25ea@github.com>
Message-ID:

On Thu, 23 May 2024 23:12:42 GMT, Scott Gibbons wrote:

> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
> Review comments - move stubGen*_string.cpp to c2_stubGen*_string.cpp

src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1754:

> 1752: continue;
> 1753: } else {
> 1754: Label L_loopTop;

L_loopTop label not used in the else block.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612495013

From sviswanathan at openjdk.org Fri May 24 22:33:15 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Fri, 24 May 2024 22:33:15 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v40]
In-Reply-To:
References:
Message-ID: <6r30gPhGsZAoAOSYsP39qr2czQ8Wj7YMOxlP2VZZpAI=.61ee3985-d3a5-40b2-9bce-453253185600@github.com>

On Fri, 24 May 2024 20:47:23 GMT, Scott Gibbons wrote:

> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
> Review comments.

src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1122:

> 1120: // eq_mask - The bit mask returned that holds the result of the comparison
> 1121: // rTmp - a temporary register
> 1122: // rTmp2 - a temporary register

There is no rtmp, rtmp2 here.

src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1129:

> 1127: // _masm - Current MacroAssembler instance pointer
> 1128: //
> 1129: // If (n - k) < 32, need to handle reading past end of haystack

Don't see (n-k) < 32 being handled in this function.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614091336
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614092828

From sviswanathan at openjdk.org Fri May 24 22:33:13 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Fri, 24 May 2024 22:33:13 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v20]
In-Reply-To:
References:
Message-ID:

On Fri, 17 May 2024 23:47:45 GMT, Scott Gibbons wrote:

> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
> Addressing lots of comments. Interim commit.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4737:

> 4735: bind(COMPARE_BYTE);
> 4736: } else {
> 4737: lea(ary1, Address(ary1, expand_ary2 ? 4 : 2));

This change is not required. expand_ary2 code doesn't come here.

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1233:

> 1231: __ andq(eq_mask, rTmp);
> 1232:
> 1233: __ testl(eq_mask, eq_mask);

Mismatch of operation size q vs l: andq and testl.

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1623:

> 1621: ////////////////////////////////////////////////////////////////////////////////////////
> 1622: //
> 1623: // Small haystack (<32 bytes) switch

This should be <= 32 bytes.
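The `andq`/`testl` remark matters because a 32-bit `testl` sets the zero flag from the low 32 bits only. After a 64-bit `andq`, any bit that survives above bit 31 would therefore be ignored. Whether such bits can occur depends on how `eq_mask` was produced; a 32-byte `vpmovmskb` result, for instance, fits in 32 bits. So this sketch only illustrates the hazard; it makes no claim about a live bug. The function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Models `testq r, r`: ZF is set only if all 64 bits are zero.
bool zf_after_testq(std::uint64_t r) { return r == 0; }

// Models `testl r, r`: only the low 32 bits take part in the AND.
bool zf_after_testl(std::uint64_t r) { return static_cast<std::uint32_t>(r) == 0; }

// Usage: a mask whose only set bit is above bit 31.
void demo_high_bit() {
    const std::uint64_t mask = std::uint64_t{1} << 40;
    assert(!zf_after_testq(mask));  // 64-bit test sees the bit
    assert(zf_after_testl(mask));   // 32-bit test misses it
}
```

Matching the test width to the `and` width, as the fix does, removes the question entirely.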
src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1709:

> 1707: // XMM_BYTE_K - last element of needle, broadcast
> 1708: //
> 1709: // The haystack is >= 32 bytes

Should this be > 32 bytes?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1609023624
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1609043720
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1609160143
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1609163535

From sviswanathan at openjdk.org Fri May 24 22:33:17 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Fri, 24 May 2024 22:33:17 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v25]
In-Reply-To:
References:
Message-ID:

On Wed, 22 May 2024 17:40:24 GMT, Scott Gibbons wrote:

> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
> un-helper-ize preload_needle_helper; try fix for macos build

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 101:

> 99: if (isU) { \
> 100: __ blsrl(tmp, mask); \
> 101: __ blsrl(mask, tmp); \

We could do this as

 __ blsrl(mask, mask); \
 __ blsrl(mask, mask); \

Thereby removing the need for tmp.
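For context on the suggestion: BLSR resets the lowest set bit, computing `x & (x - 1)` from its source operand. Using the same register as source and destination is therefore fine, and two in-place `blsrl(mask, mask)` steps give the same result as going through `tmp`. The likely reason for two resets under `isU` is that a byte-granular match mask (for example from `vpmovmskb`) carries two adjacent bits per matching 16-bit char. That layout is an assumption here, as are the helper names in this scalar sketch.

```cpp
#include <cassert>
#include <cstdint>

// BLSR semantics: reset the lowest set bit (0 stays 0).
std::uint32_t blsr(std::uint32_t x) { return x & (x - 1u); }

// Drop the lowest match from a byte-granular match mask. With UTF-16
// data (isU) each matching char is assumed to own two adjacent bits,
// so the lowest bit is reset twice, in place, with no temporary.
std::uint32_t drop_lowest_match(std::uint32_t mask, bool isU) {
    mask = blsr(mask);
    if (isU) mask = blsr(mask);
    return mask;
}

// Usage: UTF-16 matches at char positions 1 and 3, i.e. byte bits
// 2-3 and 6-7 (0xCC is binary 11001100).
void demo_drop() {
    std::uint32_t m = 0xCCu;
    m = drop_lowest_match(m, true);
    assert(m == 0xC0u);  // first match (bits 2-3) removed
    m = drop_lowest_match(m, true);
    assert(m == 0u);     // second match (bits 6-7) removed
}
```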
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610453473

From sviswanathan at openjdk.org Fri May 24 22:33:19 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Fri, 24 May 2024 22:33:19 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v27]
In-Reply-To:
References:
Message-ID:

On Wed, 22 May 2024 18:52:27 GMT, Scott Gibbons wrote:

> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
> Revert last change to IndexOf.java

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1218:

> 1216: // isU - true if argument encoding is either UU or UL
> 1217: // eq_mask - The bit mask returned that holds the result of the comparison
> 1218: // needleLen - a temporary register. Only used if isUL true

needleLen is not a temporary register. needleLen is used to read the kThByte from haystack below when !sizeKnown so must hold valid info.

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1439:

> 1437: // back to last valid read position
> 1438: __ cmpq(hsPtrRet, last);
> 1439: __ jb_b(L_midLoop);

could be jbe_b?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610617943
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610740998

From sgibbons at openjdk.org Fri May 24 23:11:13 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Fri, 24 May 2024 23:11:13 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v20]
In-Reply-To:
References:
Message-ID: <3A1V-APGmN8EO49abMKEzdGA-VLYsIiKtTrJCPtuYUc=.a6c13f99-c314-4872-a347-02e6c8a6b8aa@github.com>

On Tue, 21 May 2024 22:39:42 GMT, Sandhya Viswanathan wrote:

>> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Addressing lots of comments. Interim commit.
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4737:
>
>> 4735: bind(COMPARE_BYTE);
>> 4736: } else {
>> 4737: lea(ary1, Address(ary1, expand_ary2 ? 4 : 2));
>
> This change is not required. expand_ary2 code doesn't come here.

Right. Fixed.

> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1233:
>
>> 1231: __ andq(eq_mask, rTmp);
>> 1232:
>> 1233: __ testl(eq_mask, eq_mask);
>
> Mismatch of operation size q vs l: andq and testl.

Fixed.

> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1623:
>
>> 1621: ////////////////////////////////////////////////////////////////////////////////////////
>> 1622: //
>> 1623: // Small haystack (<32 bytes) switch
>
> This should be <= 32 bytes.

Fixed.
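The `jb_b` versus `jbe_b` question is an off-by-one on an inclusive bound. Suppose `last` is itself a valid read position. Then an unsigned "below" test at the bottom of the loop stops one iteration early, while "below or equal" also visits `last`. The scalar sketch below counts iterations of such a bottom-tested loop. The function name and the fixed step are illustrative assumptions, not the stub's actual loop.

```cpp
#include <cassert>
#include <cstddef>

// Iterations of a bottom-tested loop that starts at `start`, advances by
// `step`, and branches back while p < last (jb) or p <= last (jbe).
// size_t keeps the comparison unsigned, like jb/jbe.
std::size_t iterations(std::size_t start, std::size_t last, std::size_t step, bool inclusive) {
    assert(step > 0);
    std::size_t n = 0;
    std::size_t p = start;
    do {
        ++n;        // loop body reads at position p
        p += step;
    } while (inclusive ? p <= last : p < last);
    return n;
}
```

With start 0, last 64 and step 32, the `jb` form visits positions 0 and 32, while the `jbe` form also visits 64. When `last` is not reachable exactly (say 63), both forms agree.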
> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1709:
>
>> 1707: // XMM_BYTE_K - last element of needle, broadcast
>> 1708: //
>> 1709: // The haystack is >= 32 bytes
>
> Should this be > 32 bytes?

Fixed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614114763
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614127986
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614127889
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614127781

From sgibbons at openjdk.org Fri May 24 23:11:15 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Fri, 24 May 2024 23:11:15 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v40]
In-Reply-To: <6r30gPhGsZAoAOSYsP39qr2czQ8Wj7YMOxlP2VZZpAI=.61ee3985-d3a5-40b2-9bce-453253185600@github.com>
References: <6r30gPhGsZAoAOSYsP39qr2czQ8Wj7YMOxlP2VZZpAI=.61ee3985-d3a5-40b2-9bce-453253185600@github.com>
Message-ID: <38c22L3m_I_joyXB6ZAzaAaec3-Gj4spqor35Pv1h6c=.31c1e620-dd74-49d9-8c8b-a4864167a6cc@github.com>

On Fri, 24 May 2024 22:26:56 GMT, Sandhya Viswanathan wrote:

>> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Review comments.
>
> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1122:
>
>> 1120: // eq_mask - The bit mask returned that holds the result of the comparison
>> 1121: // rTmp - a temporary register
>> 1122: // rTmp2 - a temporary register
>
> There is no rtmp, rtmp2 here.

Fixed.

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1129:
>
>> 1127: // _masm - Current MacroAssembler instance pointer
>> 1128: //
>> 1129: // If (n - k) < 32, need to handle reading past end of haystack
>
> Don't see (n-k) < 32 being handled in this function.

Fixed.
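For the `(n - k) < 32` case raised in this comment, one common way to avoid reading past the end of the haystack works in two steps. First, slide the last 32-byte window back so that it ends exactly at the end of the buffer. Then mask off offsets that were already examined or where the needle no longer fits. A later commit in this PR is titled "fix reading past end of haystack when (n-k) < 32", but the technique it uses is not shown here. The sketch below models the sliding-window idea for a byte-granular mask. It assumes n >= 32, and the names `window_start` and `candidate_mask` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Start of a 32-byte load that would begin at `pos`, moved back when
// needed so the load never extends past the end of an n-byte haystack.
std::size_t window_start(std::size_t pos, std::size_t n) {
    assert(n >= 32);
    return pos + 32 <= n ? pos : n - 32;
}

// Bit i set if window offset i is a candidate start: not yet examined
// (start + i >= pos) and the k-byte needle still fits (start + i + k <= n).
std::uint32_t candidate_mask(std::size_t pos, std::size_t n, std::size_t k) {
    const std::size_t start = window_start(pos, n);
    std::uint32_t mask = 0;
    for (std::size_t i = 0; i < 32; ++i) {
        const std::size_t p = start + i;
        if (p >= pos && p + k <= n) mask |= std::uint32_t{1} << i;
    }
    return mask;
}
```

For example, with n = 100, k = 5 and pos = 80, the window starts at 68. Only offsets 12 through 27 (haystack positions 80 to 95) remain candidates.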
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614116033
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614116814

From sgibbons at openjdk.org Fri May 24 23:11:16 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Fri, 24 May 2024 23:11:16 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v35]
In-Reply-To:
References: <-vyOZzeMslZqgJpTsQnnOWi4abWiM8fNeWSVx5LEHm8=.d37011ee-102c-4874-aa26-d113949d25ea@github.com>
Message-ID:

On Fri, 24 May 2024 00:09:38 GMT, Sandhya Viswanathan wrote:

>> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Review comments - move stubGen*_string.cpp to c2_stubGen*_string.cpp
>
> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1754:
>
>> 1752: continue;
>> 1753: } else {
>> 1754: Label L_loopTop;
>
> L_loopTop label not used in the else block.

Removed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614117294

From sgibbons at openjdk.org Fri May 24 23:11:17 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Fri, 24 May 2024 23:11:17 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v25]
In-Reply-To:
References:
Message-ID:

On Wed, 22 May 2024 18:22:24 GMT, Sandhya Viswanathan wrote:

>> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>>
>> un-helper-ize preload_needle_helper; try fix for macos build
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 101:
>
>> 99: if (isU) { \
>> 100: __ blsrl(tmp, mask); \
>> 101: __ blsrl(mask, tmp); \
>
> We could do this as
> __ blsrl(mask, mask); \
> __ blsrl(mask, mask); \
> Thereby removing the need for tmp.

Fixed.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614127638

From sgibbons at openjdk.org Fri May 24 23:11:19 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Fri, 24 May 2024 23:11:19 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v27]
In-Reply-To:
References:
Message-ID:

On Wed, 22 May 2024 20:36:25 GMT, Sandhya Viswanathan wrote:

>> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Revert last change to IndexOf.java
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1218:
>
>> 1216: // isU - true if argument encoding is either UU or UL
>> 1217: // eq_mask - The bit mask returned that holds the result of the comparison
>> 1218: // needleLen - a temporary register. Only used if isUL true
>
> needleLen is not a temporary register. needleLen is used to read the kThByte from haystack below when !sizeKnown so must hold valid info.

Fixed.

> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1439:
>
>> 1437: // back to last valid read position
>> 1438: __ cmpq(hsPtrRet, last);
>> 1439: __ jb_b(L_midLoop);
>
> could be jbe_b?

Fixed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614127526
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614127356

From sgibbons at openjdk.org Fri May 24 23:15:26 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Fri, 24 May 2024 23:15:26 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v41]
In-Reply-To:
References:
Message-ID:

Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:

Fix test; review comments

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/16753/files
 - new: https://git.openjdk.org/jdk/pull/16753/files/be001e2c..b154faee

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=40
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=39-40

Stats: 31 lines in 3 files changed: 4 ins; 13 del; 14 mod
Patch: https://git.openjdk.org/jdk/pull/16753.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753

PR: https://git.openjdk.org/jdk/pull/16753

From sviswanathan at openjdk.org Sat May 25 00:46:19 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Sat, 25 May 2024 00:46:19 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v41]
In-Reply-To:
References:
Message-ID:

On Fri, 24 May 2024 23:15:26 GMT, Scott Gibbons wrote:

> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix test; review comments

src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 316:

> 314: // Set up jump tables. Used when needle size <= NUMBER_OF_CASES
> 315: setup_jump_tables(ae, L_returnRBP, L_checkRangeAndReturn, L_bigCaseFixupAndReturn,
> 316: &big_jump_table, &small_jump_table, _masm);

We could directly use L_returnError here instead of L_returnRBP.

src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 476:

> 474: // Used to check and return value in rbp - usually error
> 475: __ bind(L_returnRBP);
> 476: __ movq(rax, rbp);

This seems spurious as rax is being overwritten at line 489. Did you intend to return -1?

src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1816:

> 1814: byte_compare_helper(i + 1, L_loopTop, L_fixup, needle, needle_val, hs_ptr, eq_mask, set_bit,
> 1815: rTmp4, ae, _masm);
> 1816: }

L_checkRange on NoMatch could be L_error instead.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614172379
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614172021
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614175081

From sviswanathan at openjdk.org Sat May 25 00:46:19 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Sat, 25 May 2024 00:46:19 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v35]
In-Reply-To: <-vyOZzeMslZqgJpTsQnnOWi4abWiM8fNeWSVx5LEHm8=.d37011ee-102c-4874-aa26-d113949d25ea@github.com>
References: <-vyOZzeMslZqgJpTsQnnOWi4abWiM8fNeWSVx5LEHm8=.d37011ee-102c-4874-aa26-d113949d25ea@github.com>
Message-ID:

On Thu, 23 May 2024 23:12:42 GMT, Scott Gibbons wrote:

> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
> Review comments - move stubGen*_string.cpp to c2_stubGen*_string.cpp

src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1740:

> 1738: //
> 1739: // If a match is found, jump to L_checkRangeAndReturn, which ensures the
> 1740: // matched needle is not past the end of the haystack.

These labels are not in this function.
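The comments in this thread mention a broadcast of the needle's last element (`XMM_BYTE_K`) and a range check after a match is found. That is consistent with the widely used approach of comparing the first and last needle bytes across a whole vector, then verifying each candidate. The scalar model below shows that technique for Latin-1 (byte) data, 32 candidate positions at a time. It is an illustration under that assumption, not a transcription of the stub, and `find_latin1` is a hypothetical name. Unlike the stub, which checks the range after a match, this model folds the range check into building the mask so that it never reads out of bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Index of the lowest set bit (like TZCNT); m must be non-zero.
unsigned lowest_bit(std::uint32_t m) {
    assert(m != 0);
    unsigned i = 0;
    while ((m & 1u) == 0) { m >>= 1; ++i; }
    return i;
}

// First/last-byte filtered search, 32 start positions per step
// (one AVX2 register of bytes). Returns the match index or -1.
long find_latin1(const std::uint8_t* hs, std::size_t n,
                 const std::uint8_t* nd, std::size_t k) {
    if (k == 0) return 0;
    if (k > n) return -1;
    const std::size_t last_start = n - k;  // last position where the needle fits
    for (std::size_t base = 0; base <= last_start; base += 32) {
        // Bit i: hs[base+i] equals the first needle byte AND
        // hs[base+i+k-1] equals the last one (two broadcast compares, ANDed).
        std::uint32_t mask = 0;
        for (std::size_t i = 0; i < 32 && base + i <= last_start; ++i) {
            if (hs[base + i] == nd[0] && hs[base + i + k - 1] == nd[k - 1])
                mask |= std::uint32_t{1} << i;
        }
        // Verify candidates lowest first, dropping each with mask &= mask - 1.
        while (mask != 0) {
            const std::size_t pos = base + lowest_bit(mask);
            if (std::memcmp(hs + pos, nd, k) == 0) return static_cast<long>(pos);
            mask &= mask - 1;
        }
    }
    return -1;
}
```

Searching "the quick brown fox jumps over the lazy dog" for "lazy" finds the match at index 35. It lies in the second 32-position block, so both iterations of the outer loop are exercised.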
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614125339 From alanb at openjdk.org Sat May 25 06:36:12 2024 From: alanb at openjdk.org (Alan Bateman) Date: Sat, 25 May 2024 06:36:12 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v41] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 23:15:26 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Fix test; review comments test/jdk/java/lang/StringBuffer/IndexOf.java line 47: > 45: public class IndexOf { > 46: > 47: static Random generator = new 
Random(); @RogerRiggs Would you have cycles to look at Scott's changes to this test? I suspect it will need to be re-structured, re-formatted, and commented to get into maintainable shape. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614383260 From sgibbons at openjdk.org Sat May 25 21:57:13 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Sat, 25 May 2024 21:57:13 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v41] In-Reply-To: References: Message-ID: On Sat, 25 May 2024 00:15:03 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix test; review comments > > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 316: > >> 314: // Set up jump tables. Used when needle size <= NUMBER_OF_CASES >> 315: setup_jump_tables(ae, L_returnRBP, L_checkRangeAndReturn, L_bigCaseFixupAndReturn, >> 316: &big_jump_table, &small_jump_table, _masm); > > We could directly use L_returnError here instead of L_returnRBP. OK > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 476: > >> 474: // Used to check and return value in rbp - usually error >> 475: __ bind(L_returnRBP); >> 476: __ movq(rax, rbp); > > This seems spurious as rax is being overwritten at line 489. Did you intend to return -1? Removed all references to L_returnRBP. Replaced with L_returnError. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1816: > >> 1814: byte_compare_helper(i + 1, L_loopTop, L_fixup, needle, needle_val, hs_ptr, eq_mask, set_bit, >> 1815: rTmp4, ae, _masm); >> 1816: } > > L_checkRange on NoMatch could be L_error instead. 
Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614900796 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614903860 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614901577 From sgibbons at openjdk.org Sat May 25 21:57:14 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Sat, 25 May 2024 21:57:14 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v35] In-Reply-To: References: <-vyOZzeMslZqgJpTsQnnOWi4abWiM8fNeWSVx5LEHm8=.d37011ee-102c-4874-aa26-d113949d25ea@github.com> Message-ID: On Fri, 24 May 2024 23:04:55 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments - move stubGen*_string.cpp to c2_stubGen*_string.cpp > > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1740: > >> 1738: // >> 1739: // If a match is found, jump to L_checkRangeAndReturn, which ensures the >> 1740: // matched needle is not past the end of the haystack. > > These labels are not in this function. Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614901350 From sgibbons at openjdk.org Sat May 25 22:16:41 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Sat, 25 May 2024 22:16:41 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v42] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
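The "on average 1.3x" figure seems to be the arithmetic mean of the per-benchmark Latest/Score ratios in the benchmark table of this PR description. A minimal sketch that recomputes it, assuming "Score" is the baseline run and "Latest" the patched run, and that both are throughput scores (higher is better). The geometric mean, often preferred for averaging ratios, comes out lower at roughly 1.22x.

```java
// Recomputes the average speedup from the StringIndexOf / StringIndexOfChar table.
// Assumption: SCORE = baseline, LATEST = patched build, both throughput (higher is better).
public class SpeedupSummary {
    static final double[] SCORE = {
        343.573, 1039.081, 55.828, 9.361, 4.216, 3.133, 3.76, 9.186, 14.341,
        6220.918, 5503.556, 6978.854, 5657.499, 7132.541, 16013.389, 7386.123, 9901.671 };
    static final double[] LATEST = {
        317.934, 1053.96, 110.541, 11.906, 4.218, 3.216, 3.761, 9.713, 46.343,
        12154.52, 5540.044, 6818.689, 5474.624, 6863.359, 16162.437, 14771.622, 9782.245 };

    // Plain average of the per-benchmark ratios (the table's third column).
    static double arithmeticMean() {
        double sum = 0;
        for (int i = 0; i < SCORE.length; i++) {
            sum += LATEST[i] / SCORE[i];
        }
        return sum / SCORE.length;
    }

    // exp(mean(log ratio)): less sensitive to a single large outlier such as successBig.
    static double geometricMean() {
        double logSum = 0;
        for (int i = 0; i < SCORE.length; i++) {
            logSum += Math.log(LATEST[i] / SCORE[i]);
        }
        return Math.exp(logSum / SCORE.length);
    }

    public static void main(String[] args) {
        System.out.printf("arithmetic %.3f, geometric %.3f%n", arithmeticMean(), geometricMean());
    }
}
```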
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Review comments; fix reading past end of haystack when (n-k) < 32 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/b154faee..e13c7ea4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=41 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=40-41 Stats: 78 lines in 1 file changed: 29 ins; 9 del; 40 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Sat May 25 22:19:41 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Sat, 25 May 2024 22:19:41 GMT Subject: RFR: 
8320448: Accelerate IndexOf using AVX2 [v43] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Fix tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/e13c7ea4..15994a39 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=42 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=41-42 Stats: 2 lines in 2 files changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 
PR: https://git.openjdk.org/jdk/pull/16753 From djelinski at openjdk.org Sun May 26 06:21:15 2024 From: djelinski at openjdk.org (Daniel Jeliński) Date: Sun, 26 May 2024 06:21:15 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v33] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 18:37:13 GMT, Vladimir Kozlov wrote: >> Changed to `lea` with `InternalAddress()`. Generates the exact same code, but makes more sense. I looked at `movdqu` and see no code that generates RIP-relative loads. It merely checks `reachable()` and adds an intermediate `lea` if not reachable. @djelinski can you clarify please? > > I think HotSpot prefers to have full addresses in `lea` for possible patching. Right. Our assembler implements RIP-relative addressing for some instructions, but apparently `lea` isn't one of them. I'll experiment with it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1615033502 From duke at openjdk.org Sun May 26 15:44:07 2024 From: duke at openjdk.org (duke) Date: Sun, 26 May 2024 15:44:07 GMT Subject: Withdrawn: 8325674: Constant fold across compares In-Reply-To: References: Message-ID: On Wed, 14 Feb 2024 19:35:34 GMT, Joshua Cao wrote: > For example, `x + 1 < 2` -> `x < 2 - 1` iff we can prove that `x + 1` does not overflow and `2 - 1` does not overflow. We can always fold if it is an `==` or `!=` since overflow will not affect the result of the comparison. > > Consider this more practical example: > > > public void foo(int[] arr) { > for (int i = arr.length - 1; i >= 0; --i) { > blackhole(arr[i]); > } > } > > > C2 emits a loop guard that looks like `arr.length - 1 < 0`. We know `arr.length - 1` does not overflow because `arr.length` is non-negative. We can fold the comparison into `arr.length < 1`. We have to compute `arr.length - 1` anyway if we enter the loop, but we can avoid the subtraction if we never enter the loop.
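The side condition on the `<` rewrite comes from Java's two's-complement wrap-around. A minimal sketch on concrete `int` values (class and method names are hypothetical). A compiler would have to prove the no-overflow condition from the type range of `x`; here `x` is just a known number. The sketch also shows why `==` can always be folded, since adding a constant mod 2^32 is a bijection, and why the `arr.length` loop guard is safe.

```java
// Illustration only: models "x + c1 < c2  ->  x < c2 - c1" with Java int semantics.
public class CompareFolding {

    static boolean lessOriginal(int x, int c1, int c2) {
        return x + c1 < c2;      // x + c1 may wrap around
    }

    static boolean lessFolded(int x, int c1, int c2) {
        return x < c2 - c1;      // c2 - c1 may also wrap around
    }

    // Equality survives wrapping: x + c1 == c2 iff x == c2 - c1 (mod 2^32).
    static boolean eqOriginal(int x, int c1, int c2) {
        return x + c1 == c2;
    }

    static boolean eqFolded(int x, int c1, int c2) {
        return x == c2 - c1;
    }

    static boolean fitsInInt(long v) {
        return v >= Integer.MIN_VALUE && v <= Integer.MAX_VALUE;
    }

    // Side condition from the proposal: neither x + c1 nor c2 - c1 may overflow.
    static boolean canFoldLess(int x, int c1, int c2) {
        return fitsInInt((long) x + c1) && fitsInInt((long) c2 - c1);
    }

    // The loop guard from the example: safe because an array length is never negative.
    static boolean loopGuardOriginal(int len) { return len - 1 < 0; }
    static boolean loopGuardFolded(int len)   { return len < 1; }

    public static void main(String[] args) {
        int x = Integer.MAX_VALUE;
        System.out.println(lessOriginal(x, 1, 2)); // true: MAX_VALUE + 1 wraps to MIN_VALUE
        System.out.println(lessFolded(x, 1, 2));   // false
        System.out.println(canFoldLess(x, 1, 2));  // false, so the fold must not happen
    }
}
```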
I believe the simplification can also help with stronger integer range analysis in https://bugs.openjdk.org/browse/JDK-8275202. > > Some additional notes: > * there is overflow-checking code scattered across `src/hotspot/share/opto`. I separated out the functions from convertnode.cpp into `type.hpp`. Maybe the functions belong somewhere else? > * there is a change in Parse::do_if() to repeatedly apply GVN until the test is canonical. We need multiple iterations in the case of `C1 > C2 - X` -> `C2 - X < C1` -> `C2 < X` -> `X > C2`. This fails the assertion `BoolTest(btest).is_canonical()`. We can avoid this by applying GVN one more time to get `C2 < X`. > * we should not transform loop backedge conditions. For example, if we have `for (int i = 0; i < 10; ++i) {}`, the backedge condition is `i + 1 < 10`. If we transform it into `i < 9`, it messes with CountedLoop's recognition of induction variables and strides. > * this change optimizes some of the equality checks in `TestUnsignedComparison.java` and breaks the IR checks. I removed those tests. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/17853 From jbhateja at openjdk.org Mon May 27 06:11:24 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 27 May 2024 06:11:24 GMT Subject: RFR: 8325083: jdk/incubator/vector/Double512VectorTests.java crashes in Assembler::vex_prefix_and_encode Message-ID: This bugfix patch limits the register class for the operands of the byte-to-double cast pattern to prevent the reported assertion failure on Knights family CPUs. Kindly review and share your feedback.
Best Regards, Jatin ------------- Commit messages: - 8325083: jdk/incubator/vector/Double512VectorTests.java crashes in Assembler::vex_prefix_and_encode Changes: https://git.openjdk.org/jdk/pull/19407/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19407&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8325083 Stats: 14 lines in 1 file changed: 13 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19407.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19407/head:pull/19407 PR: https://git.openjdk.org/jdk/pull/19407 From thartmann at openjdk.org Mon May 27 08:24:23 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 27 May 2024 08:24:23 GMT Subject: RFR: 8332956: Problem list CodeCacheFullCountTest.java until JDK-8332954 is fixed Message-ID: The test has multiple issues (see duplicate links from [JDK-8332954](https://bugs.openjdk.org/browse/JDK-8332954)) and should be problem-listed for now. Thanks, Tobias ------------- Commit messages: - 8332956: Problem list CodeCacheFullCountTest.java until JDK-8332954 is fixed Changes: https://git.openjdk.org/jdk/pull/19408/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19408&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332956 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19408.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19408/head:pull/19408 PR: https://git.openjdk.org/jdk/pull/19408 From chagedorn at openjdk.org Mon May 27 08:39:03 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 27 May 2024 08:39:03 GMT Subject: RFR: 8332956: Problem list CodeCacheFullCountTest.java until JDK-8332954 is fixed In-Reply-To: References: Message-ID: On Mon, 27 May 2024 07:34:03 GMT, Tobias Hartmann wrote: > The test has multiple issues (see duplicate links from [JDK-8332954](https://bugs.openjdk.org/browse/JDK-8332954)) and should be problem-listed for now. > > Thanks, > Tobias Looks good and trivial!
------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19408#pullrequestreview-2080362257 From luhenry at openjdk.org Mon May 27 08:41:25 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 27 May 2024 08:41:25 GMT Subject: RFR: 8332402: [IR Framework] Add tests for applyIfCPUFeature* and applyIfPlatform* in TestBadFormat In-Reply-To: References: Message-ID: On Mon, 20 May 2024 08:27:33 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch to add some tests for the IR test framework? > As discussed https://github.com/openjdk/jdk/pull/19270#pullrequestreview-2060974799, it's worth adding some tests for applyIfCPUFeature* and applyIfPlatform* in TestBadFormat. > > Thanks Marked as reviewed by luhenry (Committer). @chhagedorn @eme64 could you please review, as you made the most recent changes to this file? Not sure who else to ping on this. Thank you! ------------- PR Review: https://git.openjdk.org/jdk/pull/19302#pullrequestreview-2067817156 PR Comment: https://git.openjdk.org/jdk/pull/19302#issuecomment-2121993845 From mli at openjdk.org Mon May 27 08:41:25 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 27 May 2024 08:41:25 GMT Subject: RFR: 8332402: [IR Framework] Add tests for applyIfCPUFeature* and applyIfPlatform* in TestBadFormat Message-ID: Hi, Can you help to review this patch to add some tests for the IR test framework? As discussed https://github.com/openjdk/jdk/pull/19270#pullrequestreview-2060974799, it's worth adding some tests for applyIfCPUFeature* and applyIfPlatform* in TestBadFormat.
Thanks ------------- Commit messages: - Initial commit - format Changes: https://git.openjdk.org/jdk/pull/19302/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19302&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332402 Stats: 321 lines in 1 file changed: 318 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19302/head:pull/19302 PR: https://git.openjdk.org/jdk/pull/19302 From epeter at openjdk.org Mon May 27 08:41:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 27 May 2024 08:41:25 GMT Subject: RFR: 8332402: [IR Framework] Add tests for applyIfCPUFeature* and applyIfPlatform* in TestBadFormat In-Reply-To: References: Message-ID: On Mon, 20 May 2024 08:27:33 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch to add some tests for the IR test framework? > As discussed https://github.com/openjdk/jdk/pull/19270#pullrequestreview-2060974799, it's worth adding some tests for applyIfCPUFeature* and applyIfPlatform* in TestBadFormat. > > Thanks Thanks for the work! Generally looks good. I have a few comments. We should also wait a week; @chhagedorn is out of the office. He generally has more context on the IR framework. @Hamlin-Li for https://github.com/openjdk/jdk/pull/19270 > @Hamlin-Li you only got 1 review. Per the rules, you generally need 2: > https://openjdk.org/guide/#final-check-before-creating-the-pr > > That is unless you say that the change is trivial, and the reviewer also confirms that it is trivial. I don't see that here. > > Our rule is that you need 2 reviewers: at least one reviewer, the second one can be a committer.
test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java line 849: > 847: @Test > 848: @IR(failOn = IRNode.CALL, applyIf = {"TLABRefillWasteFraction", "50"}, applyIfNot = {"UseTLAB", "true"}) > 849: @IR(failOn = IRNode.CALL, applyIfAnd = {"TLABRefillWasteFraction", "50", "UseTLAB", "true"}, Not sure why you changed the format here. You seem to have one-liners everywhere else? Plus it just creates noise when reviewing. test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java line 861: > 859: public void onlyOneApplyIfCPUFeature() {} > 860: > 861: @FailCount(3) Why are there 3 failures? ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19302#pullrequestreview-2067876106 PR Comment: https://git.openjdk.org/jdk/pull/19302#issuecomment-2122018971 PR Review Comment: https://git.openjdk.org/jdk/pull/19302#discussion_r1607858537 PR Review Comment: https://git.openjdk.org/jdk/pull/19302#discussion_r1607852163 From chagedorn at openjdk.org Mon May 27 08:41:25 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 27 May 2024 08:41:25 GMT Subject: RFR: 8332402: [IR Framework] Add tests for applyIfCPUFeature* and applyIfPlatform* in TestBadFormat In-Reply-To: References: Message-ID: On Mon, 20 May 2024 08:27:33 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch to add some tests for the IR test framework? > As discussed https://github.com/openjdk/jdk/pull/19270#pullrequestreview-2060974799, it's worth adding some tests for applyIfCPUFeature* and applyIfPlatform* in TestBadFormat. > > Thanks Thanks for adding all these new tests! I have some comments.
test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java line 865: > 863: @IR(failOn = IRNode.CALL, applyIfCPUFeature = {"sve", "true", "avx", "true"}) > 864: @IR(failOn = IRNode.CALL, applyIfCPUFeature = {"sve", "true", "avx"}) > 865: public void applyIfCPUFeatureTooManyFlags() {} Suggestion: public void applyIfCPUFeatureTooManyCPUFeatures() {} I suggest replacing `flag(s)` in all the new tests with `CPUFeature(s)` test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java line 867: > 865: public void applyIfCPUFeatureTooManyFlags() {} > 866: > 867: @FailCount(4) Why is this different from `applyIfMissingValue()`? Also, there are some other tests where the `FailCount` value is different from the corresponding flag-test version. ------------- PR Review: https://git.openjdk.org/jdk/pull/19302#pullrequestreview-2080225426 PR Review Comment: https://git.openjdk.org/jdk/pull/19302#discussion_r1615620913 PR Review Comment: https://git.openjdk.org/jdk/pull/19302#discussion_r1615638289 From epeter at openjdk.org Mon May 27 08:41:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 27 May 2024 08:41:25 GMT Subject: RFR: 8332402: [IR Framework] Add tests for applyIfCPUFeature* and applyIfPlatform* in TestBadFormat In-Reply-To: References: Message-ID: On Tue, 21 May 2024 08:12:17 GMT, Emanuel Peter wrote: >> Hi, >> Can you help to review this patch to add some tests for the IR test framework? >> As discussed https://github.com/openjdk/jdk/pull/19270#pullrequestreview-2060974799, it's worth adding some tests for applyIfCPUFeature* and applyIfPlatform* in TestBadFormat. >> >> Thanks > > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java line 861: > >> 859: public void onlyOneApplyIfCPUFeature() {} >> 860: >> 861: @FailCount(3) > > Why are there 3 failures? Generally, the count is not always as I would expect. Can we fix that? You should at least comment what fails, and why.
I would expect there to be a single failure per rule. But that seems not to be the case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19302#discussion_r1607854475 From chagedorn at openjdk.org Mon May 27 08:41:25 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 27 May 2024 08:41:25 GMT Subject: RFR: 8332402: [IR Framework] Add tests for applyIfCPUFeature* and applyIfPlatform* in TestBadFormat In-Reply-To: References: Message-ID: On Tue, 21 May 2024 08:13:39 GMT, Emanuel Peter wrote: > Generally, the count is not always as I would expect. Can we fix that? You should at least comment what fails, and why. I agree that this would be helpful when the count does not match the number of IR rules. You don't need to add it for existing tests where it is missing but at least it would be good for new tests. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19302#discussion_r1615636721 From rcastanedalo at openjdk.org Mon May 27 08:45:06 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 27 May 2024 08:45:06 GMT Subject: Integrated: 8332527: ZGC: generalize object cloning logic In-Reply-To: References: Message-ID: On Mon, 20 May 2024 14:31:26 GMT, Roberto Castañeda Lozano wrote: > This changeset generalizes the logic to produce a runtime call to clone a class instance so that it can be shared by other collectors adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). The changeset moves the logic from `ZBarrierSetC2` to the GC-shared `BarrierSetC2` class and adds support for 32-bit platforms. > > #### Testing > > - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). > - tier4-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only).
> - `compiler/arraycopy` tests (linux-x86-debug) with [an additional patch](https://github.com/openjdk/jdk/commit/ddcf777894e740b8e6ddbbf8821e82a173c23ef4) that implements cloning of large class instances with a runtime clone call rather than arraycopy when using G1 (to exercise the generalized logic on a 32-bit platform). This pull request has now been integrated. Changeset: ffa4badb Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/ffa4badb78118d154e47e41073e467c0e0e4273c Stats: 94 lines in 3 files changed: 52 ins; 38 del; 4 mod 8332527: ZGC: generalize object cloning logic Reviewed-by: aboldtch, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/19311 From yzheng at openjdk.org Mon May 27 13:08:09 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 27 May 2024 13:08:09 GMT Subject: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v7] In-Reply-To: <9FFOmfnJsAIg1KJN0RcpDmAzpn68k4QBvFifeazLjmc=.dc821eea-3423-4a34-bcfc-217183169352@github.com> References: <9FFOmfnJsAIg1KJN0RcpDmAzpn68k4QBvFifeazLjmc=.dc821eea-3423-4a34-bcfc-217183169352@github.com> Message-ID: On Fri, 24 May 2024 15:12:28 GMT, Yudi Zheng wrote: >> Moving array construction within BigInteger.implMultiplyToLen intrinsic candidate to its caller simplifies the intrinsic implementation in JIT compiler. > > Yudi Zheng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into JDK-8327964 > - address comments. > - address comments. > - address comment. > - address comment. > - address comment. > - address comment. > - Simplify BigInteger.implMultiplyToLen intrinsic Thanks for the reviews! Mach5 testing looks good except for a couple of known timeouts unrelated to this PR.
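Moving the allocation out of the intrinsic candidate means the kernel only ever writes into an array it is handed. A simplified sketch of that split, assuming big-endian arrays of 32-bit words as `BigInteger` uses internally. The class and helper names are hypothetical and this is not the JDK source. Every slot of `z` is written before it is read, so a reused `z` does not need to be zeroed first.

```java
import java.math.BigInteger;

// Hypothetical sketch: the caller allocates, the (intrinsifiable) kernel only fills z.
public class MultiplyToLenSketch {
    private static final long LONG_MASK = 0xffffffffL;

    // Caller: owns the allocation, so the kernel never needs to create an array.
    static int[] multiplyToLen(int[] x, int xlen, int[] y, int ylen, int[] z) {
        if (z == null || z.length < xlen + ylen) {
            z = new int[xlen + ylen];
        }
        implMultiplyToLen(x, xlen, y, ylen, z);
        return z;
    }

    // Kernel: schoolbook multiplication of big-endian 32-bit magnitudes into z.
    static void implMultiplyToLen(int[] x, int xlen, int[] y, int ylen, int[] z) {
        int xstart = xlen - 1;
        int ystart = ylen - 1;

        // First row: x[xstart] * y, written directly (no read of old z contents).
        long carry = 0;
        for (int j = ystart, k = ystart + 1 + xstart; j >= 0; j--, k--) {
            long product = (y[j] & LONG_MASK) * (x[xstart] & LONG_MASK) + carry;
            z[k] = (int) product;
            carry = product >>> 32;
        }
        z[xstart] = (int) carry;

        // Remaining rows: accumulate x[i] * y into the partial result.
        for (int i = xstart - 1; i >= 0; i--) {
            carry = 0;
            for (int j = ystart, k = ystart + 1 + i; j >= 0; j--, k--) {
                long product = (y[j] & LONG_MASK) * (x[i] & LONG_MASK)
                        + (z[k] & LONG_MASK) + carry;
                z[k] = (int) product;
                carry = product >>> 32;
            }
            z[i] = (int) carry;
        }
    }

    // Helpers for trying the sketch against java.math.BigInteger (non-negative values only).
    static int[] magnitude(BigInteger v) {
        int words = Math.max(1, (v.bitLength() + 31) / 32);
        int[] out = new int[words];
        for (int i = 0; i < words; i++) {
            out[words - 1 - i] = v.shiftRight(32 * i).intValue(); // low 32 bits of each word
        }
        return out;
    }

    static BigInteger fromMagnitude(int[] m) {
        BigInteger r = BigInteger.ZERO;
        for (int w : m) {
            r = r.shiftLeft(32).or(BigInteger.valueOf(w & LONG_MASK));
        }
        return r;
    }
}
```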
The GHA test failure is due to [JDK-8332923](https://bugs.openjdk.org/browse/JDK-8332923). ------------- PR Comment: https://git.openjdk.org/jdk/pull/18226#issuecomment-2133444527 From thartmann at openjdk.org Mon May 27 13:26:05 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 27 May 2024 13:26:05 GMT Subject: RFR: 8332956: Problem list CodeCacheFullCountTest.java until JDK-8332954 is fixed In-Reply-To: References: Message-ID: On Mon, 27 May 2024 07:34:03 GMT, Tobias Hartmann wrote: > The test has multiple issues (see duplicate links from [JDK-8332954](https://bugs.openjdk.org/browse/JDK-8332954)) and should be problem-listed for now. > > Thanks, > Tobias Thanks Christian! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19408#issuecomment-2133477105 From thartmann at openjdk.org Mon May 27 13:26:05 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 27 May 2024 13:26:05 GMT Subject: Integrated: 8332956: Problem list CodeCacheFullCountTest.java until JDK-8332954 is fixed In-Reply-To: References: Message-ID: On Mon, 27 May 2024 07:34:03 GMT, Tobias Hartmann wrote: > The test has multiple issues (see duplicate links from [JDK-8332954](https://bugs.openjdk.org/browse/JDK-8332954)) and should be problem-listed for now. > > Thanks, > Tobias This pull request has now been integrated.
Changeset: 793fd72f Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/793fd72fa66b1367b68fe798230ea61ea0aab1d8 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8332956: Problem list CodeCacheFullCountTest.java until JDK-8332954 is fixed Reviewed-by: chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/19408 From yzheng at openjdk.org Mon May 27 14:28:14 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 27 May 2024 14:28:14 GMT Subject: Integrated: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic In-Reply-To: References: Message-ID: On Tue, 12 Mar 2024 10:44:54 GMT, Yudi Zheng wrote: > Moving array construction within BigInteger.implMultiplyToLen intrinsic candidate to its caller simplifies the intrinsic implementation in JIT compiler. This pull request has now been integrated. Changeset: ed81a478 Author: Yudi Zheng Committer: Martin Doerr URL: https://git.openjdk.org/jdk/commit/ed81a478e175631f1de69eb4b43f927629fefd74 Stats: 146 lines in 17 files changed: 11 ins; 82 del; 53 mod 8327964: Simplify BigInteger.implMultiplyToLen intrinsic Reviewed-by: mdoerr, amitkumar, kvn, fyang ------------- PR: https://git.openjdk.org/jdk/pull/18226 From fyang at openjdk.org Mon May 27 14:30:10 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 27 May 2024 14:30:10 GMT Subject: RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v8] In-Reply-To: References: Message-ID: On Mon, 20 May 2024 22:36:30 GMT, ArsenyBochkarev wrote: >> Hello everyone! Please review this ~non-vectorized~ implementation of `_updateBytesAdler32` intrinsic. Reference implementation for AArch64 can be found [here](https://github.com/openjdk/jdk9/blob/master/hotspot/src/cpu/aarch64/vm/stubGenerator_aarch64.cpp#L3281). >> >> ### Correctness checks >> >> Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok. All tier1 also passed. 
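A scalar Adler-32 (RFC 1950) is a convenient oracle for checking an intrinsic like this one. A minimal sketch, not the patch's code, that also shows why a vectorized loop can defer the `mod 65521` reduction. zlib's `NMAX = 5552` is the largest block length for which the running sums cannot exceed 32 unsigned bits, and the sketch uses `int` accumulators reduced with `Integer.remainderUnsigned` to rely on exactly that bound.

```java
// Scalar Adler-32 reference with zlib-style deferred modulo (sketch, not the intrinsic).
public class Adler32Reference {
    static final int BASE = 65521; // largest prime smaller than 2^16
    // Largest n with 255*n*(n+1)/2 + (n+1)*(BASE-1) <= 2^32 - 1 (zlib's NMAX).
    static final int NMAX = 5552;

    // Continues a checksum; java.util.zip.Adler32 starts from adler = 1.
    static int update(int adler, byte[] buf, int off, int len) {
        int s1 = adler & 0xffff;
        int s2 = adler >>> 16;
        while (len > 0) {
            int n = Math.min(len, NMAX);
            len -= n;
            for (int i = 0; i < n; i++) {
                s1 += buf[off++] & 0xff; // bytes are treated as unsigned
                s2 += s1;                // may exceed 2^31; treated as unsigned below
            }
            s1 = Integer.remainderUnsigned(s1, BASE);
            s2 = Integer.remainderUnsigned(s2, BASE);
        }
        return (s2 << 16) | s1;
    }
}
```

Comparing `Adler32Reference.update(1, data, 0, data.length) & 0xffffffffL` with `java.util.zip.Adler32.getValue()` on random and all-`0xFF` inputs exercises the overflow bound.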
>> >> ### Performance results on T-Head board >> >> Enabled intrinsic: >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- | >> | Adler32.TestAdler32.testAdler32Update | 64 | thrpt | 25 | 5522.693 | 23.387 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 128 | thrpt | 25 | 3430.761 | 9.210 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 256 | thrpt | 25 | 1962.888 | 5.323 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 512 | thrpt | 25 | 1050.938 | 0.144 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 1024 | thrpt | 25 | 549.227 | 0.375 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 2048 | thrpt | 25 | 280.829 | 0.170 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 5012 | thrpt | 25 | 116.333 | 0.057 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 8192 | thrpt | 25 | 71.392 | 0.060 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 16384 | thrpt | 25 | 35.784 | 0.019 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 32768 | thrpt | 25 | 17.924 | 0.010 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 65536 | thrpt | 25 | 8.940 | 0.003 | ops/ms | >> >> Disabled intrinsic: >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- | >> |Adler32.TestAdler32.testAdler32Update|64|thrpt|25|655.633|5.845|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|128|thrpt|25|587.418|10.062|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|256|thrpt|25|546.675|11.598|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|512|thrpt|25|432.328|11.517|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|1024|thrpt|25|311.771|4.238|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|2048|thrpt|25|202.648|2.486|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|5012|thrpt|... 
> > ArsenyBochkarev has updated the pull request incrementally with three additional commits since the last revision: > > - Partially unroll L_by16_loop > - Fix by64 function for vlen > 128 > - Fix by16 function for vlen > 128 Hi, I have some comments after a cursory look. Will take a closer look once these are resolved. Thanks. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5069: > 5067: > 5068: // Load data > 5069: __ vsetvli(temp0, count, Assembler::e8, Assembler::m4); Maybe add a simple assertion about `count` before this to make sure that it equals 64 on entry? Or let this function initialize `count` to 64, which I guess won't impact performance much. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5083: > 5081: // Summing up calculated results for s2_new > 5082: __ vsetvli(temp0, count, Assembler::e16, Assembler::m4); > 5083: // 0xFF * 0x10 = 0xFF0 max per single vector element, I don't quite understand this code comment. What does `0x10` stand for here? src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5096: > 5094: // Extracting results for: > 5095: // s1_new > 5096: __ vmv_x_s(temp0, vs1acc[0]); Note that `vmv_x_s` will sign-extend the `e16` reduction result in `vs1acc[0]`. Is that safe? src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5207: > 5205: Register step = x28; // t3 > 5206: > 5207: VectorRegister vzero = v4; // group: v5, v6, v7 I see `vzero` is only used as the scalar source for vector integer reduction instructions, so it's not necessary for `vzero` to be a group of v4, v5, v6, v7. It seems we can assign the last register, v31, to `vzero` and thus free the vector register group v4, v5, v6, v7. And here is what the RVV spec says for reference: Vector reduction operations take a vector register group of elements and a scalar held in element 0 of a vector register, and perform a reduction using some binary operator, to produce a scalar result in element 0 of a vector register.
The scalar input and output operands are held in element 0 of a single vector register, not a vector register group, so any vector register can be the scalar source or destination of a vector reduction regardless of LMUL setting. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5217: > 5215: v16, v18, v20, v22 > 5216: }; > 5217: VectorRegister vtable_64 = v24; // group: v25, v26, v27 Suggestion: `// group: v24, v25, v26, v27` src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5220: > 5218: VectorRegister vtable_16 = (MaxVectorSize == 16) ? v27 : v30; > 5219: VectorRegister vtemp1 = v28; // group: v29, v30, v31 > 5220: VectorRegister vtemp2 = v29; The same applies to `vtemp1` and `vtemp2`, which are only used as the scalar destination for vector integer reduction instructions: it's not necessary for them to be a vector register group. So you might want to remove the code comment for `vtemp1`. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5234: > 5232: __ vid_v(vtemp1); > 5233: __ vmv_v_x(vtable_64, temp1); > 5234: __ vsub_vv(vtable_64, vtable_64, vtemp1); I think a simpler `vrsub_vx vtable_64, vtemp1, temp1` will do? This will help save the `vmv_v_x` instruction. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5245: > 5243: __ vid_v(vtemp1); > 5244: __ vmv_v_x(vtable_16, temp1); > 5245: __ vsub_vv(vtable_16, vtable_16, vtemp1); Similar here: `vrsub_vx vtable_16, vtemp1, temp1`. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5303: > 5301: const int remainder = 3; > 5302: adler32_process_bytes_by16(buff, s1, s2, right_16_bits, vtable_16, vzero, > 5303: vbytes, vs1acc, vs2acc, temp0, temp1, temp2, vtemp1, vtemp2, remainder); Maybe this deserves another `adler32_process_bytes_by32` here? Then you do one `adler32_process_bytes_by32` and one `adler32_process_bytes_by16` for the remaining 3 iterations. ------------- Changes requested by fyang (Reviewer).
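The per-block arithmetic behind these review points can be modelled in scalar Java. For a block of n bytes, s1' = s1 + Σ b[i] and s2' = s2 + n·s1 + Σ (n − i)·b[i], so the weight table is n, n−1, …, 1. That is what `vid.v` followed by `vrsub.vx` (n − i) produces. A hedged sketch, not the patch's code: the class, method names and block sizes are illustrative. It also shows, in general terms, when sign-extending a 16-bit reduction result (as `vmv.x.s` does for SEW=16) is harmless. A sum of k unsigned bytes is at most 255·k (for example 0xFF * 0x10 = 0xFF0 for 16 bytes), and it stays positive in 16 bits only while 255·k < 0x8000. What the patch actually reduces at e16 is not something this sketch claims to know.

```java
// Scalar model of the block step a vectorized Adler-32 loop performs (illustrative only).
public class Adler32Blocks {
    static final int BASE = 65521;

    // Weight table n, n-1, ..., 1: the scalar equivalent of vid.v then vrsub.vx with n.
    static int[] weights(int n) {
        int[] w = new int[n];
        for (int i = 0; i < n; i++) {
            w[i] = n - i;
        }
        return w;
    }

    // One block of w.length bytes: two reductions, then the closed-form s1/s2 update.
    static long[] block(long s1, long s2, byte[] buf, int off, int[] w) {
        int n = w.length;
        long sum = 0;      // plain byte sum (feeds s1)
        long weighted = 0; // sum of (n - i) * b[i] (feeds s2)
        for (int i = 0; i < n; i++) {
            int b = buf[off + i] & 0xff;
            sum += b;
            weighted += (long) w[i] * b;
        }
        long newS2 = s2 + (long) n * s1 + weighted; // old s1 is added once per byte
        long newS1 = s1 + sum;
        return new long[] { newS1 % BASE, newS2 % BASE };
    }

    static long checksum(byte[] buf, int blockSize) {
        int[] w = weights(blockSize);
        long s1 = 1, s2 = 0;
        int off = 0;
        for (; off + blockSize <= buf.length; off += blockSize) {
            long[] r = block(s1, s2, buf, off, w);
            s1 = r[0];
            s2 = r[1];
        }
        for (; off < buf.length; off++) { // scalar tail for the leftover bytes
            s1 = (s1 + (buf[off] & 0xff)) % BASE;
            s2 = (s2 + s1) % BASE;
        }
        return (s2 << 16) | s1;
    }

    // True if a 16-bit sum of byteCount unsigned bytes can never have its sign bit set.
    static boolean signExtensionHarmless(int byteCount) {
        return 255L * byteCount < 0x8000;
    }
}
```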
PR Review: https://git.openjdk.org/jdk/pull/18382#pullrequestreview-2079981842 PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1615962910 PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1616037433 PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1616020696 PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1615470296 PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1615469292 PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1616000231 PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1615474919 PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1615492806 PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1616119146 From mbaesken at openjdk.org Mon May 27 14:53:07 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 27 May 2024 14:53:07 GMT Subject: RFR: 8332904: ubsan ppc64le: c1_LIRGenerator_ppc.cpp:581:21: runtime error: signed integer overflow: 9223372036854775807 + 1 cannot be represented in type 'long int' Message-ID: <48iXq_MgcuQTyGGANYnJQlEJwBhmRsiX9SCtRNtQHbQ=.baf4713c-fbcf-4bb1-bb18-5af9b3bab57f@github.com> When using ubsan on Linux ppc64le we run into some overflows like this one: c1_LIRGenerator_ppc.cpp:581:21: runtime error: signed integer overflow: 9223372036854775807 + 1 cannot be represented in type 'long int' It seems we have to add casts to get defined behavior. There are similar places elsewhere in the code as well.
------------- Commit messages: - JDK-8332904 Changes: https://git.openjdk.org/jdk/pull/19413/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19413&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332904 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/19413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19413/head:pull/19413 PR: https://git.openjdk.org/jdk/pull/19413 From kvn at openjdk.org Mon May 27 15:26:07 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 27 May 2024 15:26:07 GMT Subject: RFR: 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode [v3] In-Reply-To: References: Message-ID: <_Dxr5PwFkWA0ATFboqifUQsVVLYclEJvV6tUgDzSOpQ=.3f4d7a42-57b8-43a1-83ca-388848346784@github.com> On Wed, 15 May 2024 08:47:19 GMT, Christian Hagedorn wrote: >> This patch replaces the `Opaque4Node` of the `If` for Initialized Assertion Predicates with a new `OpaqueInitializedAssertionPredicateNode`. This helps to simplify pattern matching for predicate code and to distinguish it from the two other uses of `Opaque4` nodes: >> 1. Template Assertion Predicate: The goal is to get rid of its `Opaque4Node` as well by using a dedicated `TemplateAssertionPredicateNode` for the `IfNode`. >> 2. Non-null checks with intrinsics and unsafe accesses: This will eventually be the only use left. Once we get there, we should rename the node accordingly to `OpaqueNonNullCheck` or something like that. >> >> I went through all the uses of `Opaque4` nodes and did the following: >> - Could the `Opaque4` node be part of an Initialized Assertion Predicate? >> - No: Added an assert that we are not dealing with an Initialized Assertion Predicate. >> - Yes: >> - Yes **and only** for Initialized Assertion Predicates? Added an assert that we are only expecting an `OpaqueInitializedAssertionPredicateNode` if appropriate.
>> - Yes but could also be something else: Added a case for `OpaqueInitializedAssertionPredicateNode` next to the `Opaque4` case. >> - Is this `Opaque4` node only used for Template Assertion Predicates? >> - Yes: Added an assert with a call to `assertion_predicate_has_loop_opaque_node()` to check that we find its `OpaqueLoop*Nodes`. >> - I've added test cases where I was not sure whether an `Opaque4` node could be part of a Template Assertion Predicate, an Initialized Assertion Predicate, or a non-null check. This was a little tricky, but I think it was still worth it to prevent future bugs (even though most of these special cases are quite rare). >> >> This is another patch split off from the full fix for Assertion Predicates. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Make OpaqueInitializedAssertionPredicateNode a macro node again > - asdf > - Merge branch 'master' into JDK-8330386 > - Merge branch 'master' into JDK-8330386 > - Add more comments and asserts > - Add more tests > - 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode Last version looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18951#pullrequestreview-2081149156 From kvn at openjdk.org Mon May 27 15:52:03 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 27 May 2024 15:52:03 GMT Subject: RFR: 8325083: jdk/incubator/vector/Double512VectorTests.java crashes in Assembler::vex_prefix_and_encode In-Reply-To: References: Message-ID: On Mon, 27 May 2024 06:06:47 GMT, Jatin Bhateja wrote: > This bugfix patch limits the register class for the operands of the byte-to-double cast pattern to prevent the reported assertion failure on Knights family CPUs.
> > Kindly review and share your feedback. > > Best Regards, > Jatin @jatin-bhateja can you explain in more detail what KNL is missing that triggers the assert? Can we predicate on the missing feature here instead of KNL && DBL checks? My concern is that the KNL check might not be enough if such a feature is disabled in some container environment which does not match KNL settings. Why do we have `assert(UseAVX > 0` here? `Assembler::vpmov*()` instructions have corresponding asserts already. No need to fix it here, but I think we should remove such duplicated asserts from the `*.ad` files in a separate RFE. ------------- PR Review: https://git.openjdk.org/jdk/pull/19407#pullrequestreview-2081199278 From prappo at openjdk.org Mon May 27 16:33:09 2024 From: prappo at openjdk.org (Pavel Rappo) Date: Mon, 27 May 2024 16:33:09 GMT Subject: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier Message-ID: Please review this PR, which supersedes the now-withdrawn https://github.com/openjdk/jdk/pull/14831. This PR replaces `ArraysSupport.vectorizedHashCode` with a set of more user-friendly methods. Here's a summary: - Made the operand constants (i.e. `T_BOOLEAN` and friends) and the `vectorizedHashCode` method private - Made the `vectorizedHashCode` method private, but didn't rename it. Renaming would dramatically increase the review cost of this PR, because that method's name is used by a lot of VM code. On the bright side, since the method is now private, it's no longer callable by clients of `ArraysSupport`, so the problem of an inaccurate name is less severe. - Made the `ArraysSupport.utf16HashCode` method private - Moved tiny cases (i.e.
0, 1, 2) to `ArraysSupport` ------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/19414/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19414&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332826 Stats: 258 lines in 13 files changed: 186 ins; 32 del; 40 mod Patch: https://git.openjdk.org/jdk/pull/19414.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19414/head:pull/19414 PR: https://git.openjdk.org/jdk/pull/19414 From prappo at openjdk.org Mon May 27 20:55:29 2024 From: prappo at openjdk.org (Pavel Rappo) Date: Mon, 27 May 2024 20:55:29 GMT Subject: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2] In-Reply-To: References: Message-ID: <1tx_tl3PV2W5NCEXXawQY5V2ndnSOHPfjisypuhKdhA=.79840096-bac0-4da4-8102-c7ecea7cb5f0@github.com> > Please review this PR, which supersedes the now-withdrawn https://github.com/openjdk/jdk/pull/14831. > > This PR replaces `ArraysSupport.vectorizedHashCode` with a set of more user-friendly methods. Here's a summary: > > - Made the operand constants (i.e. `T_BOOLEAN` and friends) and the `vectorizedHashCode` method private > > - Made the `vectorizedHashCode` method private, but didn't rename it. Renaming would dramatically increase the review cost of this PR, because that method's name is used by a lot of VM code. On the bright side, since the method is now private, it's no longer callable by clients of `ArraysSupport`, so the problem of an inaccurate name is less severe. > > - Made the `ArraysSupport.utf16HashCode` method private > > - Moved tiny cases (i.e.
0, 1, 2) to `ArraysSupport` Pavel Rappo has updated the pull request incrementally with one additional commit since the last revision: Fix incorrect utf16 hashCode adaptation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19414/files - new: https://git.openjdk.org/jdk/pull/19414/files/4ed451d6..adc7557d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19414&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19414&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19414.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19414/head:pull/19414 PR: https://git.openjdk.org/jdk/pull/19414 From gcao at openjdk.org Tue May 28 02:10:20 2024 From: gcao at openjdk.org (Gui Cao) Date: Tue, 28 May 2024 02:10:20 GMT Subject: RFR: 8333006: RISC-V: C2: Support vector-scalar and vector-immediate arithmetic instructions Message-ID: Hi, we want to support vector-scalar and vector-immediate arithmetic instructions. The implementation follows RVV v1.0 [1]. Please take a look and review. Thanks a lot. We can use Byte256VectorTests.java [2] to print the Opto JIT code and verify the generated nodes. For example, the following command prints the Opto JIT code of a jtreg test case:
/home/zifeihan/jtreg/bin/jtreg \
  -v:default \
  -concurrency:16 -timeout:50 \
  -javaoption:-XX:+UnlockExperimentalVMOptions \
  -javaoption:-XX:+UseRVV \
  -javaoption:-XX:+PrintOptoAssembly \
  -javaoption:-XX:LogFile=/home/zifeihan/jdk/Byte256VectorTests_PrintOptoAssembly.log \
  -jdk:/home/zifeihan/jdk/build/linux-riscv64-server-fastdebug/jdk \
  /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/Byte256VectorTests.java
In the resulting compilation log, `Byte256VectorTests_PrintOptoAssembly.log`, we can observe the vector-scalar and vector-immediate arithmetic instructions added by this PR.
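Judging from the node names and operands in the logs, `vadd_immI` corresponds to RVV's vector-immediate form (`vadd.vi`) and the `*_regI`/`*_regL` nodes to the vector-scalar form (`.vx`). The scalar C++ model below is a sketch of those semantics under stated assumptions: the function names are hypothetical, and the masked variant assumes mask-undisturbed behavior, where inactive lanes keep the old `dst_src` value (as the Vector API's masked lanewise operations require).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// .vi instructions encode the constant as a 5-bit signed immediate (simm5),
// so a replicated constant can only use vadd.vi if it is in [-16, 15].
bool fits_simm5(int64_t con) { return con >= -16 && con <= 15; }

// Java int addition wraps; add in uint32_t to avoid C++ signed-overflow UB.
int32_t add_wrap(int32_t a, int32_t b) {
  return static_cast<int32_t>(static_cast<uint32_t>(a) + static_cast<uint32_t>(b));
}

// Unmasked vadd.vx / vadd.vi: every lane gets vs2[i] + scalar.
void vadd_scalar(std::vector<int32_t>& vd, const std::vector<int32_t>& vs2, int32_t scalar) {
  for (size_t i = 0; i < vd.size(); i++) vd[i] = add_wrap(vs2[i], scalar);
}

// Masked form (v0.t), modeled as mask-undisturbed: only active lanes change.
void vadd_scalar_masked(std::vector<int32_t>& dst_src, int32_t scalar,
                        const std::vector<bool>& v0) {
  for (size_t i = 0; i < dst_src.size(); i++) {
    if (v0[i]) dst_src[i] = add_wrap(dst_src[i], scalar);
  }
}
```

The RVV 1.0 specification defines `vrsub.vi` but no `vsub.vi` or `vmul.vi`, which is consistent with the logs showing an immediate form only for `vadd`.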
vadd_immI Node 16c addw R11, R10, zr #@convI2L_reg_reg 170 add R9, R31, R11 # ptr, #@addP_reg_reg 174 addi R9, R9, #16 # ptr, #@addP_reg_imm 176 loadV V1, [R9] # vector (rvv) 17e vadd_immI V1, V1, #7 186 add R11, R15, R11 # ptr, #@addP_reg_reg 188 addi R11, R11, #16 # ptr, #@addP_reg_imm 18a storeV [R11], V1 # vector (rvv) vadd_immI_masked Node 1e8 B31: # out( B37 B32 ) <- in( B30 ) Freq: 76.2281 1e8 loadV V2, [R31] # vector (rvv) 1f0 vloadmask V0, V1 1f8 vadd_immI_masked V2, V2, #7 200 addi R31, R10, #48 # ptr, #@addP_reg_imm 204 bgeu R30, R7, B37 #@cmpU_branch P=0.000001 C=-1.000000 vadd_regI Node 0c4 B4: # out( B9 B5 ) <- in( B8 B3 ) Freq: 1 0c4 vloadcon V1 # generate iota indices 0cc spill [sp, #4] -> R30 # spill size = 32 0ce vmul_regI V1, V1, R30 0d6 spill [sp, #0] -> R29 # spill size = 32 0d8 vadd_regI V1, V1, R29 vadd_regI_masked Node 244 B36: # out( B33 B37 ) <- in( B35 ) Freq: 7427.81 244 # castII of R30, #@castII 244 addw R31, R30, zr #@convI2L_reg_reg 248 spill [sp, #32] -> R10 # spill size = 64 24a add R10, R10, R31 # ptr, #@addP_reg_reg 24c addi R10, R10, #16 # ptr, #@addP_reg_imm 24e loadV V2, [R10] # vector (rvv) 256 vloadmask V0, V1 25e vadd_regI_masked V2, V2, R29 vsub_regI Node 112 B20: # out( B63 B21 ) <- in( B19 ) Freq: 77.0107 112 # castII of R20, #@castII 112 addw R11, R20, zr #@convI2L_reg_reg 116 add R12, R10, R11 # ptr, #@addP_reg_reg 11a addi R12, R12, #16 # ptr, #@addP_reg_imm 11c loadV V1, [R12] # vector (rvv) 124 vsub_regI V1, V1, R31 12c bgeu R20, R29, B63 #@cmpU_branch P=0.000001 C=-1.000000 vsub_regI_masked Node 1e8 B31: # out( B37 B32 ) <- in( B30 ) Freq: 76.2281 1e8 loadV V2, [R31] # vector (rvv) 1f0 vloadmask V0, V1 1f8 vsub_regI_masked V2, V2, R29 200 addi R31, R10, #48 # ptr, #@addP_reg_imm 204 bgeu R30, R7, B37 #@cmpU_branch P=0.000001 C=-1.000000 vmul_regI Node 0ca B4: # out( B9 B5 ) <- in( B8 B3 ) Freq: 1 0ca vloadcon V1 # generate iota indices 0d2 spill [sp, #0] -> R29 # spill size = 64 0d4 lwu R7, [R29, #12] # loadN, 
compressed ptr, #@loadN ! Field: jdk/internal/vm/vector/VectorSupport$VectorPayload.payload (constant) 0d8 decode_heap_oop R7, R7 #@decodeHeapOop 0da addi R7, R7, #16 # ptr, #@addP_reg_imm 0dc vmul_regI V1, V1, R30 0e4 loadV V2, [R7] # vector (rvv) vmul_regI_masked Node 198 addw R30, R19, zr #@convI2L_reg_reg 19c spill [sp, #32] -> R31 # spill size = 64 19e add R31, R31, R30 # ptr, #@addP_reg_reg 1a0 addi R10, R31, #16 # ptr, #@addP_reg_imm 1a4 loadV V2, [R10] # vector (rvv) 1ac vloadmask V0, V1 1b4 vmul_regI_masked V2, V2, R29 We can test test/jdk/jdk/incubator/vector/Long256VectorTests.java in the same way, and looking at the Opto logs, we will see nodes similar to vadd_immL, vadd_immL_masked, vadd_regL, vadd_regL_masked, vsub_regL, vsub_regL_masked, vmul_regL, vmul_regL_masked. vadd_immL Node 112 addw R11, R9, zr #@convI2L_reg_reg 116 slli R11, R11, (#3 & 0x3f) #@lShiftL_reg_imm 118 add R14, R29, R11 # ptr, #@addP_reg_reg 11c addi R14, R14, #16 # ptr, #@addP_reg_imm 11e loadV V1, [R14] # vector (rvv) 126 vadd_immL V1, V1, #7 vadd_immL_masked Node 194 addw R30, R19, zr #@convI2L_reg_reg 198 slli R30, R30, (#3 & 0x3f) #@lShiftL_reg_imm 19a spill [sp, #32] -> R31 # spill size = 64 19c add R31, R31, R30 # ptr, #@addP_reg_reg 19e addi R10, R31, #16 # ptr, #@addP_reg_imm 1a2 loadV V1, [R10] # vector (rvv) 1aa vadd_immL_masked V1, V1, #7 vadd_regL Node 104 B17: # out( B20 ) <- in( B16 ) Freq: 0.99999 104 replicateL_imm5 V4, #1 10c vadd_regL V4, V4, R17 114 -- // R23=Thread::current(), empty, #@tlsLoadP 114 mv R31, #0 # int, #@loadConI 116 j B20 #@branch vadd_regL_masked Node 198 addw R30, R19, zr #@convI2L_reg_reg 19c slli R30, R30, (#3 & 0x3f) #@lShiftL_reg_imm 19e spill [sp, #32] -> R31 # spill size = 64 1a0 add R31, R31, R30 # ptr, #@addP_reg_reg 1a2 addi R10, R31, #16 # ptr, #@addP_reg_imm 1a6 loadV V1, [R10] # vector (rvv) 1ae vadd_regL_masked V1, V1, R11 vsub_regL Node 116 addw R11, R19, zr #@convI2L_reg_reg 11a slli R11, R11, (#3 & 0x3f) #@lShiftL_reg_imm 11c 
add R12, R31, R11 # ptr, #@addP_reg_reg 120 addi R12, R12, #16 # ptr, #@addP_reg_imm 122 loadV V1, [R12] # vector (rvv) 12a vsub_regL V1, V1, R14 vsub_regL_masked Node 198 addw R30, R19, zr #@convI2L_reg_reg 19c slli R30, R30, (#3 & 0x3f) #@lShiftL_reg_imm 19e spill [sp, #32] -> R31 # spill size = 64 1a0 add R31, R31, R30 # ptr, #@addP_reg_reg 1a2 addi R10, R31, #16 # ptr, #@addP_reg_imm 1a6 loadV V1, [R10] # vector (rvv) 1ae vsub_regL_masked V1, V1, R11 vmul_regL Node 0c2 vloadcon V1 # generate iota indices 0ca spill [sp, #0] -> R29 # spill size = 64 0cc lwu R7, [R29, #12] # loadN, compressed ptr, #@loadN ! Field: jdk/internal/vm/vector/VectorSupport$VectorPayload.payload (constant) 0d0 decode_heap_oop R7, R7 #@decodeHeapOop 0d2 addi R7, R7, #16 # ptr, #@addP_reg_imm 0d4 addw R28, R30, zr #@convI2L_reg_reg 0d8 loadV V2, [R7] # vector (rvv) 0e0 vmul_regL V1, V1, R28 vmul_regL_masked Node 19c slli R30, R30, (#3 & 0x3f) #@lShiftL_reg_imm 19e spill [sp, #32] -> R31 # spill size = 64 1a0 add R31, R31, R30 # ptr, #@addP_reg_reg 1a2 addi R10, R31, #16 # ptr, #@addP_reg_imm 1a6 loadV V1, [R10] # vector (rvv) 1ae vmul_regL_masked V1, V1, R11 1b6 spill [sp, #48] -> R10 # spill size = 64 [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc [2] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Byte256VectorTests.java ### Testing - [x] test/jdk/jdk/incubator/vector (fastdebug) qemu 8.1.50 with UseRVV - [ ] Run tier1-3 tests on SOPHON SG2042 (release) - [ ] Run tier1-3 tests (release) on qemu 8.1.50 with UseRVV ------------- Commit messages: - 8333006: RISC-V: C2: Support vector-scalar and vector-immediate arithmetic instructions Changes: https://git.openjdk.org/jdk/pull/19415/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19415&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333006 Stats: 253 lines in 2 files changed: 252 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19415.diff Fetch: git fetch 
https://git.openjdk.org/jdk.git pull/19415/head:pull/19415 PR: https://git.openjdk.org/jdk/pull/19415 From syan at openjdk.org Tue May 28 02:59:15 2024 From: syan at openjdk.org (SendaoYan) Date: Tue, 28 May 2024 02:59:15 GMT Subject: RFR: 8332499: Gtest codestrings.validate_vm fail on linux x64 [v4] In-Reply-To: References: Message-ID: > Hi all, > There is some arch-specific code to trim trailing entries, as described in [JDK-8332499](https://bugs.openjdk.org/browse/JDK-8332499). Only the gtest test case is changed, so the risk is low. > > Additional test: > - [x] codestrings.validate_vm on linux x64 > - [x] codestrings.validate_vm on linux aarch64 > - [x] codestrings.validate_vm on linux riscv64 SendaoYan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'openjdk:master' into jbs8332499 - 8332499: Gtest codestrings.validate_vm fail on linux x64 Signed-off-by: sendaoYan - 8332499: Gtest codestrings.validate_vm fail on linux x64 Signed-off-by: sendaoYan ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19309/files - new: https://git.openjdk.org/jdk/pull/19309/files/01b9e688..1c017d20 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19309&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19309&range=02-03 Stats: 20599 lines in 626 files changed: 12728 ins; 4566 del; 3305 mod Patch: https://git.openjdk.org/jdk/pull/19309.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19309/head:pull/19309 PR: https://git.openjdk.org/jdk/pull/19309 From syan at openjdk.org Tue May 28 02:59:15 2024 From: syan at openjdk.org (SendaoYan) Date: Tue, 28 May 2024 02:59:15 GMT Subject: RFR: 8332499: Gtest codestrings.validate_vm fail on linux x64 [v3] In-Reply-To: References: Message-ID: On Mon, 20 May 2024 12:51:26 GMT, SendaoYan wrote: >> Hi all, >>
There is some arch-specific code to trim trailing entries, as described in [JDK-8332499](https://bugs.openjdk.org/browse/JDK-8332499). Only the gtest test case is changed, so the risk is low. >> >> Additional test: >> - [x] codestrings.validate_vm on linux x64 >> - [x] codestrings.validate_vm on linux aarch64 >> - [x] codestrings.validate_vm on linux riscv64 > > SendaoYan has updated the pull request incrementally with one additional commit since the last revision: > > 8332499: Gtest codestrings.validate_vm fail on linux x64 > > Signed-off-by: sendaoYan Hi, can anyone take a look at this change? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19309#issuecomment-2134265572 From jbhateja at openjdk.org Tue May 28 05:29:13 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 28 May 2024 05:29:13 GMT Subject: RFR: 8325083: jdk/incubator/vector/Double512VectorTests.java crashes in Assembler::vex_prefix_and_encode [v2] In-Reply-To: References: Message-ID: > This bugfix patch limits the register class for the operands of the byte-to-double cast pattern to prevent the reported assertion failure on Knights family CPUs. > > Kindly review and share your feedback.
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Removing redundant assertions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19407/files - new: https://git.openjdk.org/jdk/pull/19407/files/f97928ba..78699f0c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19407&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19407&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19407.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19407/head:pull/19407 PR: https://git.openjdk.org/jdk/pull/19407 From jbhateja at openjdk.org Tue May 28 05:29:13 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 28 May 2024 05:29:13 GMT Subject: RFR: 8325083: jdk/incubator/vector/Double512VectorTests.java crashes in Assembler::vex_prefix_and_encode [v2] In-Reply-To: References: Message-ID: On Mon, 27 May 2024 15:49:13 GMT, Vladimir Kozlov wrote: > @jatin-bhateja can you explain in more detail what KNL is missing that triggers the assert? Can we predicate on the missing feature here instead of KNL && DBL checks? My concern is that the KNL check might not be enough if such a feature is disabled in some container environment which does not match KNL settings. > > Why do we have `assert(UseAVX > 0` here? `Assembler::vpmov*()` instructions have corresponding asserts already. No need to fix it here, but I think we should remove such duplicated asserts from the `*.ad` files in a separate RFE.
Hi @vnkozlov, the problem occurs while emitting the VMOVSXBD instruction; please refer to the following line: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L2936 This is the byte-to-double conversion case, which we break into two instructions: a byte-to-doubleword cast followed by a doubleword-to-double-precision cast. We could have used the combination VMOVSXBQ + VCVTQQ2PD instead, but that would further tighten the target constraints, since quadword-to-double-precision conversion needs the AVX512DQ feature. The current scheme therefore works well, and by limiting operand allocation to the legacy register set we can safely issue the 256-bit VMOVSXBD instruction on KNL targets, which lack the AVX512VL feature. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19407#issuecomment-2134368084 From thartmann at openjdk.org Tue May 28 06:36:04 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 28 May 2024 06:36:04 GMT Subject: RFR: 8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v4] In-Reply-To: References: Message-ID: <4s5mzPw27atjQ-B_xMW4DYQRZ6Uzd6WV1Xted-s5TGY=.efa88d12-7c7f-4e32-921f-e71e702bfc5a@github.com> On Thu, 23 May 2024 08:57:14 GMT, Damon Fenacci wrote: >> # Issue >> >> The test `compiler/startup/StartupOutput.java` fails intermittently due to a crash after correctly printing the error `Initial size of CodeCache is too small` (the test limits the code cache using `-XX:InitialCodeCacheSize=1024K -XX:ReservedCodeCacheSize=1200k`). >> The appearance of the issue is very dependent on thread scheduling. The original failure occurred during C1 initialization, but C2 initialization is affected as well. >> >> # Causes >> >> There is one occurrence during C1 initialization and one during C2 initialization where a call to `RuntimeStub::new_runtime_stub` can fail fatally if there is not enough space left.
>> For C1: `Compiler::init_c1_runtime` -> `Runtime1::initialize` -> `Runtime1::generate_blob_for` -> `Runtime1::generate_blob` -> `RuntimeStub::new_runtime_stub`. >> For C2: `C2Compiler::initialize` -> `OptoRuntime::generate` -> `OptoRuntime::generate_stub` -> `Compile::Compile` -> `Compile::Code_Gen` -> `PhaseOutput::install` -> `PhaseOutput::install_stub` -> `RuntimeStub::new_runtime_stub`. >> >> # Solution >> >> https://github.com/openjdk/jdk/pull/15970 introduced an optional argument to `RuntimeStub::new_runtime_stub` that determines whether it fails fatally or not. We can take advantage of it to avoid crashing and instead pass information about the success or failure of the allocation up the (C1 and C2 initialization) call stack to the point where we can mark the compilations as failed. > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8326615: update copyright year Looks good to me otherwise. src/hotspot/share/gc/x/c1/xBarrierSetC1.cpp line 229: > 227: XLoadBarrierRuntimeStubCodeGenClosure cl(decorators); > 228: CodeBlob* const code_blob = Runtime1::generate_blob(blob, -1 /* stub_id */, name, false /* expect_oop_map*/, &cl); > 229: return code_blob != nullptr?code_blob->code_begin():nullptr; Suggestion: return (code_blob != nullptr) ?
code_blob->code_begin() : nullptr; src/hotspot/share/gc/z/c1/zBarrierSetC1.cpp line 531: > 529: ZStoreBarrierRuntimeStubCodeGenClosure cl(self_healing); > 530: CodeBlob* const code_blob = Runtime1::generate_blob(blob, -1 /* stub_id */, name, false /* expect_oop_map*/, &cl); > 531: return code_blob != nullptr?code_blob->code_begin():nullptr; Suggestion: return (code_blob != nullptr) ? code_blob->code_begin() : nullptr; ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19280#pullrequestreview-2081868058 PR Review Comment: https://git.openjdk.org/jdk/pull/19280#discussion_r1616666641 PR Review Comment: https://git.openjdk.org/jdk/pull/19280#discussion_r1616667139 PR Review Comment: https://git.openjdk.org/jdk/pull/19280#discussion_r1616667424 From fyang at openjdk.org Tue May 28 07:12:02 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 28 May 2024 07:12:02 GMT Subject: RFR: 8333006: RISC-V: C2: Support vector-scalar and vector-immediate arithmetic instructions In-Reply-To: References: Message-ID: <6EI4wpI9U7LpAW4QiOQOfpiuzHZjwiq77i1NUU-L1-g=.1241f023-2857-4259-9a8b-f8e7317f1e1c@github.com> On Mon, 27 May 2024 16:33:30 GMT, Gui Cao wrote: > Hi, we want to support vector-scalar and vector-immediate arithmetic instructions. The implementation follows RVV v1.0 [1]. Please take a look and review. Thanks a lot. > We can use Byte256VectorTests.java [2] to print the Opto JIT code and verify the generated nodes.
> > For example, we can use the following command to print the Opto JIT Code of a jtreg test case: > > > /home/zifeihan/jtreg/bin/jtreg \ > -v:default \ > -concurrency:16 -timeout:50 \ > -javaoption:-XX:+UnlockExperimentalVMOptions \ > -javaoption:-XX:+UseRVV \ > -javaoption:-XX:+PrintOptoAssembly \ > -javaoption:-XX:LogFile=/home/zifeihan/jdk/Byte256VectorTests_PrintOptoAssembly.log \ > -jdk:/home/zifeihan/jdk/build/linux-riscv64-server-fastdebug/jdk \ > /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/Byte256VectorTests.java > > > > we can observe the specified compilation log `Byte256VectorTests_PrintOptoAssembly.log`, which contains the vector-scalar and vector-immediate arithmetic instructions for the PR implementation. > > vadd_immI Node > > 16c addw R11, R10, zr #@convI2L_reg_reg > 170 add R9, R31, R11 # ptr, #@addP_reg_reg > 174 addi R9, R9, #16 # ptr, #@addP_reg_imm > 176 loadV V1, [R9] # vector (rvv) > 17e vadd_immI V1, V1, #7 > 186 add R11, R15, R11 # ptr, #@addP_reg_reg > 188 addi R11, R11, #16 # ptr, #@addP_reg_imm > 18a storeV [R11], V1 # vector (rvv) > > > vadd_immI_masked Node > > 1e8 B31: # out( B37 B32 ) <- in( B30 ) Freq: 76.2281 > 1e8 loadV V2, [R31] # vector (rvv) > 1f0 vloadmask V0, V1 > 1f8 vadd_immI_masked V2, V2, #7 > 200 addi R31, R10, #48 # ptr, #@addP_reg_imm > 204 bgeu R30, R7, B37 #@cmpU_branch P=0.000001 C=-1.000000 > > > vadd_regI Node > > 0c4 B4: # out( B9 B5 ) <- in( B8 B3 ) Freq: 1 > 0c4 vloadcon V1 # generate iota indices > 0cc spill [sp, #4] -> R30 # spill size = 32 > 0ce vmul_regI V1, V1, R30 > 0d6 spill [sp, #0] -> R29 # spill size = 32 > 0d8 vadd_regI V1, V1, R29 > > > vadd_regI_masked Node > > 244 B36: # out( B33 B37 ) <- in( B35 ) Freq: 7427.81 > 244 # castII of R30, #@castII > 244 addw R31, R30, zr #@convI2L_reg_reg > 248 spill [sp, #32] -> R10 # spill size = 64 > 24a add R10, R10, R31 # ptr, #@addP_reg_reg > 24c addi R10, R10, #16 # ptr, #@addP_reg_imm > 24e loadV V2, [R10] # vector (rvv) > 256 vloadmask V0, V1 > 25e 
vadd_regI_masked V2, V2, R29 > > > vsub_regI Node > > 112 B20: # out( B63 B21 ) <- in( B19 ) Freq: 77.0107 > 112 # castII of R20, #@castII > 112 addw R11, R2... src/hotspot/cpu/riscv/riscv_v.ad line 395: > 393: match(Set dst_src (AddVB dst_src (Replicate con))); > 394: match(Set dst_src (AddVS dst_src (Replicate con))); > 395: match(Set dst_src (AddVI dst_src (Replicate con))); Is it necessary to require that `src` and `dst` be the same register for the unpredicated versions? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19415#discussion_r1616707609 From dfenacci at openjdk.org Tue May 28 07:16:15 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 28 May 2024 07:16:15 GMT Subject: RFR: 8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v5] In-Reply-To: References: Message-ID: > # Issue > > The test `compiler/startup/StartupOutput.java` fails intermittently due to a crash after correctly printing the error `Initial size of CodeCache is too small` (the test limits the code cache using `-XX:InitialCodeCacheSize=1024K -XX:ReservedCodeCacheSize=1200k`). > The appearance of the issue is very dependent on thread scheduling. The original failure occurred during C1 initialization, but C2 initialization is affected as well. > > # Causes > > There is one occurrence during C1 initialization and one during C2 initialization where a call to `RuntimeStub::new_runtime_stub` can fail fatally if there is not enough space left. > For C1: `Compiler::init_c1_runtime` -> `Runtime1::initialize` -> `Runtime1::generate_blob_for` -> `Runtime1::generate_blob` -> `RuntimeStub::new_runtime_stub`. > For C2: `C2Compiler::initialize` -> `OptoRuntime::generate` -> `OptoRuntime::generate_stub` -> `Compile::Compile` -> `Compile::Code_Gen` -> `PhaseOutput::install` -> `PhaseOutput::install_stub` -> `RuntimeStub::new_runtime_stub`.
> > # Solution > > https://github.com/openjdk/jdk/pull/15970 introduced an optional argument to `RuntimeStub::new_runtime_stub` that determines whether it fails fatally or not. We can take advantage of it to avoid crashing and instead pass information about the success or failure of the allocation up the (C1 and C2 initialization) call stack to the point where we can mark the compilations as failed. Damon Fenacci has updated the pull request incrementally with three additional commits since the last revision: - Update src/hotspot/share/gc/z/c1/zBarrierSetC1.cpp Co-authored-by: Tobias Hartmann - Update src/hotspot/share/gc/z/c1/zBarrierSetC1.cpp Co-authored-by: Tobias Hartmann - Update src/hotspot/share/gc/x/c1/xBarrierSetC1.cpp Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19280/files - new: https://git.openjdk.org/jdk/pull/19280/files/f16d9910..ea23c61e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19280&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19280&range=03-04 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19280.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19280/head:pull/19280 PR: https://git.openjdk.org/jdk/pull/19280 From dfenacci at openjdk.org Tue May 28 07:16:15 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 28 May 2024 07:16:15 GMT Subject: RFR: 8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v4] In-Reply-To: <4s5mzPw27atjQ-B_xMW4DYQRZ6Uzd6WV1Xted-s5TGY=.efa88d12-7c7f-4e32-921f-e71e702bfc5a@github.com> References: <4s5mzPw27atjQ-B_xMW4DYQRZ6Uzd6WV1Xted-s5TGY=.efa88d12-7c7f-4e32-921f-e71e702bfc5a@github.com> Message-ID: On Tue, 28 May 2024 06:33:28 GMT, Tobias Hartmann wrote: > Looks good to me otherwise. Thanks for the review @TobiHartmann!
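The failure-propagation pattern described in the Solution section of 8326615 can be sketched as follows. Everything here is a hypothetical, heavily simplified stand-in (`ToyCodeCache`, `new_stub`, `generate_stub`, `initialize_runtime`), not HotSpot's API: `fatal_on_failure` plays the role of the optional argument added by PR 15970, and `generate_stub` returns the same `(code_blob != nullptr) ? code_blob->code_begin() : nullptr` shape that Tobias suggested.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical, simplified stand-in for a code blob.
struct CodeBlob {
  unsigned char* start;
  unsigned char* code_begin() const { return start; }
};

// Hypothetical fixed-size code cache.
struct ToyCodeCache {
  unsigned char storage[1024];
  size_t used = 0;
  CodeBlob blobs[16];
  size_t nblobs = 0;

  // With fatal_on_failure == false, running out of space returns nullptr
  // instead of aborting the whole process.
  CodeBlob* new_stub(size_t size, bool fatal_on_failure) {
    if (used + size > sizeof(storage) || nblobs == 16) {
      if (fatal_on_failure) {
        std::fprintf(stderr, "out of code cache space\n");
        std::abort();
      }
      return nullptr;
    }
    CodeBlob* blob = &blobs[nblobs++];
    blob->start = storage + used;
    used += size;
    return blob;
  }
};

// Each layer forwards the failure instead of crashing.
unsigned char* generate_stub(ToyCodeCache& cache, size_t size) {
  CodeBlob* const code_blob = cache.new_stub(size, /*fatal_on_failure=*/false);
  return (code_blob != nullptr) ? code_blob->code_begin() : nullptr;
}

// Top of the initialization chain: report failure so the caller can mark
// compilation as failed rather than bringing down the VM.
bool initialize_runtime(ToyCodeCache& cache, const size_t* sizes, size_t count) {
  for (size_t i = 0; i < count; i++) {
    if (generate_stub(cache, sizes[i]) == nullptr) return false;
  }
  return true;
}
```

The key design point is that only the innermost allocation knows it failed, so every intermediate layer must turn `nullptr` into an error result rather than dereferencing it.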
------------- PR Comment: https://git.openjdk.org/jdk/pull/19280#issuecomment-2134501833 From epeter at openjdk.org Tue May 28 07:33:07 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 May 2024 07:33:07 GMT Subject: RFR: 8328181: C2: assert(MaxVectorSize >= 32) failed: vector length should be >= 32 [v2] In-Reply-To: References: Message-ID: <4s13KsZ8dnv_t_5AUyOFmjWsUwRJDtl0OjBGOMlmlRs=.bcf5f967-d70f-4879-bb16-2d1045a63fb4@github.com> On Mon, 8 Apr 2024 02:35:33 GMT, Jatin Bhateja wrote: >> This bug fix patch tightens the predication check for the small constant-length clear array pattern and relaxes the associated feature checks. Modified a few comments for clarity. >> >> Kindly review and approve. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Cleanup predicates. src/hotspot/cpu/x86/x86.ad line 1753: > 1751: } > 1752: break; > 1753: case Op_ClearArray: This seems problematic, and may lead to the regression in https://bugs.openjdk.org/browse/JDK-8332487 On non-AVX512 platforms, this is now always `true` instead of always `false`. Probably this was not intended, and you thought this was going to default to `false`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18464#discussion_r1616735713 From mli at openjdk.org Tue May 28 07:54:06 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 28 May 2024 07:54:06 GMT Subject: RFR: 8332883: Some simple cleanup in vectornode.cpp In-Reply-To: References: <_OntRXQMobbozvu5_QPLpEny6Wsfv5pFQGYhWw8aSCE=.7389a53d-b139-4825-8fc6-e22e7220fe9e@github.com> Message-ID: On Fri, 24 May 2024 19:27:32 GMT, Vladimir Kozlov wrote: >> Hi, >> Can you review this simple cleanup in vectornode.cpp? >> Thanks! > > Good. Thanks @vnkozlov @JohnTortugo for your reviews.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19392#issuecomment-2134564128 From mli at openjdk.org Tue May 28 07:54:07 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 28 May 2024 07:54:07 GMT Subject: Integrated: 8332883: Some simple cleanup in vectornode.cpp In-Reply-To: <_OntRXQMobbozvu5_QPLpEny6Wsfv5pFQGYhWw8aSCE=.7389a53d-b139-4825-8fc6-e22e7220fe9e@github.com> References: <_OntRXQMobbozvu5_QPLpEny6Wsfv5pFQGYhWw8aSCE=.7389a53d-b139-4825-8fc6-e22e7220fe9e@github.com> Message-ID: On Fri, 24 May 2024 11:58:18 GMT, Hamlin Li wrote: > Hi, > Can you review this simple cleanup in vectornode.cpp? > Thanks! This pull request has now been integrated. Changeset: 2f2cf38b Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/2f2cf38bb5cecea698e519396574343cfbe4f359 Stats: 17 lines in 1 file changed: 0 ins; 13 del; 4 mod 8332883: Some simple cleanup in vectornode.cpp Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/19392 From chagedorn at openjdk.org Tue May 28 08:03:33 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 28 May 2024 08:03:33 GMT Subject: RFR: 8325155: C2 SuperWord: remove alignment boundaries [v2] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 13:34:04 GMT, Emanuel Peter wrote: >> I have tried for a very long time to get rid of all the `alignment(n)` code that is all over the SuperWord code. With lots of previous work, I am now finally ready to remove it. >> >> I was able to remove lots of VM code, about 300 lines. And the removed code is I think much more complicated than the new code. >> >> This is what I did in this PR: >> - Removal of `_node_info`: used to have many fields, which I refactored out to the `VLoopAnalyzer` modules. `alignment` is the last component, which I now remove. 
>> - Changed the implementation of `SuperWord::find_adjacent_refs`, now `SuperWord::find_adjacent_memop_pairs`, completely: >> - It used to be an algorithm that would scan over all `memops` repeatedly, try to find some `mem_ref` and see which other memops were comparable, and then pack pairs for all of those, by comparing all-vs-all memops. This algorithm is at least quadratic, if not much worse. >> - I now add all `memops` into a single array, sort them by groups (those that are comparable with each other and could be packed into vectors), and inside the groups by ascending offset. This allows me to split off the groups much more efficiently, and the sorting by offset also allows me to find adjacent pairs much more efficiently. In most cases this reduces the cost to `O(n log n)` for the sort, plus a linear scan for finding adjacent memops. >> - I removed the "alignment boundaries" created in `SuperWord::memory_alignment` by `int off_rem = offset % vw;`. >> - This used to have the effect that all offsets were computed modulo the vector width. Hence, pairs could not be packed across this boundary (e.g. we have nodes with offsets `31, 32`, which are adjacent in theory, but if we have a `vw = 32`, then the modulo-offsets are `31, 0`, and they are not detected as adjacent). >> - These "alignment boundaries" used to be required for correctness about a year ago, before I fixed and relaxed much of the alignment code. >> - The `alignment` used to have another important task: Ensuring compatibility of the input-size of a use node, with the output-size of the def-node. >> - This was done by giving all nodes an `alignment`, even the non-memop nodes. This `alignment` was then scaled up and down at type casts (e.g. int `0, 4, 8, 12` -> long `0, 8, 16, 24`). If the output-size of the def-node did not match the input-size of the use-node, then the `alignment` would not match up, and we would not pack. >> - This is why we used to have checks like `alignment(s1) + da...
> > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: > > - Merge branch 'master' into JDK-8325155-rm-alignment-boundaries > - rm TODO > - manual merge > - revert a line, need to fix it different > - improve comments > - fix alignment > - fix reductions > - MaxI reduction over chars > - Merge branch 'master' into JDK-8325155-rm-alignment-boundaries > - Merge branch 'master' into JDK-8325155-rm-alignment-boundaries > - ... and 15 more: https://git.openjdk.org/jdk/compare/c4867c62...82c9a77a Great cleanup! I have some comments but otherwise, looks good. src/hotspot/share/opto/superword.cpp line 495: > 493: > 494: // Collect all valid VPointers. > 495: for_each_mem([&] (const MemNode* mem, int bb_idx) { The different parts of this method could be nicely put into separate methods, which would reduce the size of `find_adjacent_memop_pairs()`. GrowableArray vpointers; collect_valid_vpointers(vpointers); vpointers.sort(); // trace code find_adjacent_memops(vpointers); // trace code The entire "find adjacent memop pairs" code could also be put into a separate class but I leave it up to you to decide if it's worth it or not. src/hotspot/share/opto/superword.cpp line 505: > 503: > 504: // Sort the VPointers. This does 2 things: > 505: // - Separate the VPointer into groups (e.g. all LoadI of the same base and invar). We only need to find adjacent memops inside Maybe you should state here what a "group" is. It is only explained at `cmp_for_sort_by_group()`. src/hotspot/share/opto/superword.cpp line 507: > 505: // - Separate the VPointer into groups (e.g. all LoadI of the same base and invar). We only need to find adjacent memops inside > 506: // the group. This decreases the work. > 507: // - Sort by offset inside the VPointers. This decreases the work needed to determine adjacent memops inside a group. VPointers -> group? Suggestion: // - Sort by offset inside the group.
This decreases the work needed to determine adjacent memops inside a group. src/hotspot/share/opto/superword.cpp line 525: > 523: vpointers.adr_at(group_end) > 524: ) == 0) { > 525: group_end++; This is somewhat hard to read. How about putting this into a separate method? I.e. int group_start = 0; while (group_start < vpointers.length()) { int group_end = find_group_end(vpointers, group_start); // <---- EXTRACTED to new method find_adjacent_memop_pairs_in_group(vpointers, group_start, group_end); group_start = group_end; } src/hotspot/share/opto/superword.cpp line 539: > 537: } > 538: > 539: // Find adjacent memops for a single group, e.g. for all LoadI of the same base, invar, etc. You should mention here that this method finally adds a new pair to the `_pairset`. On a separate note, `find_adjacent_memop_pairs_in_group()` suggests that we find something but we actually "find and add" something without returning anything from the method. Should we make this more clear in the method name? src/hotspot/share/opto/superword.cpp line 2758: > 2756: #ifdef ASSERT > 2757: for (uint i = 1; i < u_pk->size(); i++) { > 2758: assert(u_pk->at(i-1) == u_pk->at(i)->in(1), "internal connection"); Suggestion: assert(u_pk->at(i - 1) == u_pk->at(i)->in(1), "internal connection"); src/hotspot/share/opto/superword.cpp line 2790: > 2788: } > 2789: > 2790: if (!is_velt_basic_type_compatible_use_def(use, u_idx)) { Might be easier to directly put in `def` as defined on L2764 instead of passing the index to the def. src/hotspot/share/opto/superword.cpp line 2816: > 2814: } > 2815: > 2816: bool SuperWord::is_velt_basic_type_compatible_use_def(Node* use, int idx) const { Maybe add a comment here to quickly explain that compatible means "output size of the def node matches the input size of the use node". 
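To make the sort-then-scan approach discussed in this review concrete (including the suggested `find_group_end` helper), here is a simplified sketch. `MemRef` and its integer `group` key are hypothetical stand-ins for `VPointer` and the comparison done by `cmp_for_sort_by_group()`; this is not the code in `superword.cpp`. The sketch also assumes no two memops in a group share an offset (the real code has to handle that case), and the returned indices refer to the vector after sorting.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model of a memop: `group` stands in for everything that must
// match for two memops to be comparable (opcode, base, invariant, ...).
struct MemRef {
  int group;
  int offset;  // byte offset
  int size;    // access size in bytes
};

// Sort by group first, then by ascending offset inside the group.
static bool cmp_by_group_then_offset(const MemRef& a, const MemRef& b) {
  if (a.group != b.group) return a.group < b.group;
  return a.offset < b.offset;
}

// First index after `start` that belongs to a different group.
static std::size_t find_group_end(const std::vector<MemRef>& refs, std::size_t start) {
  std::size_t end = start + 1;
  while (end < refs.size() && refs[end].group == refs[start].group) {
    end++;
  }
  return end;
}

// O(n log n) sort followed by one linear scan per group.
std::vector<std::pair<std::size_t, std::size_t>> find_adjacent_pairs(std::vector<MemRef>& refs) {
  std::sort(refs.begin(), refs.end(), cmp_by_group_then_offset);
  std::vector<std::pair<std::size_t, std::size_t>> pairs;
  std::size_t group_start = 0;
  while (group_start < refs.size()) {
    std::size_t group_end = find_group_end(refs, group_start);
    for (std::size_t i = group_start; i + 1 < group_end; i++) {
      // Adjacent in memory: this access ends exactly where the next one starts.
      if (refs[i].offset + refs[i].size == refs[i + 1].offset) {
        pairs.emplace_back(i, i + 1);
      }
    }
    group_start = group_end;
  }
  return pairs;
}
```

Because offsets are compared directly rather than modulo the vector width, two byte-sized memops at offsets 31 and 32 in the same group are paired, whereas the old `offset % vw` computation with `vw = 32` turned them into 31 and 0.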
src/hotspot/share/opto/superword.cpp line 2843: > 2841: } > 2842: > 2843: return type2aelembytes(use_bt) == type2aelembytes(def_bt); Maybe for completeness: Suggestion: // Default case: input size of use equals output size of def. return type2aelembytes(use_bt) == type2aelembytes(def_bt); src/hotspot/share/opto/superword.hpp line 563: > 561: // Find the "seed" pairs. These are pairs that we strongly suspect would lead to vectorization. > 562: void find_adjacent_memop_pairs(); > 563: void find_adjacent_memop_pairs_in_group(const GrowableArray &vpointers, const int group_start, int group_end); Suggestion: void find_adjacent_memop_pairs_in_group(const GrowableArray& vpointers, const int group_start, int group_end); ------------- PR Review: https://git.openjdk.org/jdk/pull/18822#pullrequestreview-2080843859 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1616003411 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1616038210 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1616039440 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1616049161 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1616051440 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1616118671 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1616131619 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1616126092 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1616128257 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1616712890 From chagedorn at openjdk.org Tue May 28 08:15:09 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 28 May 2024 08:15:09 GMT Subject: RFR: 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode [v3] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 08:47:19 GMT, Christian Hagedorn wrote: >> 
This patch replaces the `Opaque4Node` of the `If` for Initialized Assertion Predicates with a new `OpaqueInitializedAssertionPredicateNode`. This helps to simplify pattern matching for predicate code and to distinguish it from the two other uses of `Opaque4` nodes: >> 1. Template Assertion Predicate: The goal is to get rid of its `Opaque4Node` as well by using a dedicated `TemplateAssertionPredicateNode` for the `IfNode`. >> 2. Non-null-checks with intrinsics and unsafe accesses: This will eventually be the only use left. Once we get there, we should rename the node accordingly to `OpaqueNonNullCheck` or something like that. >> >> I went through all the uses of `Opaque4` nodes and did the following: >> - Could the `Opaque4` node be part of an Initialized Assertion Predicate? >> - No: Added an assert that we are not dealing with an Initialized Assertion Predicate. >> - Yes: >> - Yes **and only** for Initialized Assertion Predicates? Added an assert that we are only expecting an `OpaqueInitializedAssertionPredicateNode` if appropriate. >> - Yes but could also be something else: Added case for `OpaqueInitializedAssertionPredicateNode` next to the `Opaque4` case. >> - Is this `Opaque4` node only used for Template Assertion Predicates? >> - Yes: Added assert with call to `assertion_predicate_has_loop_opaque_node()` to check that we find its `OpaqueLoop*Nodes`. >> - I've added test cases where I was not sure about whether an `Opaque4` node could be part of a Template, an Initialized Assertion Predicate or a non-null-check. This was a little tricky but I think it was still worth it to prevent future bugs (even though most of these special cases are quite rare). >> >> This is another patch split off from the full fix for Assertion Predicates. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains seven additional commits since the last revision: > > - Make OpaqueInitializedAssertionPredicateNode a macro node again > - asdf > - Merge branch 'master' into JDK-8330386 > - Merge branch 'master' into JDK-8330386 > - Add more comments and asserts > - Add more tests > - 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode Thanks Vladimir for your re-review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18951#issuecomment-2134601572 From chagedorn at openjdk.org Tue May 28 08:15:10 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 28 May 2024 08:15:10 GMT Subject: Integrated: 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode In-Reply-To: References: Message-ID: On Thu, 25 Apr 2024 13:34:31 GMT, Christian Hagedorn wrote: > This patch replaces the `Opaque4Node` of the `If` for Initialized Assertion Predicates with a new `OpaqueInitializedAssertionPredicateNode`. This helps to simplify pattern matching for predicate code and to distinguish it from the two other uses of `Opaque4` nodes: > 1. Template Assertion Predicate: The goal is to get rid of its `Opaque4Node` as well by using a dedicated `TemplateAssertionPredicateNode` for the `IfNode`. > 2. Non-null-checks with intrinsics and unsafe accesses: This will eventually be the only use left. Once we get there, we should rename the node accordingly to `OpaqueNonNullCheck` or something like that. > > I went through all the uses of `Opaque4` nodes and did the following: > - Could the `Opaque4` node be part of an Initialized Assertion Predicate? > - No: Added an assert that we are not dealing with an Initialized Assertion Predicate. > - Yes: > - Yes **and only** for Initialized Assertion Predicates? Added an assert that we are only expecting an `OpaqueInitializedAssertionPredicateNode` if appropriate.
> - Yes but could also be something else: Added case for `OpaqueInitializedAssertionPredicateNode` next to the `Opaque4` case. > - Is this `Opaque4` node only used for Template Assertion Predicates? > - Yes: Added assert with call to `assertion_predicate_has_loop_opaque_node()` to check that we find its `OpaqueLoop*Nodes`. > - I've added test cases where I was not sure about whether an `Opaque4` node could be part of a Template, an Initialized Assertion Predicate or a non-null-check. This was a little tricky but I think it was still worth it to prevent future bugs (even though most of these special cases are quite rare). > > This is another patch split off from the full fix for Assertion Predicates. > > Thanks, > Christian This pull request has now been integrated. Changeset: 2edb6d98 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/2edb6d98133d8bd6dc4527c7497c460283fdc53e Stats: 550 lines in 15 files changed: 483 ins; 8 del; 59 mod 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode Reviewed-by: kvn, roland ------------- PR: https://git.openjdk.org/jdk/pull/18951 From syan at openjdk.org Tue May 28 08:52:03 2024 From: syan at openjdk.org (SendaoYan) Date: Tue, 28 May 2024 08:52:03 GMT Subject: RFR: 8332499: Gtest codestrings.validate_vm fail on linux x64 [v4] In-Reply-To: References: Message-ID: On Tue, 28 May 2024 02:59:15 GMT, SendaoYan wrote: >> Hi all, >> There's some arch-specific code to trim trailing entries as described in [JDK-8332499](https://bugs.openjdk.org/browse/JDK-8332499). Only the gtest test case is changed, so the risk is low. >> >> Additional test: >> - [x] codestrings.validate_vm on linux x64 >> - [x] codestrings.validate_vm on linux aarch64 >> - [x] codestrings.validate_vm on linux riscv64 > > SendaoYan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: > > - Merge branch 'openjdk:master' into jbs8332499 > - 8332499: Gtest codestrings.validate_vm fail on linux x64 > > Signed-off-by: sendaoYan > - 8332499: Gtest codestrings.validate_vm fail on linux x64 > > Signed-off-by: sendaoYan The GHA test run reported two failures: - gc/stringdedup/TestStringDeduplicationInterned.java#Shenandoah fails on linux x86_32 with `Deduplication has not occurred, load history: min: 0.9901123046875, max: 1.079345703125`. I created [JDK-8333040](https://bugs.openjdk.org/browse/JDK-8333040) to track this issue; this failure is unrelated to this PR. - serviceability/jvmti/ObjectMonitorUsage/ObjectMonitorUsage.java fails; this is already tracked by [JDK-8332923](https://bugs.openjdk.org/browse/JDK-8332923) and is unrelated to this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19309#issuecomment-2134674032 From mli at openjdk.org Tue May 28 09:06:14 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 28 May 2024 09:06:14 GMT Subject: RFR: 8320999: RISC-V: C2 RotateLeftV [v4] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > More detailed description is inline in the code.
> Thanks Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: add reg version ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19325/files - new: https://git.openjdk.org/jdk/pull/19325/files/2b295d6e..474e0720 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19325&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19325&range=02-03 Stats: 60 lines in 2 files changed: 55 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/19325.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19325/head:pull/19325 PR: https://git.openjdk.org/jdk/pull/19325 From mli at openjdk.org Tue May 28 09:06:14 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 28 May 2024 09:06:14 GMT Subject: RFR: 8320999: RISC-V: C2 RotateLeftV [v2] In-Reply-To: References: <-xClnQXcYdLw5tq_Kq4PtOSNEO13lOrR_vD-nIMCzGU=.66662885-a465-4f23-92df-8de5cc656309@github.com> <1guRcC9sUUBUCRGW1ku3r8dZerahN2V8Eig5lodyUH4=.7e8607c9-6c97-4224-8db7-c063a359ea52@github.com> Message-ID: On Fri, 24 May 2024 14:57:36 GMT, Fei Yang wrote: >> I would also favor using `.vi` or `.vx` variants over `.vv` variants where possible. This would reduce the vector register pressure and remove an unnecessary instruction. >> >> @Hamlin-Li in your example, we could instead have: >> >> ... ... >> 0x00002aaac560c594: vle32.v v2,(a4) >> 0x00002aaac560c598: vsetivli t0,8,e32,m1,tu,mu >> 0x00002aaac560c59c: vror.vx v2,v2,a3 > > And for your case, this would help save the `vmv.v.x v1,a3` instruction if you do `vror.vx v2,v2,a3` instead of `vror.vv v2,v2,v1`. Right? Thanks for the clarification, I misunderstood. Added vector-scalar rotate variants.
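As a scalar reference for the rotate discussion above, the sketch below models 32-bit lane rotates with the semantics of Java's `Integer.rotateLeft`/`Integer.rotateRight` (shift amount taken modulo 32). The lane loop illustrates why a `.vx` form helps: one scalar amount is applied to every lane, so no separate `vmv.v.x` broadcast into a vector register is needed. It also checks the identity rotl(x, s) == rotr(x, (32 - s) mod 32), which would allow a constant left rotate to be emitted as a right rotate; whether this PR actually does that is not stated in the thread. All function names are hypothetical and are not RVV intrinsics.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rotate a 32-bit lane left by s (mod 32).
static uint32_t rotl32(uint32_t x, unsigned s) {
  s &= 31u;
  return s == 0 ? x : (x << s) | (x >> (32u - s));
}

// Rotate a 32-bit lane right by s (mod 32).
static uint32_t rotr32(uint32_t x, unsigned s) {
  s &= 31u;
  return s == 0 ? x : (x >> s) | (x << (32u - s));
}

// A left rotate by s equals a right rotate by (32 - s) mod 32.
static uint32_t rotl32_via_rotr(uint32_t x, unsigned s) {
  return rotr32(x, (32u - (s & 31u)) & 31u);
}

// Lane-wise model of a vector-scalar rotate: the same scalar amount `s`
// is used for every lane, so it never needs to live in a vector register.
static void rotr_lanes_by_scalar(uint32_t* dst, const uint32_t* src, std::size_t n, unsigned s) {
  for (std::size_t i = 0; i < n; i++) {
    dst[i] = rotr32(src[i], s);
  }
}
```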
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19325#discussion_r1616867157 From epeter at openjdk.org Tue May 28 09:50:09 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 May 2024 09:50:09 GMT Subject: RFR: 8328181: C2: assert(MaxVectorSize >= 32) failed: vector length should be >= 32 [v2] In-Reply-To: <4s13KsZ8dnv_t_5AUyOFmjWsUwRJDtl0OjBGOMlmlRs=.bcf5f967-d70f-4879-bb16-2d1045a63fb4@github.com> References: <4s13KsZ8dnv_t_5AUyOFmjWsUwRJDtl0OjBGOMlmlRs=.bcf5f967-d70f-4879-bb16-2d1045a63fb4@github.com> Message-ID: On Tue, 28 May 2024 07:30:54 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Cleanup predicates. > > src/hotspot/cpu/x86/x86.ad line 1753: > >> 1751: } >> 1752: break; >> 1753: case Op_ClearArray: > > This seems problematic, and may lead to the regression in https://bugs.openjdk.org/browse/JDK-8332487 > > On non-AVX512 platforms, this is now always `true` instead of always `false`. Probably this was not intended, and you expected it to default to `false`? I don't understand what you are implying. Are you saying this is not the reason for the regression? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18464#discussion_r1616928926 From duke at openjdk.org Tue May 28 09:55:08 2024 From: duke at openjdk.org (duke) Date: Tue, 28 May 2024 09:55:08 GMT Subject: Withdrawn: 8328865: [c2] No need to convert "(x+1)+y" into "(x+y)+1" when y is a CallNode In-Reply-To: References: Message-ID: <-3-1POWc6TBBTKeT2ElO-3aTdfux88LJPKmrrM6-Ulk=.612d32c5-cb28-451a-ba8e-543bd64984b1@github.com> On Tue, 26 Mar 2024 07:17:21 GMT, SUN Guoyun wrote: > This patch prohibits the conversion from "(x+1)+y" into "(x+y)+1" when y is a CallNode, to avoid unnecessary spill code and AddNodes. > > Testing: tier1-3 in x86_64 and LoongArch64 > > JMH in x86_64: >

    > before:
    > Benchmark           Mode  Cnt      Score   Error  Units
    > CallNode.test      thrpt    2  26397.733          ops/s
    > 
    > after:
    > Benchmark           Mode  Cnt      Score   Error  Units
    > CallNode.test      thrpt    2  27839.337          ops/s
    > 
    This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/18482 From fgao at openjdk.org Tue May 28 10:01:08 2024 From: fgao at openjdk.org (Fei Gao) Date: Tue, 28 May 2024 10:01:08 GMT Subject: RFR: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" [v2] In-Reply-To: <9QZn_Rgc_Vk9x-c5w3MX-IIX7hHICjnsm_tFLvLtL4M=.b6177857-af0a-4070-8860-7e2d395f9ed7@github.com> References: <16J-lJ2AceGTVcRWBcP15yKcwO-1IA1XsngyOuNjf7k=.0776f081-ae2c-4279-87cf-d909806c2bc4@github.com> <9QZn_Rgc_Vk9x-c5w3MX-IIX7hHICjnsm_tFLvLtL4M=.b6177857-af0a-4070-8860-7e2d395f9ed7@github.com> Message-ID: On Fri, 24 May 2024 08:54:40 GMT, Emanuel Peter wrote: >> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Remove unused immIOffset/immLOffset >> - Merge branch 'master' into fg8319690 >> - 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" >> >> On LP64 systems, if the heap can be moved into low virtual >> address space (below 4GB) and the heap size is smaller than the >> interesting threshold of 4 GB, we can use unscaled decoding >> pattern for narrow klass decoding. It means that a generic field >> reference can be decoded by: >> ``` >> cast<64> (32-bit compressed reference) + field_offset >> ``` >> >> When the `field_offset` is an immediate, on aarch64 platform, the >> unscaled decoding pattern can match perfectly with a direct >> addressing mode, i.e., `base_plus_offset`, supported by LDR/STR >> instructions. But for certain data width, not all immediates can >> be encoded in the instruction field of LDR/STR[1]. The ranges are >> different as data widths vary. 
>> >> For example, when we try to load a value of long type at offset of >> `1030`, the address expression is `(AddP (DecodeN base) 1030)`. >> Before the patch, the expression was matching with >> `operand indOffIN()`. But, for 64-bit LDR/STR, signed immediate >> byte offset must be in the range -256 to 255 or positive immediate >> byte offset must be a multiple of 8 in the range 0 to 32760[2]. >> `1030` can't be encoded in the instruction field. So, after >> matching, when we do checking for instruction encoding, the >> assertion would fail. >> >> In this patch, we're going to filter out invalid immediates >> when deciding if current addressing mode can be matched as >> `base_plus_offset`. We introduce `indOffIN4/indOffLN4` and >> `indOffIN8/indOffLN8` for 32-bit data type and 64-bit data >> type separately in the patch. E.g., for `memory4`, we remove >> the generic `indOffIN/indOffLN`, which matches wrong unscaled >> immediate range, and replace them with `indOffIN4/indOffLN4` >> instead. >> >> Since 8-bit and 16-bit LDR/STR instructions also support the >> unscaled decoding pattern, we add the addressing mode in the >> lists of `memory1` and `memory2` by introducing >> `indOffIN1/indOffLN1` and `indOffIN2/indOffLN2`. >> >> ... > > test/hotspot/jtreg/compiler/c2/aarch64/TestUnalignedAccessCompressedOops.java line 35: > >> 33: * @library /test/lib >> 34: * @modules java.base/jdk.internal.misc >> 35: * @requires os.arch=="aarch64" & vm.compiler2.enabled > > I would remove these two lines. Because who knows, maybe some other platform has similar issues down the road. Or maybe graalVM has a bug that we could catch with this. I'll continue working on this patch. I can remove them in the next update, if you don't mind.
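The encoding rule quoted in the commit message can be expressed as a small predicate. This is a standalone sketch, not HotSpot's `offset_ok_for_immed`, and its name is hypothetical. It generalizes the quoted 64-bit rule to 1-, 2-, 4- and 8-byte accesses by assuming the A64 LDUR/STUR form covers signed 9-bit byte offsets and the LDR/STR (unsigned immediate) form scales its 12-bit immediate by the access size.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if `offset` can be encoded directly in an A64 load/store with
// an immediate offset, for an access of `size_in_bytes` (1, 2, 4 or 8).
bool offset_fits_ldr_str(int64_t offset, int size_in_bytes) {
  // Unscaled form (LDUR/STUR): signed 9-bit byte offset, -256..255.
  if (offset >= -256 && offset <= 255) {
    return true;
  }
  // Scaled form: non-negative multiple of the access size,
  // with offset / size fitting in an unsigned 12-bit field (0..4095).
  if (offset < 0 || offset % size_in_bytes != 0) {
    return false;
  }
  return offset / size_in_bytes <= 4095;
}
```

With this rule, offset 1030 is not encodable for an 8-byte access (it is outside -256..255 and not a multiple of 8), which is the case from the bug, while it is encodable for a 2-byte access (1030 = 2 × 515). The 8-byte upper limit works out to 4095 × 8 = 32760, matching the range quoted above.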
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16991#discussion_r1616943068 From jbhateja at openjdk.org Tue May 28 10:12:15 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 28 May 2024 10:12:15 GMT Subject: RFR: 8328181: C2: assert(MaxVectorSize >= 32) failed: vector length should be >= 32 [v2] In-Reply-To: References: <4s13KsZ8dnv_t_5AUyOFmjWsUwRJDtl0OjBGOMlmlRs=.bcf5f967-d70f-4879-bb16-2d1045a63fb4@github.com> Message-ID: On Tue, 28 May 2024 09:47:08 GMT, Emanuel Peter wrote: >> src/hotspot/cpu/x86/x86.ad line 1753: >> >>> 1751: } >>> 1752: break; >>> 1753: case Op_ClearArray: >> >> This seems problematic, and may lead to the regression in https://bugs.openjdk.org/browse/JDK-8332487 >> >> On non-AVX512 platforms, this is now always `true` instead of always `false`. Probably this was not intended, and you expected it to default to `false`? > > I don't understand what you are implying. Are you saying this is not the reason for the regression? Yes, this can cause a regression: on non-AVX512 targets, the compiler may now not emit the StoreL-based instruction sequence and instead select one of the clear array patterns based on target feature checks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18464#discussion_r1616958675 From gcao at openjdk.org Tue May 28 10:49:26 2024 From: gcao at openjdk.org (Gui Cao) Date: Tue, 28 May 2024 10:49:26 GMT Subject: RFR: 8333006: RISC-V: C2: Support vector-scalar and vector-immediate arithmetic instructions [v2] In-Reply-To: References: Message-ID: <-0PMAvAivEpLgl0qFHHn21m3MrU7Xa7QmO6g2qHgfRQ=.120a7ee2-5525-4bfb-887b-da860feca3d0@github.com> > Hi, we want to support vector-scalar and vector-immediate arithmetic instructions. They were implemented with reference to RVV v1.0 [1]. Please take a look and review. Thanks a lot. > We can use Byte256VectorTests.java[2] to print the Opto JIT code, and to verify and observe the generated nodes.
> > For example, we can use the following command to print the Opto JIT Code of a jtreg test case: > > > /home/zifeihan/jtreg/bin/jtreg \ > -v:default \ > -concurrency:16 -timeout:50 \ > -javaoption:-XX:+UnlockExperimentalVMOptions \ > -javaoption:-XX:+UseRVV \ > -javaoption:-XX:+PrintOptoAssembly \ > -javaoption:-XX:LogFile=/home/zifeihan/jdk/Byte256VectorTests_PrintOptoAssembly.log \ > -jdk:/home/zifeihan/jdk/build/linux-riscv64-server-fastdebug/jdk \ > /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/Byte256VectorTests.java > > > > we can observe the specified compilation log `Byte256VectorTests_PrintOptoAssembly.log`, which contains the vector-scalar and vector-immediate arithmetic instructions for the PR implementation. > > vadd_immI Node > > 16c addw R11, R10, zr #@convI2L_reg_reg > 170 add R9, R31, R11 # ptr, #@addP_reg_reg > 174 addi R9, R9, #16 # ptr, #@addP_reg_imm > 176 loadV V1, [R9] # vector (rvv) > 17e vadd_immI V1, V1, #7 > 186 add R11, R15, R11 # ptr, #@addP_reg_reg > 188 addi R11, R11, #16 # ptr, #@addP_reg_imm > 18a storeV [R11], V1 # vector (rvv) > > > vadd_immI_masked Node > > 1e8 B31: # out( B37 B32 ) <- in( B30 ) Freq: 76.2281 > 1e8 loadV V2, [R31] # vector (rvv) > 1f0 vloadmask V0, V1 > 1f8 vadd_immI_masked V2, V2, #7 > 200 addi R31, R10, #48 # ptr, #@addP_reg_imm > 204 bgeu R30, R7, B37 #@cmpU_branch P=0.000001 C=-1.000000 > > > vadd_regI Node > > 0c4 B4: # out( B9 B5 ) <- in( B8 B3 ) Freq: 1 > 0c4 vloadcon V1 # generate iota indices > 0cc spill [sp, #4] -> R30 # spill size = 32 > 0ce vmul_regI V1, V1, R30 > 0d6 spill [sp, #0] -> R29 # spill size = 32 > 0d8 vadd_regI V1, V1, R29 > > > vadd_regI_masked Node > > 244 B36: # out( B33 B37 ) <- in( B35 ) Freq: 7427.81 > 244 # castII of R30, #@castII > 244 addw R31, R30, zr #@convI2L_reg_reg > 248 spill [sp, #32] -> R10 # spill size = 64 > 24a add R10, R10, R31 # ptr, #@addP_reg_reg > 24c addi R10, R10, #16 # ptr, #@addP_reg_imm > 24e loadV V2, [R10] # vector (rvv) > 256 vloadmask V0, V1 > 25e 
vadd_regI_masked V2, V2, R29 > > > vsub_regI Node > > 112 B20: # out( B63 B21 ) <- in( B19 ) Freq: 77.0107 > 112 # castII of R20, #@castII > 112 addw R11, R2... Gui Cao has updated the pull request incrementally with one additional commit since the last revision: Code Format ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19415/files - new: https://git.openjdk.org/jdk/pull/19415/files/5f26d421..ac335baa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19415&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19415&range=00-01 Stats: 84 lines in 1 file changed: 0 ins; 0 del; 84 mod Patch: https://git.openjdk.org/jdk/pull/19415.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19415/head:pull/19415 PR: https://git.openjdk.org/jdk/pull/19415 From gcao at openjdk.org Tue May 28 10:49:26 2024 From: gcao at openjdk.org (Gui Cao) Date: Tue, 28 May 2024 10:49:26 GMT Subject: RFR: 8333006: RISC-V: C2: Support vector-scalar and vector-immediate arithmetic instructions [v2] In-Reply-To: <6EI4wpI9U7LpAW4QiOQOfpiuzHZjwiq77i1NUU-L1-g=.1241f023-2857-4259-9a8b-f8e7317f1e1c@github.com> References: <6EI4wpI9U7LpAW4QiOQOfpiuzHZjwiq77i1NUU-L1-g=.1241f023-2857-4259-9a8b-f8e7317f1e1c@github.com> Message-ID: On Tue, 28 May 2024 07:08:54 GMT, Fei Yang wrote: > Is it necessary to require that `src` and `dst` be the same register for un-predicated versions? Thanks for your review. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19415#discussion_r1617007815 From thartmann at openjdk.org Tue May 28 11:16:05 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 28 May 2024 11:16:05 GMT Subject: RFR: 8332499: Gtest codestrings.validate_vm fail on linux x64 [v4] In-Reply-To: References: Message-ID: On Tue, 28 May 2024 02:59:15 GMT, SendaoYan wrote: >> Hi all, >> There's some arch-specific code to trim trailing entries as described in [JDK-8332499](https://bugs.openjdk.org/browse/JDK-8332499).
Only the gtest test case is changed, so the risk is low. >> >> Additional test: >> - [x] codestrings.validate_vm on linux x64 >> - [x] codestrings.validate_vm on linux aarch64 >> - [x] codestrings.validate_vm on linux riscv64 > > SendaoYan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'openjdk:master' into jbs8332499 > - 8332499: Gtest codestrings.validate_vm fail on linux x64 > > Signed-off-by: sendaoYan > - 8332499: Gtest codestrings.validate_vm fail on linux x64 > > Signed-off-by: sendaoYan Could you please explain why the test fails and how your fix addresses the issue? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19309#issuecomment-2134957987 From mbaesken at openjdk.org Tue May 28 12:42:24 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 28 May 2024 12:42:24 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer Message-ID: When running on macOS with ubsan enabled, we see some issues in relocInfo (hpp and cpp); those already occur in the build quite early. /jdk/src/hotspot/share/code/relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer Similar happens when we add to the _current pointer _current++; this gives : relocInfo.hpp:606:13: runtime error: applying non-zero offset to non-null pointer 0xfffffffffffffffe produced null pointer Seems the pointer subtraction/addition worked so far, so it might be an option to disable ubsan for those 2 functions.
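A different way to avoid the undefined pointer arithmetic, besides disabling ubsan for the two functions, is to never form the "one before the start" pointer at all. The sketch below keeps an index of consumed records instead. `Record` and `RecordIterator` are hypothetical stand-ins, not HotSpot's `relocInfo`/`RelocIterator`. Changing the iterator layout like this would have to be checked against all users of `RelocIterator`, which has not been done here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical record type standing in for relocInfo.
struct Record {
  unsigned short value;
};

// Iterator that never computes `begin - 1`: it counts consumed records,
// so `_current` only ever points at a valid element (or is nullptr).
class RecordIterator {
 public:
  RecordIterator(const Record* begin, const Record* end)
    : _begin(begin),
      _count(begin == nullptr ? 0 : static_cast<std::size_t>(end - begin)),
      _next(0),
      _current(nullptr) {}

  // Advances to the next record; returns false at end of stream.
  bool next() {
    if (_next >= _count) {
      _current = nullptr;
      return false;
    }
    _current = _begin + _next;  // never reached when _begin is nullptr (count is 0)
    _next++;
    return true;
  }

  const Record* current() const { return _current; }

 private:
  const Record* _begin;
  std::size_t _count;
  std::size_t _next;
  const Record* _current;
};
```

For an empty code section with null start and end pointers, `next()` simply returns false without any arithmetic on the null pointer.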
------------- Commit messages: - JDK-8331731 Changes: https://git.openjdk.org/jdk/pull/19424/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19424&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331731 Stats: 6 lines in 2 files changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19424.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19424/head:pull/19424 PR: https://git.openjdk.org/jdk/pull/19424 From mdoerr at openjdk.org Tue May 28 13:40:02 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 28 May 2024 13:40:02 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer In-Reply-To: References: Message-ID: <4AM8l_bNncziXvnEVL7ExnT9cuRE4UcCsTXf4NuE0dw=.1c22a317-6b92-4885-b352-b14ce84dc4ee@github.com> On Tue, 28 May 2024 12:36:40 GMT, Matthias Baesken wrote: > When running on macOS with ubsan enabled, we see some issues in relocInfo (hpp and cpp); those already occur in the build quite early. > > /jdk/src/hotspot/share/code/relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer > > Similar happens when we add to the _current pointer > _current++; > this gives : > relocInfo.hpp:606:13: runtime error: applying non-zero offset to non-null pointer 0xfffffffffffffffe produced null pointer > > Seems the pointer subtraction/addition worked so far, so it might be an option to disable ubsan for those 2 functions. An idea to fix it is to avoid pointer arithmetic. E.g. 
diff --git a/src/hotspot/share/code/relocInfo.cpp b/src/hotspot/share/code/relocInfo.cpp index d0f732edac4..b6e1517aefd 100644 --- a/src/hotspot/share/code/relocInfo.cpp +++ b/src/hotspot/share/code/relocInfo.cpp @@ -152,7 +152,7 @@ RelocIterator::RelocIterator(CodeSection* cs, address begin, address limit) { initialize_misc(); assert(((cs->locs_start() != nullptr) && (cs->locs_end() != nullptr)) || ((cs->locs_start() == nullptr) && (cs->locs_end() == nullptr)), "valid start and end pointer"); - _current = cs->locs_start()-1; + _current = (relocInfo*)((uintptr_t)cs->locs_start() - sizeof(relocInfo)); _end = cs->locs_end(); _addr = cs->start(); _code = nullptr; // Not cb->blob(); diff --git a/src/hotspot/share/code/relocInfo.hpp b/src/hotspot/share/code/relocInfo.hpp index 6d0907d97de..1774c8ac62a 100644 --- a/src/hotspot/share/code/relocInfo.hpp +++ b/src/hotspot/share/code/relocInfo.hpp @@ -603,7 +603,7 @@ class RelocIterator : public StackObj { // get next reloc info, return !eos bool next() { - _current++; + _current = (relocInfo*)((uintptr_t)_current + sizeof(relocInfo)); assert(_current <= _end, "must not overrun relocInfo"); if (_current == _end) { set_has_current(false); Doesn't look very nice, but should work. ------------- PR Review: https://git.openjdk.org/jdk/pull/19424#pullrequestreview-2082821428 From mbaesken at openjdk.org Tue May 28 14:20:01 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 28 May 2024 14:20:01 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer In-Reply-To: <4AM8l_bNncziXvnEVL7ExnT9cuRE4UcCsTXf4NuE0dw=.1c22a317-6b92-4885-b352-b14ce84dc4ee@github.com> References: <4AM8l_bNncziXvnEVL7ExnT9cuRE4UcCsTXf4NuE0dw=.1c22a317-6b92-4885-b352-b14ce84dc4ee@github.com> Message-ID: On Tue, 28 May 2024 13:37:30 GMT, Martin Doerr wrote: > Doesn't look very nice, but should work. I agree, it does not look very nice. 
Not sure what is better: disabling ubsan for the methods or using the code you suggested. Or maybe add some helper template/macro for pointer additions that covers those cases and handles nullptr nicely? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19424#issuecomment-2135332511 From mdoerr at openjdk.org Tue May 28 14:59:14 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 28 May 2024 14:59:14 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v3] In-Reply-To: References: Message-ID: <0YQF8jE_JFiy_K34aIy6cybUwnpp47-6jrnmZ3jbcAI=.c6663758-17f6-40f8-a738-4e4bf7e9ddaf@github.com> > PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review! > I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea? > How can we verify it? By comparing the performance using the micro benchmarks? > > Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads): > > Original > SecondarySuperCacheHits: 13.033 ±(99.9%) 0.058 ns/op [Average] > SecondarySuperCacheInterContention.test avgt 15 432.366 ± 8.364 ns/op > SecondarySuperCacheInterContention.test:t1 avgt 15 432.310 ± 8.460 ns/op > SecondarySuperCacheInterContention.test:t2 avgt 15 432.422 ± 10.819 ns/op > SecondarySuperCacheIntraContention.test avgt 15 355.192 ± 3.597 ns/op > SecondarySupersLookup.testNegative00 avgt 15 12.274 ± 0.026 ns/op > SecondarySupersLookup.testNegative01 avgt 15 12.300 ± 0.039 ns/op > SecondarySupersLookup.testNegative02 avgt 15 12.304 ± 0.034 ns/op > SecondarySupersLookup.testNegative03 avgt 15 12.276 ± 0.050 ns/op > SecondarySupersLookup.testNegative04 avgt 15 12.235 ± 0.044 ns/op > SecondarySupersLookup.testNegative05 avgt 15 12.308 ± 0.156 ns/op > SecondarySupersLookup.testNegative06 avgt 15 12.291 ± 0.048 ns/op > SecondarySupersLookup.testNegative07 avgt 15 12.307 ± 0.052 ns/op > SecondarySupersLookup.testNegative08 avgt 15 12.398 ±
0.075 ns/op > SecondarySupersLookup.testNegative09 avgt 15 12.552 ± 0.122 ns/op > SecondarySupersLookup.testNegative10 avgt 15 12.490 ± 0.083 ns/op > SecondarySupersLookup.testNegative16 avgt 15 12.565 ± 0.092 ns/op > SecondarySupersLookup.testNegative20 avgt 15 19.059 ± 0.958 ns/op > SecondarySupersLookup.testNegative30 avgt 15 19.268 ± 0.124 ns/op > SecondarySupersLookup.testNegative32 avgt 15 20.059 ± 0.114 ns/op > SecondarySupersLookup.testNegative40 avgt 15 25.117 ± 0.368 ns/op > SecondarySupersLookup.testNegative50 avgt 15 32.735 ± 0.359 ns/op > SecondarySupersLookup.testNegative55 avgt 15 34.866 ± 0.152 ns/op > SecondarySupersLookup.testNegative56 avgt 15 35.492 ± 0.276 ns/op > SecondarySupersLookup.testNegative57 avgt 15 36.620 ± 0.334 ns/op > SecondarySupersLookup.testNegative58 avgt 15 37.226 ± 0.180 ns/op > SecondarySupersLookup.testNegative59 avgt 15 37.774 ± 0.241 ns/op > SecondarySupersLookup.testNegative60 avgt 15 38.627 ± 1.451 ns/op > SecondarySupersLookup.testNegative61 avgt 15 39.395 ± 0.249 ns/op > SecondarySupersLookup.testNegative62 avgt 15 ... Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Adapt assertion. We sometimes have only 1 element in the secondary supers array.
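Some background on JDK-8180450 may help readers of this thread. As we understand it, that change replaces the linear scan of the secondary supers array with a hashed table indexed through a 64-bit occupancy bitmap. An element's position in the dense array is the population count of the bitmap bits up to and including its hash slot. The sketch below is a simplified C++ model of that idea, not HotSpot code. The function `contains`, the integer keys, and the absence of collision probing are all assumptions made for illustration. In this model, an empty table (bitmap 0, array length 0) is rejected by the first bit test and never indexes the array.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model (not HotSpot code) of a 64-slot hash table stored densely.
// Bit h of `bitmap` is set iff slot h is occupied, and the occupant of slot h
// lives at index popcount(bitmap & bits[0..h]) - 1 of `dense`.
bool contains(std::uint64_t bitmap, const std::vector<int>& dense,
              int key, unsigned slot /* 0..63, e.g. a 6-bit hash of key */) {
  if (((bitmap >> slot) & 1) == 0) {
    return false;  // fast negative path: slot empty (always taken when bitmap == 0)
  }
  // Mask of bits 0..slot. For slot == 63 the shift yields 0, and 0 - 1 is all ones.
  std::uint64_t upto = (std::uint64_t{2} << slot) - 1;
  std::size_t index = std::bitset<64>(bitmap & upto).count() - 1;
  // A real implementation must also handle hash collisions (e.g. by probing
  // neighbouring slots). This model assumes every key has its own slot.
  return dense[index] == key;
}
```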
------------- Changes: - all: https://git.openjdk.org/jdk/pull/19368/files - new: https://git.openjdk.org/jdk/pull/19368/files/6753375e..c1840719 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19368&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19368&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19368.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19368/head:pull/19368 PR: https://git.openjdk.org/jdk/pull/19368 From chagedorn at openjdk.org Tue May 28 15:01:02 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 28 May 2024 15:01:02 GMT Subject: RFR: 8332228: TypePollution.java: Unrecognized VM option 'UseSecondarySuperCache' In-Reply-To: References: Message-ID: On Tue, 28 May 2024 14:01:44 GMT, Martin Doerr wrote: > Fix obvious typo in micro benchmark. Looks good and trivial! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19427#pullrequestreview-2083086912 From mdoerr at openjdk.org Tue May 28 15:01:02 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 28 May 2024 15:01:02 GMT Subject: RFR: 8332228: TypePollution.java: Unrecognized VM option 'UseSecondarySuperCache' In-Reply-To: References: Message-ID: <31R6Qxgzx0ykcomNZ0i3M7wlRPjUknBQMEUII71yo98=.6853a244-c87d-4c86-8522-005c85a4d372@github.com> On Tue, 28 May 2024 14:01:44 GMT, Martin Doerr wrote: > Fix obvious typo in micro benchmark. Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19427#issuecomment-2135446758 From aph at openjdk.org Tue May 28 15:17:17 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 28 May 2024 15:17:17 GMT Subject: RFR: 8331658: secondary_super_cache does not scale well: C1 Message-ID: This is the C1 version of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). The new logic in this PR is as simple as I can make it. 
It is a somewhat-simplified version of the C2 change in [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). In order to reduce risk, I haven't touched the existing slow subtype stub. The register allocation logic in the existing code is pretty gnarly, and I have no desire to break anything at this point in the release cycle, so I have allocated just one register more than the existing code. Performance is pretty good. Before and after: x64, AMD 2950X, 8 cores: Benchmark Mode Cnt Score Error Units SecondarySuperCacheHits.test avgt 5 0.959 ± 0.091 ns/op SecondarySuperCacheInterContention.test avgt 5 42.931 ± 6.951 ns/op SecondarySuperCacheInterContention.test:t1 avgt 5 42.397 ± 7.708 ns/op SecondarySuperCacheInterContention.test:t2 avgt 5 43.466 ± 8.238 ns/op SecondarySuperCacheIntraContention.test avgt 5 74.660 ± 0.127 ns/op SecondarySuperCacheHits.test avgt 5 1.480 ± 0.077 ns/op SecondarySuperCacheInterContention.test avgt 5 1.461 ± 0.063 ns/op SecondarySuperCacheInterContention.test:t1 avgt 5 1.767 ± 0.078 ns/op SecondarySuperCacheInterContention.test:t2 avgt 5 1.155 ± 0.052 ns/op SecondarySuperCacheIntraContention.test avgt 5 1.421 ± 0.002 ns/op AArch64, Mac M3, 8 cores: Benchmark Mode Cnt Score Error Units SecondarySuperCacheHits.test avgt 5 0.835 ± 0.021 ns/op SecondarySuperCacheInterContention.test avgt 5 74.078 ± 18.095 ns/op SecondarySuperCacheInterContention.test:t1 avgt 5 81.863 ± 42.492 ns/op SecondarySuperCacheInterContention.test:t2 avgt 5 66.293 ± 11.254 ns/op SecondarySuperCacheIntraContention.test avgt 5 335.563 ± 6.171 ns/op SecondarySuperCacheHits.test avgt 5 1.212 ± 0.004 ns/op SecondarySuperCacheInterContention.test avgt 5 0.871 ± 0.002 ns/op SecondarySuperCacheInterContention.test:t1 avgt 5 0.626 ± 0.003 ns/op SecondarySuperCacheInterContention.test:t2 avgt 5 1.115 ± 0.006 ns/op SecondarySuperCacheIntraContention.test avgt 5 0.696 ± 0.001 ns/op The first test, `SecondarySuperCacheHits`, shows a small regression.
It's the "happy path" which simply checks the same subclass again and again in a loop, in a single thread. I suspect that, as with the C2 experiments we did, this will never be noticeable. All the other tests, though, show a huge improvement, so performance is a lot more predictable. This patch only affects `checkcast` and `instanceof`. The performance of `Class::isInstance()` isn't affected because it's not intrinsified in C1, and neither is any of the logic for arraycopy intrinsics. After the next release is done, I'd like to do a big cleanup and simplification of subtype checking, which should include the still-missing parts of C1 and the interpreter and make everything much more maintainable. Finally, this patch doesn't greatly help with tiered compilation because the subtype checking runtime is greatly affected by profile counter updates. It's really all about pure C1, which seems to be popular in some short-lived cloud applications. ------------- Commit messages: - JDK-8331341: secondary_super_cache does not scale well: C1 and interpreter - JDK-8331341: secondary_super_cache does not scale well: C1 and interpreter - Merge branch 'clean' into C1-hash-supers - JDK-8331341: secondary_super_cache does not scale well: C1 and interpreter - JDK-8331341: secondary_super_cache does not scale well: C1 and interpreter - JDK-8331341: secondary_super_cache does not scale well: C1 and interpreter - JDK-8331341: secondary_super_cache does not scale well: C1 and interpreter - Test - Merge branch 'C1-hash-supers' of https://github.com/theRealAph/jdk into C1-hash-supers - Merge branch 'C1-hash-supers' of https://github.com/theRealAph/jdk into C1-hash-supers - ... 
and 95 more: https://git.openjdk.org/jdk/compare/235ba9a7...8c05732c Changes: https://git.openjdk.org/jdk/pull/19426/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19426&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331658 Stats: 232 lines in 9 files changed: 205 ins; 3 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/19426.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19426/head:pull/19426 PR: https://git.openjdk.org/jdk/pull/19426 From syan at openjdk.org Tue May 28 15:47:25 2024 From: syan at openjdk.org (SendaoYan) Date: Tue, 28 May 2024 15:47:25 GMT Subject: RFR: 8332499: Gtest codestrings.validate_vm fail on linux x64 when hsdis is present [v5] In-Reply-To: References: Message-ID: > Hi all, > There's some arch-specific code to trim trailing entries as described in [JDK-8332499](https://bugs.openjdk.org/browse/JDK-8332499). Only the gtest test case is changed, so the risk is low. > > On linux x86_64, before this PR, after applying `std::regex_replace(tmp4, std::regex("\\s+:\\s+hlt[ \\t]+(?!\\n\\s+;;)"), "")`, the outputs differ because the first output has trailing whitespace, as shown below: > > - : nop > + : nop > > So we need to delete the trailing whitespace after `: nop` using `std::regex_replace(tmp5, std::regex("(\\s+:\\s+nop)[ \\t]*"), "$1")` > > > Additional test: > - [x] codestrings.validate_vm on linux x64 > - [x] codestrings.validate_vm on linux aarch64 > - [x] codestrings.validate_vm on linux riscv64 SendaoYan has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'jbs8332499' of github.com:sendaoYan/jdk-ysd into jbs8332499 - delete the empty spaces after : nop ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19309/files - new: https://git.openjdk.org/jdk/pull/19309/files/1c017d20..1f1ae322 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19309&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19309&range=03-04 Stats: 3 lines in 1 file
changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19309.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19309/head:pull/19309 PR: https://git.openjdk.org/jdk/pull/19309 From syan at openjdk.org Tue May 28 15:47:25 2024 From: syan at openjdk.org (SendaoYan) Date: Tue, 28 May 2024 15:47:25 GMT Subject: RFR: 8332499: Gtest codestrings.validate_vm fail on linux x64 when hsdis is present [v4] In-Reply-To: References: Message-ID: On Tue, 28 May 2024 11:13:22 GMT, Tobias Hartmann wrote: > Could you please explain why the test fails and how your fix addresses the issue? On linux x86_64, before this PR, after applying `std::regex_replace(tmp4, std::regex("\\s+:\\s+hlt[ \\t]+(?!\\n\\s+;;)"), "")`, the outputs differ because the first output has trailing whitespace, as shown below: - : nop + : nop So we need to delete the trailing whitespace after `: nop` using `std::regex_replace(tmp5, std::regex("(\\s+:\\s+nop)[ \\t]*"), "$1")` ------------- PR Comment: https://git.openjdk.org/jdk/pull/19309#issuecomment-2135576744 From kvn at openjdk.org Tue May 28 15:55:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 28 May 2024 15:55:05 GMT Subject: RFR: 8332228: TypePollution.java: Unrecognized VM option 'UseSecondarySuperCache' In-Reply-To: References: Message-ID: On Tue, 28 May 2024 14:01:44 GMT, Martin Doerr wrote: > Fix obvious typo in micro benchmark. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19427#pullrequestreview-2083281226 From aph at openjdk.org Tue May 28 15:57:19 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 28 May 2024 15:57:19 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v11] In-Reply-To: References: Message-ID: > At the present time, `assert_different_registers()` uses an O(N**2) algorithm in assert_different_registers(). We can utilize RegSet to do it in O(N) time.
This would be a useful optimization for all builds with assertions enabled. > > In addition, it would be useful to be able to static_assert different registers. > > Also, I've taken the opportunity to expand the maximum size of a RegSet to 64 on 64-bit platforms. > > I also fixed a bug: sometimes `noreg` is passed to `assert_different_registers()`, but it may only be passed once or a spurious assertion is triggered. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/asm/register.hpp Co-authored-by: Stefan Karlsson ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16617/files - new: https://git.openjdk.org/jdk/pull/16617/files/693df766..951277be Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16617&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16617&range=09-10 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16617.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16617/head:pull/16617 PR: https://git.openjdk.org/jdk/pull/16617 From aph at openjdk.org Tue May 28 15:57:19 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 28 May 2024 15:57:19 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v3] In-Reply-To: References: Message-ID: <9LKGlKPZBypP4LljXCMVFVUFVYhh2UfoRw_C3iyNUA0=.a6b85eb4-e99b-4047-ba40-d5f92898e6fe@github.com> On Mon, 27 Nov 2023 10:43:30 GMT, Andrew Haley wrote: > > > I started to review the patch and was wondering if this could be simplified to something like this?: [stefank at f38c791](https://github.com/stefank/jdk/commit/f38c791793440b899ce6c4c9723470a5d4b18050) > > > > > > Sure, it could be done. This is a minor efficiency tweak. > > I tested the build performance before this PR, with the patch in this PR, and with my simplified version. I can't see any performance difference on my MacBook M1.
Is there any platform where this makes a bigger difference? > > Edit: I realize that since this doesn't always boil down to a constexpr, then the run time might be more interesting than the build time. You won't see very much, if any, because other things dominate. The main advantage, going forward, is that much of this can be constexpr'd, once I find out how to test on Windows. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16617#issuecomment-2135594254 From aph at openjdk.org Tue May 28 15:57:19 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 28 May 2024 15:57:19 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v10] In-Reply-To: <-RRlrDdRqiN1sxsQF7RYJIl8W6Z62LcAq8quEalrzjc=.f6ae63e5-92d9-41be-962b-e2741c676b32@github.com> References: <-RRlrDdRqiN1sxsQF7RYJIl8W6Z62LcAq8quEalrzjc=.f6ae63e5-92d9-41be-962b-e2741c676b32@github.com> Message-ID: On Mon, 13 May 2024 14:36:24 GMT, Stefan Karlsson wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Review feedback > > src/hotspot/share/asm/register.hpp line 263: > >> 261: template >> 262: inline constexpr bool different_registers(AbstractRegSet allocated_regs, R first_register, Rx... more_registers) { >> 263: if (allocated_regs.contains(first_register)) { > > FWIW, while first reading this I was looking for the base case of the recursion (the previous versions had some extra specializations). To me it looks like the base case is written in both this function and the function above. I would prefer to have the implementation inside one function only and change this function to use: > > if (!different_registers(allocated_regs, first_register)) { > > I think this could make it a bit clearer, but if you prefer the current style, I think that's fine as well. I'd prefer to stick with what I have, because it's a bit more direct and slightly simpler runtime code. 
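The linear-time check discussed in this thread can be modelled outside HotSpot. Walk the registers once and keep a bitmask of the encodings already seen. Fail as soon as a bit would be set twice, and skip `noreg` so that passing it more than once is not reported. The sketch below is a simplified C++14 model under stated assumptions. `Reg`, `kNoReg`, `regs_distinct_from`, and `regs_distinct` are hypothetical stand-ins for HotSpot's register types, `RegSet`, and `different_registers()`, and valid encodings are assumed to be below 64. Because the model is `constexpr`, it can also be used in a `static_assert`, which the PR description mentions as a goal.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register model: only an encoding; -1 plays the role of noreg.
struct Reg {
  int encoding;
  constexpr bool is_valid() const { return encoding >= 0; }
};
constexpr Reg kNoReg{-1};

// Base case: nothing left to check.
constexpr bool regs_distinct_from(std::uint64_t /*seen*/) { return true; }

// One pass over the arguments: O(N) instead of comparing every pair.
// Assumes valid encodings are in [0, 64).
template <typename... Rest>
constexpr bool regs_distinct_from(std::uint64_t seen, Reg first, Rest... rest) {
  if (!first.is_valid()) {
    return regs_distinct_from(seen, rest...);  // noreg never conflicts
  }
  const std::uint64_t bit = std::uint64_t{1} << first.encoding;
  if ((seen & bit) != 0) {
    return false;  // this encoding was already used
  }
  return regs_distinct_from(seen | bit, rest...);
}

template <typename... Regs>
constexpr bool regs_distinct(Regs... regs) {
  return regs_distinct_from(std::uint64_t{0}, regs...);
}

static_assert(regs_distinct(Reg{0}, Reg{1}, Reg{2}), "distinct encodings");
static_assert(!regs_distinct(Reg{3}, Reg{5}, Reg{3}), "duplicate detected");
static_assert(regs_distinct(kNoReg, Reg{1}, kNoReg), "noreg may repeat");
```

When none of the arguments is a constant expression, the same function simply runs at run time, which matches the observation in the thread that the check does not always reduce to a `constexpr`.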
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1617545769 From sgibbons at openjdk.org Tue May 28 16:03:16 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 28 May 2024 16:03:16 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v38] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 20:12:07 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Test clarifications > > test/jdk/java/lang/StringBuffer/IndexOf.java line 28: > >> 26: * @summary Test indexOf and lastIndexOf >> 27: * @run main/othervm IndexOf >> 28: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -Xcomp -XX:-TieredCompilation -XX:UseAVX=2 -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts IndexOf > > I suggest to split it into 2 subtest jobs and use `@requires vm.cpu.features ~= ".*avx2.*"` for second which specified `-XX:UseAVX=2`. > See `compiler/loopopts/superword/TestDependencyOffsets.java` for example. @vnkozlov I'm getting an error in CI tests with this line added. Can you please advise? `TEST RESULT: Error. Parse Exception: Syntax error in @requires expression: invalid name: vm.cpu.features` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617556335 From sgibbons at openjdk.org Tue May 28 16:06:16 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 28 May 2024 16:06:16 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v41] In-Reply-To: References: Message-ID: On Sat, 25 May 2024 06:33:51 GMT, Alan Bateman wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix test; review comments > > test/jdk/java/lang/StringBuffer/IndexOf.java line 47: > >> 45: public class IndexOf { >> 46: >> 47: static Random generator = new Random(); > > @RogerRiggs Would you have cycles to look at Scott's changes to this test? 
I suspect it will need to be re-structured, re-formatted, and commented to get into maintainable shape. I am going to revert my changes to this file as the test `jdk/java/lang/String/IndexOf.java` covers the code better. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617560447 From kvn at openjdk.org Tue May 28 16:12:04 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 28 May 2024 16:12:04 GMT Subject: RFR: 8325083: jdk/incubator/vector/Double512VectorTests.java crashes in Assembler::vex_prefix_and_encode [v2] In-Reply-To: References: Message-ID: On Tue, 28 May 2024 05:24:58 GMT, Jatin Bhateja wrote: >> @jatin-bhateja can you explain in more details what KNL is missing to trigger the assert? Can we predicate on missing feature here instead of KNL && DBL checks? >> My concern is KNL check could be not enough if such feature is disabled in some container environment which does not match KNL settings. >> >> Why we have `assert(UseAVX > 0` here? `Assembler::vpmov*()` instructions have corresponding asserts already. No need to fix it here but I think we need to cleanup `*.ad` files from such duplicated asserts as separate RFE. > >> @jatin-bhateja can you explain in more details what KNL is missing to trigger the assert? Can we predicate on missing feature here instead of KNL && DBL checks? My concern is KNL check could be not enough if such feature is disabled in some container environment which does not match KNL settings. >> >> Why we have `assert(UseAVX > 0` here? `Assembler::vpmov*()` instructions have corresponding asserts already. No need to fix it here but I think we need to cleanup `*.ad` files from such duplicated asserts as separate RFE. 
> > Hi @vnkozlov, the problem occurs while emitting the VMOVSXBD instruction; please refer to the following LOC > https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L2936 > > This is a byte-to-double conversion, and we break it into two instructions: byte-to-doubleword casting followed by doubleword-to-double-precision casting. We could have used a combination of VMOVSXBQ + VCVTQQ2PD, but that would further tighten the target constraints, since quadword-to-double-precision casting needs the AVX512DQ feature. Thus the current scheme works well, and by limiting operand allocation to the legacy register set we can safely issue the 256-bit VMOVSXBD instruction on a KNL target, which lacks the AVX512VL feature. @jatin-bhateja Thank you for explaining the issue. I am fine with splitting the instruction to use legacy registers. My only concern is the use of the `!VM_Version::is_knights_family()` check. Can we use `VM_Version::supports_avx512vl()` instead? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19407#issuecomment-2135626098 From sgibbons at openjdk.org Tue May 28 16:12:43 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 28 May 2024 16:12:43 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v44] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Revert changes to IndexOf.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/15994a39..01cb58fb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=43 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=42-43 Stats: 382 lines in 1 file changed: 0 ins; 222 del; 160 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From epeter at openjdk.org Tue May 28 16:18:03 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 May 2024 16:18:03 GMT Subject: RFR: 8332856: C2: Add new transform for 
bool eq/ne (cmp (and (urshift X const1) const2) 0) In-Reply-To: References: Message-ID: <-JYZ6Z4MxHJ_uMGeskp_Nzst7MwME_g4gH0emWX27GI=.d333614c-efc1-4760-93ab-818731fc7c47@github.com> On Mon, 20 May 2024 14:15:46 GMT, Tobias Hotz wrote: > This PR adds a new ideal optimization for the following pattern: > > public boolean testFunc(int a) { > int mask = 0b101; > int shift = 12; > return ((a >> shift) & mask) == 0; > } > > Where the mask and shift are constant values and a is a variable. For this optimization to work, the right shift has to be idealized to an unsigned right shift earlier in the pipeline, which happens here: https://github.com/openjdk/jdk/blob/b92bd671835c37cff58e2cdcecd0fe4277557d7f/src/hotspot/share/opto/mulnode.cpp#L731 > If the shift is already an unsigned bit shift, it works as well. > On AMD64 CPUs, this means that this whole line computation can be reduced to a simple `test` instruction. Hi @ichttt ! Do you have a benchmark to justify this pattern? Where would this pattern occur? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19310#issuecomment-2135639199 From epeter at openjdk.org Tue May 28 16:23:06 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 May 2024 16:23:06 GMT Subject: RFR: 8332856: C2: Add new transform for bool eq/ne (cmp (and (urshift X const1) const2) 0) In-Reply-To: References: Message-ID: On Mon, 20 May 2024 14:15:46 GMT, Tobias Hotz wrote: > This PR adds a new ideal optimization for the following pattern: > > public boolean testFunc(int a) { > int mask = 0b101; > int shift = 12; > return ((a >> shift) & mask) == 0; > } > > Where the mask and shift are constant values and a is a variable. For this optimization to work, the right shift has to be idealized to an unsigned right shift earlier in the pipeline, which happens here: https://github.com/openjdk/jdk/blob/b92bd671835c37cff58e2cdcecd0fe4277557d7f/src/hotspot/share/opto/mulnode.cpp#L731 > If the shift is already an unsigned bit shift, it works as well.
> On AMD64 CPUs, this means that this whole line computation can be reduced to a simple `test` instruction. If I compare the two: bool eq/ne (cmp (and (urshift X 4) 1) 0) bool ne/eq (cmp (and X 8) 0) Then I see that the outer part is the same for both: `bool ne/eq (cmp (...) 0)` So why would this not just be an optimization for the pattern `AndI(Shift X shift, mask)` -> `AndI(X, new_mask)`? That would be more general. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19310#issuecomment-2135656727 From epeter at openjdk.org Tue May 28 16:29:06 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 May 2024 16:29:06 GMT Subject: RFR: 8332856: C2: Add new transform for bool eq/ne (cmp (and (urshift X const1) const2) 0) In-Reply-To: References: Message-ID: On Mon, 20 May 2024 14:15:46 GMT, Tobias Hotz wrote: > This PR adds a new ideal optimization for the following pattern: > > public boolean testFunc(int a) { > int mask = 0b101; > int shift = 12; > return ((a >> shift) & mask) == 0; > } > > Where the mask and shift are constant values and a is a variable. For this optimization to work, the right shift has to be idealized to an unsigned right shift earlier in the pipeline, which happens here: https://github.com/openjdk/jdk/blob/b92bd671835c37cff58e2cdcecd0fe4277557d7f/src/hotspot/share/opto/mulnode.cpp#L731 > If the shift is already an unsigned bit shift, it works as well. > On AMD64 CPUs, this means that this whole line computation can be reduced to a simple `test` instruction. I'm looking at `AndINode::Ideal`. `AndIL_add_shift_and_mask` seems to do something similar, but not exactly this. Maybe it could be extended. Ha, what about this? // Masking off sign bits? Dont make them!
if( lop == Op_RShiftI ) { const TypeInt *t12 = phase->type(load->in(2))->isa_int(); if( t12 && t12->is_con() ) { // Shift is by a constant int shift = t12->get_con(); shift &= BitsPerJavaInteger-1; // semantics of Java shifts const int sign_bits_mask = ~right_n_bits(BitsPerJavaInteger - shift); // If the AND'ing of the 2 masks has no bits, then only original shifted // bits survive. NO sign-extension bits survive the maskings. if( (sign_bits_mask & mask) == 0 ) { // Use zero-fill shift instead Node *zshift = phase->transform(new URShiftINode(load->in(1),load->in(2))); return new AndINode( zshift, in(2) ); } } } ------------- PR Comment: https://git.openjdk.org/jdk/pull/19310#issuecomment-2135667137 From epeter at openjdk.org Tue May 28 16:41:01 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 May 2024 16:41:01 GMT Subject: RFR: 8324756: Test vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize is too slow due to dependency verification [v5] In-Reply-To: <2ygMhqSjsiuHeguO3lMC4FOUCVI28tci2j0-8j3k7F4=.bd620849-4738-491b-853f-a9d2bdbe2067@github.com> References: <2ygMhqSjsiuHeguO3lMC4FOUCVI28tci2j0-8j3k7F4=.bd620849-4738-491b-853f-a9d2bdbe2067@github.com> Message-ID: On Wed, 15 May 2024 09:26:27 GMT, Ian Myers wrote: >> This change removes dependency verification by passing -XX:-VerifyDependencies in the test. 
>> >> `vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java` takes 20min to run on linux-x86_64-server-fastdebug: >> >> time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java >> CONF=linux-x86_64-server-fastdebug make test **1412.82s user 15.27s system 115% cpu 20:41.19 total** >> >> >> Passing -XX:-VerifyDependencies flag speeds up the run time to 1min: >> >> time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java TEST_VM_OPTS="-XX:-VerifyDependencies" >> CONF=linux-x86_64-server-fastdebug make test **287.27s user 16.19s system 496% cpu 1:01.10 total** >> >> >> Adding -XX:-VerifyDependencies to the test file accomplishes the same run time of 1min: >> >> time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java >> CONF=linux-x86_64-server-fastdebug make test **272.33s user 14.56s system 464% cpu 1:01.75 total** > > Ian Myers has updated the pull request incrementally with one additional commit since the last revision: > > [8324756] Remove dependency verification from vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java Why is `VerifyDependencies` so slow in this test? 
test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java line 168:

> 166: MHTransformationGen.createAndCallSequence(retVal, dataSnapshot, _mh, _finalArgs, true);
> 167: }
> 168:

Suggestion:

Drive by suggestion to get the `rfr` label back ;)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19040#issuecomment-2135689070
PR Review Comment: https://git.openjdk.org/jdk/pull/19040#discussion_r1617603397

From sviswanathan at openjdk.org  Tue May 28 16:42:19 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Tue, 28 May 2024 16:42:19 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]
In-Reply-To: 
References: 
Message-ID: <8iJLAJvIzHgXpr5P1hWOLTj-bfO6THNUJkQf7Ki2P9Y=.43680b7c-c3e1-41f8-b065-7955e6237613@github.com>

On Sat, 25 May 2024 22:19:41 GMT, Scott Gibbons wrote:

>> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark                              Score      Latest     Latest/Score
>> StringIndexOf.advancedWithMediumSub    343.573    317.934    0.925375393x
>> StringIndexOf.advancedWithShortSub1    1039.081   1053.96    1.014319384x
>> StringIndexOf.advancedWithShortSub2    55.828     110.541    1.980027943x
>> StringIndexOf.constantPattern          9.361      11.906     1.271872663x
>> StringIndexOf.searchCharLongSuccess    4.216      4.218      1.000474383x
>> StringIndexOf.searchCharMediumSuccess  3.133      3.216      1.02649218x
>> StringIndexOf.searchCharShortSuccess   3.76       3.761      1.000265957x
>> StringIndexOf.success                  9.186      9.713      1.057369911x
>> StringIndexOf.successBig               14.341     46.343     3.231504079x
>> StringIndexOfChar.latin1_AVX2_String   6220.918   12154.52   1.953814533x
>> StringIndexOfChar.latin1_AVX2_char     5503.556   5540.044   1.006629895x
>> StringIndexOfChar.latin1_SSE4_String   6978.854   6818.689   0.977049957x
>> StringIndexOfChar.latin1_SSE4_char     5657.499   5474.624   0.967675646x
>> StringIndexOfChar.latin1_Short_String  7132.541   6863.359   0.962260014x
>> StringIndexOfChar.latin1_Short_char    16013.389  16162.437  1.009307711x
>> StringIndexOfChar.latin1_mixed_String  7386.123   14771.622  1.999915517x
>> StringIndexOfChar.latin1_mixed_char    9901.671   9782.245   0.987938803x
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix tests

src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 239:

> 237: // the needle size is less than 32 bytes, we default to a
> 238: // byte-by-byte comparison (this will be rare).
> 239: //

Is this still true?

src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 278:

> 276: __ bind(L_nextCheck);
> 277: __ testq(haystack_len_p, haystack_len_p);
> 278: __ je(L_zeroCheckFailed);

This check could be removed, as the next check covers it.

src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 360:

> 358: __ push(rcx);
> 359: __ push(r8);
> 360: __ push(r9);

There is no need to save/restore rcx/r8/r9 on the Windows platform either.

src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 379:

> 377:
> 378: // Assume failure
> 379: __ movq(rbp, -1);

We are no longer using rbp at the return point, so this is not needed now?

src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 488:

> 486: __ cmpq(r11, nMinusK);
> 487: __ ja_b(L_return);
> 488: __ movq(rax, r11);

At places where we know that the return value in r11 is correct, we don't need to checkRange, so this could have its own label.

src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 566:

> 564: // rbp: -1
> 565: // XMM_BYTE_0 - first element of needle broadcast
> 566: // XMM_BYTE_K - last element of needle broadcast

The only registers that are used as input in the switch case are:

r14 = needle
rbx = haystack
rsi = haystack length (n)
r12 = needle length (k)
r10 = n - k (where k is needle length)
XMM_BYTE_0 = first element of needle, broadcast
XMM_BYTE_K = last element of needle, broadcast

So we could list only these, making it easier to comprehend.
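The register list above (needle, haystack, n, k, n - k, and the first and last needle elements broadcast into XMM_BYTE_0 and XMM_BYTE_K) suggests the usual first/last-element filter for substring search. The Java sketch below is a scalar illustration of that filter under that assumption; it is not the stub's code, and the class and method names are hypothetical. A start position i is fully compared only when haystack[i] equals the needle's first element and haystack[i + k - 1] equals its last; the AVX2 stub presumably tests 32 such positions per iteration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical scalar sketch of a first/last-element filtered substring search
// over Latin-1 bytes. It illustrates the idea only; it is not the HotSpot stub.
final class FirstLastFilterIndexOf {

    static int indexOf(byte[] haystack, byte[] needle) {
        int n = haystack.length;      // haystack length (n)
        int k = needle.length;        // needle length (k)
        if (k == 0) return 0;
        if (k > n) return -1;
        byte first = needle[0];       // what XMM_BYTE_0 broadcasts
        byte last = needle[k - 1];    // what XMM_BYTE_K broadcasts
        int nMinusK = n - k;          // last valid start position
        for (int i = 0; i <= nMinusK; i++) {
            // Cheap filter: compare only the first and last needle elements.
            if (haystack[i] != first || haystack[i + k - 1] != last) {
                continue;
            }
            // The candidate survived the filter: verify the whole needle.
            if (Arrays.equals(haystack, i, i + k, needle, 0, k)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] hs = "the quick brown fox".getBytes(StandardCharsets.ISO_8859_1);
        byte[] nd = "brown".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(indexOf(hs, nd)); // prints 10
    }
}
```

The filter costs two byte compares per position, so the full comparison runs only for positions that already match at both ends.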
src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 578:

> 576: // helper jumps to L_checkRangeAndReturn with a (-1) return value.
> 577: big_case_loop_helper(false, 0, L_checkRangeAndReturn, L_loopTop, mask, hsPtrRet, needleLen,
> 578: needle, haystack, hsLength, tmp1, tmp2, tmp3, rScratch, ae, _masm);

If we run out of haystack, instead of jumping to L_checkRangeAndReturn we could jump directly to L_retrunError.

src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 597:

> 595:
> 596: // Need a lot of registers here to preserve state across arrays_equals call
> 597:

This comment is no longer valid and could be removed.

src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 621:

> 619: __ addq(hsPtrRet, index);
> 620: __ movq(r11, hsPtrRet);
> 621: __ jmp(L_checkRangeAndReturn);

Why do we have to checkRange here? Would it not always be correct? If so, we could return r11 directly (by moving it into rax).

src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 660:

> 658: // Haystack always copied to stack, so 32-byte reads OK
> 659: // Haystack length < 32
> 660: // 10 < needle length < 32

Haystack length <= 32
10 < needle length <= 32

src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 721:

> 719: false /* char */, knoreg);
> 720: __ testl(rTmp3, rTmp3);
> 721: __ jne(L_checkRangeAndReturn);

Why do we have to checkRange here? Would it not always be correct? If so, we could return r11 directly (by moving it into rax).

src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1333:

> 1331:
> 1332: __ cmpq(nMinusK, 32);
> 1333: __ jae_b(L_greaterThan32);

Should this check be (n-k+1) >= 32? If so, the equivalent test is (n-k) >= 31:

__ cmpq(nMinusK, 31);
__ jae_b(L_greaterThan32);

src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1382:

> 1380:
> 1381: __ testl(eq_mask, eq_mask);
> 1382: __ je(noMatch);

We are mixing operation widths (l and q) here.
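The question on line 1333 is an off-by-one in counting start positions: a needle of length k can start at 0 through n - k, giving n - k + 1 candidates, so "at least 32 candidates" is the same as n - k >= 31. A small Java check of that arithmetic (illustrative only; the class name is hypothetical and the stub's branch layout is not modeled):

```java
// Illustrative arithmetic only: counts candidate start positions for a
// needle of length k in a haystack of length n (assumes 1 <= k).
final class CandidateCount {

    // Valid start positions are 0 .. n - k inclusive.
    static int candidates(int n, int k) {
        return Math.max(0, n - k + 1);
    }

    // "At least 32 candidates", stated two equivalent ways (for k <= n).
    static boolean atLeast32ByCount(int n, int k) {
        return candidates(n, k) >= 32;
    }

    static boolean atLeast32ByNMinusK(int n, int k) {
        return n - k >= 31;
    }

    public static void main(String[] args) {
        int n = 40, k = 9;                             // n - k == 31
        System.out.println(candidates(n, k));          // prints 32
        System.out.println(n - k >= 32);               // prints false: a ">= 32" test on n - k misses this case
        System.out.println(atLeast32ByNMinusK(n, k));  // prints true
    }
}
```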
src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1750:

> 1748: // r15 = unused
> 1749: // XMM_BYTE_0 - first element of needle, broadcast
> 1750: // XMM_BYTE_K - last element of needle, broadcast

This comment is duplicated for both the small-haystack and the big-haystack case; it could be made a common comment. Also, the only registers that are used as input in the switch case are:

r14 = needle
rbx = haystack
rsi = haystack length (n)
r12 = needle length (k)
r10 = n - k (where k is needle length)
XMM_BYTE_0 = first element of needle, broadcast
XMM_BYTE_K = last element of needle, broadcast

So we could list only these, making it easier to comprehend.

src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1758:

> 1756: //
> 1757: // If a match is found, jump to L_checkRange, which ensures the
> 1758: // matched needle is not past the end of the haystack.

Another comment here would be useful:

// The found index is returned in set_bit (r11).

src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1810:

> 1808: // XMM_BYTE_K - last element of needle, broadcast
> 1809: //
> 1810: // The haystack is > 32 bytes

It would be good to mention in the comment how the returned index is computed as a combination of set_bit (r8), hs_ptr, and haystack.
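The comments on lines 1758 and 1810 concern how the match index is recovered. A common way to do this, assumed here rather than taken from the stub, is to build a 32-bit candidate mask per block, take the lowest set bit with `Integer.numberOfTrailingZeros`, and add it to the block's offset from the start of the haystack (roughly hs_ptr - haystack in the stub's terms). If verification fails, that bit is cleared with `mask &= mask - 1`. All names below are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of turning a per-block compare mask into a match index.
// Names echo the review discussion (set_bit, block offset); this is not the stub.
final class MaskIndexOf {

    // Bit j is set when position blockStart + j passes the first/last-element filter,
    // roughly what a byte compare followed by a movemask yields for 32 lanes.
    static int candidateMask(byte[] hs, int blockStart, byte[] needle) {
        int k = needle.length;
        int lastStart = hs.length - k;
        int mask = 0;
        for (int j = 0; j < 32; j++) {
            int i = blockStart + j;
            if (i > lastStart) {
                break; // a match starting here would run past the end of the haystack
            }
            if (hs[i] == needle[0] && hs[i + k - 1] == needle[k - 1]) {
                mask |= 1 << j;
            }
        }
        return mask;
    }

    static int indexOf(byte[] hs, byte[] needle) {
        int k = needle.length;
        if (k == 0) return 0;
        for (int blockStart = 0; blockStart <= hs.length - k; blockStart += 32) {
            int mask = candidateMask(hs, blockStart, needle);
            while (mask != 0) {
                int setBit = Integer.numberOfTrailingZeros(mask); // lowest candidate in this block
                int index = blockStart + setBit;                  // block offset + bit position
                if (Arrays.equals(hs, index, index + k, needle, 0, k)) {
                    return index;
                }
                mask &= mask - 1;                                 // clear the lowest set bit
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String h = "a".repeat(70) + "ab" + "a".repeat(30);
        byte[] hs = h.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(indexOf(hs, "ab".getBytes(StandardCharsets.ISO_8859_1))); // prints 70
    }
}
```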
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617187600
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617193503
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617216424
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617218826
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617603927
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617318645
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617307443
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617536831
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617569308
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617575018
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617601913
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1616424912
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1616427773
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617263035
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617267415
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617273352

From kvn at openjdk.org  Tue May 28 16:49:01 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 28 May 2024 16:49:01 GMT
Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer
In-Reply-To: 
References: <4AM8l_bNncziXvnEVL7ExnT9cuRE4UcCsTXf4NuE0dw=.1c22a317-6b92-4885-b352-b14ce84dc4ee@github.com>
Message-ID: <3rxG1K1WWdhmQ0qu7og3FBbhwIP-ZRgZm-ZcmxOrUjU=.efd8f459-c2b1-4315-85c7-88ecaa9461c6@github.com>

On Tue, 28 May 2024 14:16:59 GMT, Matthias Baesken wrote:

> > Doesn't look very nice, but should work.
>
> I agree, it does not look very nice.
> Not sure what is better, disabling ubsan for the methods or use the code you suggested. Or maybe add some helper template/macro for pointer additions that covers those cases and handles nullptr nicely ?

I prefer @TheRealMDoerr's suggestion over disabling the ubsan check for this code. It should compile to the same assembly. I would only add a comment explaining why we don't do simple pointer arithmetic here. `cs->locs_start()` == `nullptr` is a common case, and I don't want to complicate the code with additional `nullptr` checks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19424#issuecomment-2135703084

From epeter at openjdk.org  Tue May 28 16:50:49 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 28 May 2024 16:50:49 GMT
Subject: RFR: 8325155: C2 SuperWord: remove alignment boundaries [v3]
In-Reply-To: 
References: 
Message-ID: 

> I have tried for a very long time to get rid of all the `alignment(n)` code that is all over the SuperWord code. With lots of previous work, I am now finally ready to remove it.
>
> I was able to remove lots of VM code, about 300 lines. And the removed code is I think much more complicated than the new code.
>
> This is what I did in this PR:
> - Removal of `_node_info`: used to have many fields, which I refactored out to the `VLoopAnalyzer` modules. `alignment` is the last component, which I now remove.
> - Changed the implementation of `SuperWord::find_adjacent_refs`, now `SuperWord::find_adjacent_memop_pairs`, completely:
>   - It used to be an algorithm that would scan over all `memops` repeatedly, try to find some `mem_ref` and see which other memops were comparable, and then pack pairs for all of those, by comparing all-vs-all memops. This algorithm is at least quadratic, if not much worse.
>   - I now add all `memops` into a single array, sort them by groups (those that are comparable with each other and could be packed into vectors), and inside the groups by ascending offset.
>     This allows me to split off the groups much more efficiently, and also the sorting by offset allows me finding adjacent pairs much more efficiently. In the most cases this reduces the cost to `O(n log n)` for sort, and a linear scan for finding adjacent memops.
> - I removed the "alignment boundaries" created in `SuperWord::memory_alignment` by `int off_rem = offset % vw;`.
>   - This used to have the effect that all offsets were computed modulo the vector width. Hence, pairs could not be packed across this boundary (e.g. we have nodes with offsets `31, 32`, which are adjacent in theory, but if we have a `vw = 32`, then the modulo-offsets are `31, 0`, and they are not detected as adjacent).
>   - These "alignment boundaries" used to be required for correctness about a year ago, before I fixed and relaxed much of the alignment code.
> - The `alignment` used to have another important task: Ensuring compatibility of the input-size of a use node, with the output-size of the def-node.
>   - This was done by giving all nodes an `alignment`, even the non-memop nodes. This `alignment` was then scaled up and down at type casts (e.g. int `0, 4, 8, 12` -> long `0, 8, 16, 24`). If the output-size of the def-node did not match the input-size of the use-node, then the `alignment` would not match up, and we would not pack.
>   - This is why we used to have checks like `alignment(s1) + data_size(s1) == alignment(s2)` ...
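The PR description replaces the all-vs-all search with sort-then-scan and drops the `offset % vw` alignment boundaries. The Java sketch below models only that idea with a hypothetical `MemRef` record (a group id standing in for "comparable VPointers", a byte offset, and an element size); it is not SuperWord code. It also shows why offsets `31, 32` with `vw = 32` were missed once reduced modulo `vw`:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of "sort memops by group, then by offset, then scan once".
// MemRef stands in for a memop with a valid VPointer; it is not a HotSpot type.
final class AdjacentMemopPairs {

    // group: memops that are mutually comparable; offset: constant byte offset;
    // size: element size in bytes.
    record MemRef(String name, int group, int offset, int size) {}

    record Pair(MemRef first, MemRef second) {}

    static List<Pair> findAdjacentPairs(List<MemRef> memops) {
        List<MemRef> sorted = new ArrayList<>(memops);
        // O(n log n): group first, ascending offset inside each group.
        sorted.sort(Comparator.comparingInt(MemRef::group)
                              .thenComparingInt(MemRef::offset));
        List<Pair> pairs = new ArrayList<>();
        // Linear scan over neighbours. Simplification: assumes at most one
        // memop per offset within a group.
        for (int i = 0; i + 1 < sorted.size(); i++) {
            MemRef a = sorted.get(i);
            MemRef b = sorted.get(i + 1);
            boolean comparable = a.group() == b.group() && a.size() == b.size();
            if (comparable && a.offset() + a.size() == b.offset()) {
                pairs.add(new Pair(a, b));
            }
        }
        return pairs;
    }

    // The removed "alignment boundary" view: offsets reduced modulo the vector width.
    static boolean adjacentModuloVw(MemRef a, MemRef b, int vw) {
        return (a.offset() % vw) + a.size() == (b.offset() % vw);
    }

    public static void main(String[] args) {
        MemRef a = new MemRef("a", 0, 31, 1);
        MemRef b = new MemRef("b", 0, 32, 1);
        MemRef c = new MemRef("c", 1, 33, 1);
        System.out.println(findAdjacentPairs(List.of(b, c, a)).size()); // prints 1 (the pair a, b)
        System.out.println(adjacentModuloVw(a, b, 32));                  // prints false: 31 + 1 != 0
    }
}
```

A real implementation also has to handle several memops at the same offset within a group, which this sketch deliberately skips.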
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  Apply suggestions from code review by Christian
  
  Co-authored-by: Christian Hagedorn 

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18822/files
  - new: https://git.openjdk.org/jdk/pull/18822/files/82c9a77a..d48bafa9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18822&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18822&range=01-02

  Stats: 4 lines in 2 files changed: 1 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/18822.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18822/head:pull/18822

PR: https://git.openjdk.org/jdk/pull/18822

From kvn at openjdk.org  Tue May 28 17:00:22 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 28 May 2024 17:00:22 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v38]
In-Reply-To: 
References: 
Message-ID: <7jqyfDXW_EbstH_s90Fp4O7a214ZaejdM0CyAffzOHs=.544c7a91-c66b-4487-a2bf-0b8e300a94c0@github.com>

On Tue, 28 May 2024 16:00:10 GMT, Scott Gibbons wrote:

>> test/jdk/java/lang/StringBuffer/IndexOf.java line 28:
>>
>>> 26: * @summary Test indexOf and lastIndexOf
>>> 27: * @run main/othervm IndexOf
>>> 28: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -Xcomp -XX:-TieredCompilation -XX:UseAVX=2 -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts IndexOf
>>
>> I suggest to split it into 2 subtest jobs and use `@requires vm.cpu.features ~= ".*avx2.*"` for second which specified `-XX:UseAVX=2`.
>> See `compiler/loopopts/superword/TestDependencyOffsets.java` for example.
>
> @vnkozlov I'm getting an error in CI tests with this line added. Can you please advise?
>
> `TEST RESULT: Error. Parse Exception: Syntax error in @requires expression: invalid name: vm.cpu.features`

You need to add a `vm.cpu.features` line to the `test/jdk/TEST.ROOT` file.
Similar to what we have in `test/hotspot/jtreg/TEST.ROOT`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617630712

From epeter at openjdk.org  Tue May 28 17:00:43 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 28 May 2024 17:00:43 GMT
Subject: RFR: 8325155: C2 SuperWord: remove alignment boundaries [v2]
In-Reply-To: 
References: 
Message-ID: 

On Mon, 27 May 2024 14:31:48 GMT, Christian Hagedorn wrote:

>> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits:
>>
>>  - Merge branch 'master' into JDK-8325155-rm-alignment-boundaries
>>  - rm TODO
>>  - manual merge
>>  - revert a line, need to fix it different
>>  - improve comments
>>  - fix alignment
>>  - fix reductions
>>  - MaxI reduction over chars
>>  - Merge branch 'master' into JDK-8325155-rm-alignment-boundaries
>>  - Merge branch 'master' into JDK-8325155-rm-alignment-boundaries
>>  - ... and 15 more: https://git.openjdk.org/jdk/compare/c4867c62...82c9a77a
>
> src/hotspot/share/opto/superword.cpp line 2790:
>
>> 2788: }
>> 2789:
>> 2790: if (!is_velt_basic_type_compatible_use_def(use, u_idx)) {
>
> Might be easier to directly put in `def` as defined on L2764 instead of passing the index to the def.

Good point. I think I used to require `idx` in a previous iteration, but not any more!

> src/hotspot/share/opto/superword.cpp line 2816:
>
>> 2814: }
>> 2815:
>> 2816: bool SuperWord::is_velt_basic_type_compatible_use_def(Node* use, int idx) const {
>
> Maybe add a comment here to quickly explain that compatible means "output size of the def node matches the input size of the use node".

Good idea!
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1617626973
PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1617628690

From jbhateja at openjdk.org  Tue May 28 17:09:18 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 28 May 2024 17:09:18 GMT
Subject: RFR: 8325083: jdk/incubator/vector/Double512VectorTests.java crashes in Assembler::vex_prefix_and_encode [v3]
In-Reply-To: 
References: 
Message-ID: 

> This bugfix patch limits the register class for operands of byte to double cast pattern to prevent reported assertion failure on Knights family CPUs.
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Review suggestion incorporated.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19407/files
  - new: https://git.openjdk.org/jdk/pull/19407/files/78699f0c..098157e8

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19407&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19407&range=01-02

  Stats: 138 lines in 4 files changed: 126 ins; 9 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/19407.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19407/head:pull/19407

PR: https://git.openjdk.org/jdk/pull/19407

From jbhateja at openjdk.org  Tue May 28 17:09:18 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 28 May 2024 17:09:18 GMT
Subject: RFR: 8325083: jdk/incubator/vector/Double512VectorTests.java crashes in Assembler::vex_prefix_and_encode [v3]
In-Reply-To: 
References: 
Message-ID: <0NfWMv7BK75u3TMQnOCJBMIclY2FMN_TGRySO7x9Yh0=.90ad6159-2b87-4d7c-8baa-312dfd2c917e@github.com>

On Tue, 28 May 2024 05:24:58 GMT, Jatin Bhateja wrote:

>> @jatin-bhateja can you explain in more details what KNL is missing to trigger the assert? Can we predicate on missing feature here instead of KNL && DBL checks?
>> My concern is KNL check could be not enough if such feature is disabled in some container environment which does not match KNL settings.
>>
>> Why we have `assert(UseAVX > 0` here? `Assembler::vpmov*()` instructions have corresponding asserts already. No need to fix it here but I think we need to cleanup `*.ad` files from such duplicated asserts as separate RFE.
>
>> @jatin-bhateja can you explain in more details what KNL is missing to trigger the assert? Can we predicate on missing feature here instead of KNL && DBL checks? My concern is KNL check could be not enough if such feature is disabled in some container environment which does not match KNL settings.
>>
>> Why we have `assert(UseAVX > 0` here? `Assembler::vpmov*()` instructions have corresponding asserts already. No need to fix it here but I think we need to cleanup `*.ad` files from such duplicated asserts as separate RFE.
>
> Hi @vnkozlov , Problem occurs while emitting VMOVSXBD instruction, please refer to following LOC
> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L2936
>
> This is a case for byte to double conversion and we break it into two instructions, first for byte to doubleword casting followed by doubleword to double precision casting, while we could have used a combination of VMOVSXBQ + VCVTQQ2PD but it would further sharpen target constrains since quadword to double precision casting needs AVX512DQ feature, thus current scheme works well and by limiting operand allocations to legacy register set we can safely issue 256 bit VMOVSXBD instruction on KNL target which lacks AVX512VL feature.

> @jatin-bhateja Thank you for explaining the issue. I am fine with splitting instruction to use legacy registers. My only concern is use of `!VM_Version::is_knights_family()` check. Can we use `VM_Version::supports_avx512vl()` instead?
DONE

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19407#issuecomment-2135736176

From jbhateja at openjdk.org  Tue May 28 17:18:15 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 28 May 2024 17:18:15 GMT
Subject: RFR: 8325083: jdk/incubator/vector/Double512VectorTests.java crashes in Assembler::vex_prefix_and_encode [v4]
In-Reply-To: 
References: 
Message-ID: 

> This bugfix patch limits the register class for operands of byte to double cast pattern to prevent reported assertion failure on Knights family CPUs.
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:

 - Removing unrelated commit
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8325083
 - Review suggestion incorporated.
 - Removing redundant assertions
 - 8325083: jdk/incubator/vector/Double512VectorTests.java crashes in Assembler::vex_prefix_and_encode

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19407/files
  - new: https://git.openjdk.org/jdk/pull/19407/files/098157e8..e2e70cdb

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19407&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19407&range=02-03

  Stats: 4697 lines in 135 files changed: 3307 ins; 843 del; 547 mod
  Patch: https://git.openjdk.org/jdk/pull/19407.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19407/head:pull/19407

PR: https://git.openjdk.org/jdk/pull/19407

From kvn at openjdk.org  Tue May 28 17:18:15 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 28 May 2024 17:18:15 GMT
Subject: RFR: 8325083: jdk/incubator/vector/Double512VectorTests.java crashes in Assembler::vex_prefix_and_encode [v3]
In-Reply-To: 
References: 
Message-ID: 

On Tue, 28 May 2024 17:09:18 GMT, Jatin Bhateja wrote:

>>
>> This bugfix patch limits the register class for operands of byte to double cast pattern to prevent reported assertion failure on Knights family CPUs.
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> Review suggestion incorporated.

Something went wrong with your latest push - unrelated changes were included.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19407#issuecomment-2135751352

From jbhateja at openjdk.org  Tue May 28 17:21:06 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 28 May 2024 17:21:06 GMT
Subject: RFR: 8325083: jdk/incubator/vector/Double512VectorTests.java crashes in Assembler::vex_prefix_and_encode [v3]
In-Reply-To: 
References: 
Message-ID: 

On Tue, 28 May 2024 17:15:45 GMT, Vladimir Kozlov wrote:

> Something went wrong with your latest push - unrelated changes were included.

My bad, was creating a separate PR for it, somehow it got pushed with this one.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19407#issuecomment-2135755688

From epeter at openjdk.org  Tue May 28 17:28:39 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 28 May 2024 17:28:39 GMT
Subject: RFR: 8325155: C2 SuperWord: remove alignment boundaries [v2]
In-Reply-To: 
References: 
Message-ID: 

On Mon, 27 May 2024 12:42:46 GMT, Christian Hagedorn wrote:

>> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits:
>>
>>  - Merge branch 'master' into JDK-8325155-rm-alignment-boundaries
>>  - rm TODO
>>  - manual merge
>>  - revert a line, need to fix it different
>>  - improve comments
>>  - fix alignment
>>  - fix reductions
>>  - MaxI reduction over chars
>>  - Merge branch 'master' into JDK-8325155-rm-alignment-boundaries
>>  - Merge branch 'master' into JDK-8325155-rm-alignment-boundaries
>>  - ...
>>    and 15 more: https://git.openjdk.org/jdk/compare/c4867c62...82c9a77a
>
> src/hotspot/share/opto/superword.cpp line 495:
>
>> 493:
>> 494: // Collect all valid VPointers.
>> 495: for_each_mem([&] (const MemNode* mem, int bb_idx) {
>
> The different parts of this method could be nicely put into separate methods which reduces the size of `find_adjacent_memop_pairs()`.
>
>
> GrowableArray vpointers;
> collect_valid_vpointers(vpointers);
> vpointers.sort();
> // trace code
> find_adjacent_memops(vpointers);
> // trace code
>
>
> The entire "find adjacent memop pairs" code could also be put into a separate class but I leave it up to you to decide if it's worth or not.

A class would be nice, but I think I would have to pass around too much for that. `find_adjacent_memop_pairs_in_one_group` requires some things like:

_do_vector_loop
same_origin_idx
can_pack_into_pair // especially this one

Not sure it is worth creating a separate class, I think it would become more complicated that way.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1617661935

From epeter at openjdk.org  Tue May 28 17:36:46 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 28 May 2024 17:36:46 GMT
Subject: RFR: 8325155: C2 SuperWord: remove alignment boundaries [v4]
In-Reply-To: 
References: 
Message-ID: 

> I have tried for a very long time to get rid of all the `alignment(n)` code that is all over the SuperWord code. With lots of previous work, I am now finally ready to remove it.
>
> I was able to remove lots of VM code, about 300 lines. And the removed code is I think much more complicated than the new code.
>
> This is what I did in this PR:
> - Removal of `_node_info`: used to have many fields, which I refactored out to the `VLoopAnalyzer` modules. `alignment` is the last component, which I now remove.
> - Changed the implementation of `SuperWord::find_adjacent_refs`, now `SuperWord::find_adjacent_memop_pairs`, completely:
>   - It used to be an algorithm that would scan over all `memops` repeatedly, try to find some `mem_ref` and see which other memops were comparable, and then pack pairs for all of those, by comparing all-vs-all memops. This algorithm is at least quadratic, if not much worse.
>   - I now add all `memops` into a single array, sort them by groups (those that are comparable with each other and could be packed into vectors), and inside the groups by ascending offset. This allows me to split off the groups much more efficiently, and also the sorting by offset allows me finding adjacent pairs much more efficiently. In the most cases this reduces the cost to `O(n log n)` for sort, and a linear scan for finding adjacent memops.
> - I removed the "alignment boundaries" created in `SuperWord::memory_alignment` by `int off_rem = offset % vw;`.
>   - This used to have the effect that all offsets were computed modulo the vector width. Hence, pairs could not be packed across this boundary (e.g. we have nodes with offsets `31, 32`, which are adjacent in theory, but if we have a `vw = 32`, then the modulo-offsets are `31, 0`, and they are not detected as adjacent).
>   - These "alignment boundaries" used to be required for correctness about a year ago, before I fixed and relaxed much of the alignment code.
> - The `alignment` used to have another important task: Ensuring compatibility of the input-size of a use node, with the output-size of the def-node.
>   - This was done by giving all nodes an `alignment`, even the non-memop nodes. This `alignment` was then scaled up and down at type casts (e.g. int `0, 4, 8, 12` -> long `0, 8, 16, 24`). If the output-size of the def-node did not match the input-size of the use-node, then the `alignment` would not match up, and we would not pack.
>   - This is why we used to have checks like `alignment(s1) + data_size(s1) == alignment(s2)` ...

Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits:

 - Merge branch 'master' into JDK-8325155-rm-alignment-boundaries
 - merge
 - Apply suggestions from code review by Christian
   
   Co-authored-by: Christian Hagedorn 
 - more updates for Christian
 - Merge branch 'master' into JDK-8325155-rm-alignment-boundaries
 - rm TODO
 - manual merge
 - revert a line, need to fix it different
 - improve comments
 - fix alignment
 - ... and 19 more: https://git.openjdk.org/jdk/compare/da6aa2a8...41fc1eb3

-------------

Changes: https://git.openjdk.org/jdk/pull/18822/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18822&range=03
  Stats: 1072 lines in 7 files changed: 608 ins; 358 del; 106 mod
  Patch: https://git.openjdk.org/jdk/pull/18822.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18822/head:pull/18822

PR: https://git.openjdk.org/jdk/pull/18822

From epeter at openjdk.org  Tue May 28 17:36:46 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 28 May 2024 17:36:46 GMT
Subject: RFR: 8325155: C2 SuperWord: remove alignment boundaries [v2]
In-Reply-To: 
References: 
Message-ID: 

On Tue, 28 May 2024 07:58:38 GMT, Christian Hagedorn wrote:

>> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits:
>>
>>  - Merge branch 'master' into JDK-8325155-rm-alignment-boundaries
>>  - rm TODO
>>  - manual merge
>>  - revert a line, need to fix it different
>>  - improve comments
>>  - fix alignment
>>  - fix reductions
>>  - MaxI reduction over chars
>>  - Merge branch 'master' into JDK-8325155-rm-alignment-boundaries
>>  - Merge branch 'master' into JDK-8325155-rm-alignment-boundaries
>>  - ... and 15 more: https://git.openjdk.org/jdk/compare/c4867c62...82c9a77a
>
> Great cleanup! I have some comments but otherwise, looks good.
Thanks @chhagedorn for the review! I think I addressed all your points.

Except for this:

> You should mention here that this method finally adds a new pair to the _pairset. On a separate note, find_adjacent_memop_pairs_in_group() suggests that we find something but we actually "find and add" something without returning anything from the method. Should we make this more clear in the method name?

I have not yet found a better name. I think `find` is ok here. I guess we could have it be `find_and_add` but this seems unnecessarily long to me.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18822#issuecomment-2135777231

From kvn at openjdk.org  Tue May 28 17:37:04 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 28 May 2024 17:37:04 GMT
Subject: RFR: 8325083: jdk/incubator/vector/Double512VectorTests.java crashes in Assembler::vex_prefix_and_encode [v4]
In-Reply-To: 
References: 
Message-ID: 

On Tue, 28 May 2024 17:18:15 GMT, Jatin Bhateja wrote:

>> This bugfix patch limits the register class for operands of byte to double cast pattern to prevent reported assertion failure on Knights family CPUs.
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
>
>  - Removing unrelated commit
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8325083
>  - Review suggestion incorporated.
>  - Removing redundant assertions
>  - 8325083: jdk/incubator/vector/Double512VectorTests.java crashes in Assembler::vex_prefix_and_encode

Looks good. I will run our testing. Please verify that it still passes the reproducer case.
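In this thread, the byte-to-double cast is split into a sign extension to 32-bit integers (VMOVSXBD) followed by an int-to-double conversion, rather than VMOVSXBQ plus VCVTQQ2PD, which would need AVX512DQ. Arithmetically the split is exact, since every byte value is representable as an int and as a double. The scalar Java check below (hypothetical class name) covers only that arithmetic claim, not register allocation or the KNL encoding issue itself:

```java
// Scalar check that the byte -> int -> double split loses nothing.
// It models only the arithmetic, not vector registers or instruction encodings.
final class ByteToDoubleSplit {

    // Direct widening conversion as written in Java source.
    static double direct(byte b) {
        return b;
    }

    // Two steps: sign-extend to a 32-bit int, then convert the int to double.
    static double twoStep(byte b) {
        int widened = b;          // scalar analogue of the VMOVSXBD step
        return (double) widened;  // scalar analogue of the int -> double step
    }

    static boolean allMatch() {
        for (int v = Byte.MIN_VALUE; v <= Byte.MAX_VALUE; v++) {
            byte b = (byte) v;
            if (Double.compare(direct(b), twoStep(b)) != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allMatch()); // prints true
    }
}
```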
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19407#issuecomment-2135780847

From duke at openjdk.org  Tue May 28 18:04:04 2024
From: duke at openjdk.org (Tobias Hotz)
Date: Tue, 28 May 2024 18:04:04 GMT
Subject: RFR: 8332856: C2: Add new transform for bool eq/ne (cmp (and (urshift X const1) const2) 0)
In-Reply-To: 
References: 
Message-ID: 

On Tue, 28 May 2024 16:26:05 GMT, Emanuel Peter wrote:

>> This PR adds a new ideal optimization for the following pattern:
>>
>>
>> public boolean testFunc(int a) {
>>     int mask = 0b101;
>>     int shift = 12;
>>     return ((a >> shift) & mask) == 0;
>> }
>>
>>
>> Where the mask and shift are constant values and a is a variable. For this optimization to work, the right shift has to be idealized to an unsigned right shift earlier in the pipeline, which happens here: https://github.com/openjdk/jdk/blob/b92bd671835c37cff58e2cdcecd0fe4277557d7f/src/hotspot/share/opto/mulnode.cpp#L731
>> If the shift is already an unsigned bit shift, it works as well.
>> On AMD64 CPUs, this means that this whole line computation can be reduced to a simple `test` instruction.
>
> I'm looking at `AndINode::Ideal`.
>
> `AndIL_add_shift_and_mask` seems to do something similar, but not exactly this. Maybe it could be extended.
>
> Ha, what about this?
>
>
> // Masking off sign bits? Dont make them!
> if( lop == Op_RShiftI ) {
>   const TypeInt *t12 = phase->type(load->in(2))->isa_int();
>   if( t12 && t12->is_con() ) { // Shift is by a constant
>     int shift = t12->get_con();
>     shift &= BitsPerJavaInteger-1; // semantics of Java shifts
>     const int sign_bits_mask = ~right_n_bits(BitsPerJavaInteger - shift);
>     // If the AND'ing of the 2 masks has no bits, then only original shifted
>     // bits survive. NO sign-extension bits survive the maskings.
>     if( (sign_bits_mask & mask) == 0 ) {
>       // Use zero-fill shift instead
>       Node *zshift = phase->transform(new URShiftINode(load->in(1),load->in(2)));
>       return new AndINode( zshift, in(2) );
>     }
>   }
> }

Hi @eme64! Thanks for the quick look.
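The quoted `AndINode::Ideal` code rewrites `(x >> s) & m` to `(x >>> s) & m` when `m` shares no bits with the sign-extension bits `~right_n_bits(32 - s)`. 8332856 then relies on `((x >>> s) & m) == 0` testing the same bits of `x` as `(x & (m << s)) == 0` (presumably the `new_mask` is `m << s`), even though the two masked values differ in general. The Java sketch below checks both identities. `rightNBits` is a hypothetical stand-in for HotSpot's `right_n_bits`, and the `getGreen()`-style line assumes `Color.getGreen()` computes `(getRGB() >> 8) & 0xFF`:

```java
// Hypothetical helpers checking the shift/mask identities discussed for 8332856.
// rightNBits is a stand-in for HotSpot's right_n_bits on 32-bit values.
final class ShiftMaskIdentities {

    static int rightNBits(int n) {
        return n >= 32 ? -1 : (1 << n) - 1; // the n lowest bits set
    }

    // Mirrors: sign_bits_mask = ~right_n_bits(BitsPerJavaInteger - shift)
    static int signBitsMask(int shift) {
        shift &= 31; // Java shift semantics
        return ~rightNBits(32 - shift);
    }

    // (x >> s) & m may become (x >>> s) & m when m keeps no sign-extension bits.
    static boolean canUseUnsignedShift(int shift, int mask) {
        return (signBitsMask(shift) & mask) == 0;
    }

    // Both forms test the same bits of x, so only the == 0 / != 0 result is interchangeable.
    static boolean testViaShift(int x, int shift, int mask) {
        return ((x >>> shift) & mask) == 0;
    }

    static boolean testViaMask(int x, int shift, int mask) {
        return (x & (mask << shift)) == 0;
    }

    public static void main(String[] args) {
        int a = 0x5000; // bits 12 and 14 set
        System.out.println(testViaShift(a, 12, 0b101) + " " + testViaMask(a, 12, 0b101)); // prints false false
        // The masked values differ (the 4-bit example), so the general rewrite is invalid:
        System.out.println(((0b0010 >>> 1) & 0b1) + " vs " + (0b0010 & (0b1 << 1)));     // prints 1 vs 2
        // A getGreen()-style presence check, with and without the shift:
        int rgb = 0xFF00FF00;
        System.out.println((((rgb >> 8) & 0xFF) != 0) + " " + ((rgb & 0xFF00) != 0));      // prints true true
    }
}
```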
This patch only transforms bool eq/ne 0 cases because the general transformation of `AndI(Shift X shift, mask)` -> `AndI(X, new_mask)` is not valid. As a small example on 4-bit numbers, `(0b0010 >>> 1) & 0b1` results in `0b0001`, but `0b0010 & 0b0010` results in `0b0010`, which is not the same value. It always works when only an eq/ne test against zero is performed, because the position of the bits doesn't matter there.

The transformation of `RShift` nodes into `URShiftI` nodes in `AndINode::Ideal` already helps with this transform, as we only need to match `URShiftI` now.

As an example, take a look at java.awt.Color, especially `getGreen` and similar functions: https://github.com/openjdk/jdk/blob/da6aa2a86c86ba5fce747b36dcb2d6001cfcc44e/src/java.desktop/share/classes/java/awt/Color.java#L548-L550
If a caller only wants to check whether the green color channel is non-zero, it would call `color.getGreen() != 0`. After inlining, this optimization could be applied, removing the need for the shift.

I've also seen this optimization being triggered in real-life applications and will provide a benchmark shortly. I hope this answers your questions/concerns!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19310#issuecomment-2135829419

From sgibbons at openjdk.org Tue May 28 18:30:30 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Tue, 28 May 2024 18:30:30 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v45]

> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
> The benchmark numbers:
>
>     Benchmark                                  Score     Latest   Latest/Score
>     StringIndexOf.advancedWithMediumSub      343.573    317.934   0.925375393x
>     StringIndexOf.advancedWithShortSub1     1039.081    1053.96   1.014319384x
>     StringIndexOf.advancedWithShortSub2       55.828    110.541   1.980027943x
>     StringIndexOf.constantPattern              9.361     11.906   1.271872663x
>     StringIndexOf.searchCharLongSuccess        4.216      4.218   1.000474383x
>     StringIndexOf.searchCharMediumSuccess      3.133      3.216    1.02649218x
>     StringIndexOf.searchCharShortSuccess        3.76      3.761   1.000265957x
>     StringIndexOf.success                      9.186      9.713   1.057369911x
>     StringIndexOf.successBig                  14.341     46.343   3.231504079x
>     StringIndexOfChar.latin1_AVX2_String    6220.918   12154.52   1.953814533x
>     StringIndexOfChar.latin1_AVX2_char      5503.556   5540.044   1.006629895x
>     StringIndexOfChar.latin1_SSE4_String    6978.854   6818.689   0.977049957x
>     StringIndexOfChar.latin1_SSE4_char      5657.499   5474.624   0.967675646x
>     StringIndexOfChar.latin1_Short_String   7132.541   6863.359   0.962260014x
>     StringIndexOfChar.latin1_Short_char    16013.389  16162.437   1.009307711x
>     StringIndexOfChar.latin1_mixed_String   7386.123  14771.622   1.999915517x
>     StringIndexOfChar.latin1_mixed_char     9901.671   9782.245   0.987938803x

Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:

  Review comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16753/files
  - new: https://git.openjdk.org/jdk/pull/16753/files/01cb58fb..751aace8

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=44
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=43-44

  Stats: 49 lines in 4 files changed: 20 ins; 13 del; 16 mod
  Patch: https://git.openjdk.org/jdk/pull/16753.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753

PR: https://git.openjdk.org/jdk/pull/16753

From sgibbons at openjdk.org Tue May 28 18:30:31 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Tue, 28 May 2024 18:30:31 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]
On Tue, 28 May 2024 12:48:19 GMT, Sandhya Viswanathan wrote:

>> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Fix tests
>
> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 239:
>
>> 237: // the needle size is less than 32 bytes, we default to a
>> 238: // byte-by-byte comparison (this will be rare).
>> 239: //
>
> Is this still true?

Yes. For UL, the code within `L_compareFull` effectively does a byte-by-byte comparison.

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 278:
>
>> 276: __ bind(L_nextCheck);
>> 277: __ testq(haystack_len_p, haystack_len_p);
>> 278: __ je(L_zeroCheckFailed);
>
> This check could be removed as the next check covers this one.

No. This is checking for a zero-length haystack. The following compare checks for a needle length longer than the haystack, regardless of the value in each. The comparison is signed, so a haystack length of 0 with a needle length of -1 will pass the following test and assume validity.

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 360:
>
>> 358: __ push(rcx);
>> 359: __ push(r8);
>> 360: __ push(r9);
>
> No need to save/restore rcx/r8/r9 on the Windows platform as well.

OK.

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 379:
>
>> 377:
>> 378: // Assume failure
>> 379: __ movq(rbp, -1);
>
> We are no longer using rbp at the return point, so this is not needed now?

Removed.

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 488:
>
>> 486: __ cmpq(r11, nMinusK);
>> 487: __ ja_b(L_return);
>> 488: __ movq(rax, r11);
>
> At places where we know that the return value in r11 is correct, we don't need to checkRange, so this could have its own label.
I don't want to change this because its reason for existence is to ensure we don't return a value that's beyond the end of the haystack. We don't yet have a good enough test to validate whether we're reading past the end of the haystack, so I like this as insurance.

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 566:
>
>> 564: // rbp: -1
>> 565: // XMM_BYTE_0 - first element of needle broadcast
>> 566: // XMM_BYTE_K - last element of needle broadcast
>
> The only registers that are used as input in the switch case are:
> r14 = needle
> rbx = haystack
> rsi = haystack length (n)
> r12 = needle length (k)
> r10 = n - k (where k is needle length)
> XMM_BYTE_0 = first element of needle, broadcast
> XMM_BYTE_K = last element of needle, broadcast
> So we could list only these, making it easier to comprehend.

I listed these registers to make it clear which registers had no expected value and could be used for temps, etc.

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 578:
>
>> 576: // helper jumps to L_checkRangeAndReturn with a (-1) return value.
>> 577: big_case_loop_helper(false, 0, L_checkRangeAndReturn, L_loopTop, mask, hsPtrRet, needleLen,
>> 578: needle, haystack, hsLength, tmp1, tmp2, tmp3, rScratch, ae, _masm);
>
> If we run out of haystack, instead of jumping to L_checkRangeAndReturn we could jump directly to L_returnError.

Again, I think we ought to leave this in. Although it executes ~3 instructions that may not be necessary in some cases, I think it's best to perform the check. Once we have a good enough test to check reading past the end of the haystack, we can change it.

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 597:
>
>> 595:
>> 596: // Need a lot of registers here to preserve state across arrays_equals call
>> 597:
>
> This comment is no longer valid and could be removed.
OK

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 621:
>
>> 619: __ addq(hsPtrRet, index);
>> 620: __ movq(r11, hsPtrRet);
>> 621: __ jmp(L_checkRangeAndReturn);
>
> Why do we have to checkRange here? Would it not always be correct? If so, we could return r11 directly (by moving it into rax).

There are cases where r11 could have a value that, when added to (k - 1), would go past the end of the haystack. I did everything in my power to ensure that it doesn't, but there's no test I know of to verify that condition. I would recommend leaving this in for now.

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 660:
>
>> 658: // Haystack always copied to stack, so 32-byte reads OK
>> 659: // Haystack length < 32
>> 660: // 10 < needle length < 32
>
> Haystack length <= 32
> 10 < needle length <= 32

Changed.

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 721:
>
>> 719: false /* char */, knoreg);
>> 720: __ testl(rTmp3, rTmp3);
>> 721: __ jne(L_checkRangeAndReturn);
>
> Why do we have to checkRange here? Would it not always be correct? If so, we could return r11 directly (by moving it into rax).

OK

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1333:
>
>> 1331:
>> 1332: __ cmpq(nMinusK, 32);
>> 1333: __ jae_b(L_greaterThan32);
>
> Should this check be (n-k+1) >= 32? And so accordingly (n-k) >= 31:
> __ cmpq(nMinusK, 31);
> __ jae_b(L_greaterThan32);

No. For (n-k)==32 we can do full reads. I'll clarify by changing the label name.

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1382:
>
>> 1380:
>> 1381: __ testl(eq_mask, eq_mask);
>> 1382: __ je(noMatch);
>
> We are mixing operation widths l and q here.

Fixed.
> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1750:
>
>> 1748: // r15 = unused
>> 1749: // XMM_BYTE_0 - first element of needle, broadcast
>> 1750: // XMM_BYTE_K - last element of needle, broadcast
>
> This comment is duplicated for both the small-haystack and big-haystack cases and could be made a common comment.
> Also, the only registers that are used as input in the switch case are:
> r14 = needle
> rbx = haystack
> rsi = haystack length (n)
> r12 = needle length (k)
> r10 = n - k (where k is needle length)
> XMM_BYTE_0 = first element of needle, broadcast
> XMM_BYTE_K = last element of needle, broadcast
> So we could list only these, making it easier to comprehend.

I listed all registers for clarity. This ensures that we know what can be used as values or as scratch registers with no ambiguity.

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1758:
>
>> 1756: //
>> 1757: // If a match is found, jump to L_checkRange, which ensures the
>> 1758: // matched needle is not past the end of the haystack.
>
> Another comment here would be useful:
> // The found index is returned in set_bit (r11).

Added.

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1810:
>
>> 1808: // XMM_BYTE_K - last element of needle, broadcast
>> 1809: //
>> 1810: // The haystack is > 32 bytes
>
> It would be good to mention in the comment how the returned index is computed, as a combination of set_bit (r8), hs_ptr, and haystack.

Done.
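The register comments reviewed above (first and last needle bytes broadcast in XMM_BYTE_0 / XMM_BYTE_K, n - k in r10, and a final range check on the found index) describe a first/last-byte candidate filter. As a reading aid only, here is a scalar Java model of that idea. The class name, the 32-lane loop, and the scalar bounds guard are assumptions for illustration; this is not a transcription of the stub's control flow.

```java
import java.util.Arrays;

// Hypothetical scalar model of a first/last-byte substring filter; NOT the
// HotSpot stub. Bytes are treated as Latin-1 (one byte per char).
final class FirstLastFilter {
    static final int LANES = 32; // bytes per 256-bit AVX2 register

    static int indexOf(byte[] hs, byte[] needle) {
        int n = hs.length, k = needle.length;
        if (k == 0) return 0;
        if (k > n) return -1;
        byte first = needle[0];    // analogue of XMM_BYTE_0 (broadcast)
        byte last = needle[k - 1]; // analogue of XMM_BYTE_K (broadcast)
        for (int base = 0; base <= n - k; base += LANES) {
            // Bit j set => start position base + j passed the cheap filter.
            int mask = 0;
            for (int j = 0; j < LANES; j++) {
                int i = base + j;
                // Scalar bounds guard; vector code must instead make sure the
                // 32-byte loads themselves stay in bounds.
                if (i <= n - k && hs[i] == first && hs[i + k - 1] == last) {
                    mask |= 1 << j;
                }
            }
            while (mask != 0) {
                int i = base + Integer.numberOfTrailingZeros(mask);
                // Full compare of the candidate. The guard above already
                // limits i to <= n - k, the bound a final range check enforces.
                if (Arrays.equals(hs, i, i + k, needle, 0, k)) return i;
                mask &= mask - 1; // clear the lowest set bit
            }
        }
        return -1;
    }
}
```

In this model the guard makes a separate range check redundant, which is exactly the question the thread keeps returning to: whether the stub's own loop bounds already guarantee `i <= n - k`.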
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617663227
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617667775
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617669103
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617671612
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617673870
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617680570
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617699879
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617700813
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617704836
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617705505
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617711973
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617713299
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617714825
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617716598
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617717873
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617726261

From jvernee at openjdk.org Tue May 28 19:29:04 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Tue, 28 May 2024 19:29:04 GMT
Subject: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2]

On Mon, 27 May 2024 20:55:29 GMT, Pavel Rappo wrote:

>> Please review this PR, which supersedes a now withdrawn https://github.com/openjdk/jdk/pull/14831.
>>
>> This PR replaces `ArraysSupport.vectorizedHashCode` with a set of more user-friendly methods. Here's a summary:
>>
>> - Made the operand constants (i.e. `T_BOOLEAN` and friends) and the `vectorizedHashCode` method private
>>
>> - Made the `vectorizedHashCode` method private, but didn't rename it. Renaming would dramatically increase this PR's review cost, because that method's name is used by a lot of VM code. On the bright side, since the method is now private, it's no longer callable by clients of `ArraysSupport`, so the problem of an inaccurate name is less severe.
>>
>> - Made the `ArraysSupport.utf16HashCode` method private
>>
>> - Moved tiny cases (i.e. 0, 1, 2) to `ArraysSupport`
>
> Pavel Rappo has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix incorrect utf16 hashCode adaptation

src/java.base/share/classes/jdk/internal/util/ArraysSupport.java line 252:

> 250: return switch (length) {
> 251: case 0 -> initialValue;
> 252: case 1 -> 31 * initialValue + (int) a[fromIndex];

Suggestion:

    case 1 -> 31 * initialValue + (int) a[fromIndex]; // sign extension

src/java.base/share/classes/jdk/internal/util/ArraysSupport.java line 275:

> 273: return switch (length) {
> 274: case 0 -> initialValue;
> 275: case 1 -> 31 * initialValue + (a[fromIndex] & 0xff);

For clarity, if you think it helps:

Suggestion:

    case 1 -> 31 * initialValue + Byte.toUnsignedInt(a[fromIndex]);

src/java.base/share/classes/jdk/internal/util/ArraysSupport.java line 301:

> 299: return switch (length) {
> 300: case 0 -> initialValue;
> 301: case 1 -> 31 * initialValue + JLA.getUTF16Char(a, fromIndex);

There seems to be a mismatch here with the original code in StringUTF16, where the length that is tested for is `2` instead of `1`.
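The first two suggestions above are about signed versus unsigned element handling in the tiny cases. A minimal sketch of both variants follows; the method names follow the thread (`hashCodeOfUnsigned` is mentioned in the review), but the signatures and bodies are assumptions for illustration, not the PR's code. It shows why line 252 relies on sign extension and line 275 on zero extension: `java.util.Arrays.hashCode(byte[])` hashes signed bytes, while the hash of a Latin-1 `String` corresponds to unsigned bytes.

```java
// Hypothetical sketch (not the actual jdk.internal.util.ArraysSupport code):
// polynomial hash h = 31 * h + element, with tiny lengths special-cased.
final class HashSketch {
    // Signed elements: (int) b sign-extends, e.g. (byte) 0xFF -> -1.
    static int hashCodeOf(byte[] a, int fromIndex, int length, int initialValue) {
        return switch (length) {
            case 0 -> initialValue;
            case 1 -> 31 * initialValue + (int) a[fromIndex]; // sign extension
            default -> {
                int h = initialValue;
                for (int i = fromIndex; i < fromIndex + length; i++) {
                    h = 31 * h + a[i];
                }
                yield h;
            }
        };
    }

    // Unsigned elements: Byte.toUnsignedInt(b) == (b & 0xff), e.g. 0xFF -> 255.
    static int hashCodeOfUnsigned(byte[] a, int fromIndex, int length, int initialValue) {
        return switch (length) {
            case 0 -> initialValue;
            case 1 -> 31 * initialValue + Byte.toUnsignedInt(a[fromIndex]);
            default -> {
                int h = initialValue;
                for (int i = fromIndex; i < fromIndex + length; i++) {
                    h = 31 * h + Byte.toUnsignedInt(a[i]);
                }
                yield h;
            }
        };
    }
}
```

With `initialValue == 1`, `hashCodeOf` matches `Arrays.hashCode(byte[])`. With `initialValue == 0`, `hashCodeOfUnsigned` over ISO-8859-1 bytes matches `String.hashCode`. The two variants agree whenever every byte is in 0..127 and diverge only for bytes with the high bit set.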
test/hotspot/jtreg/compiler/intrinsics/TestArraysHashCode.java line 88:

> 86: private static int testIntrinsic(byte[] bytes, int type)
> 87: throws InvocationTargetException, IllegalAccessException {
> 88: return (int) vectorizedHashCode.invoke(null, bytes, 0, 256, 1, type);

Better to just call `hashCodeOfUnsigned` here, I think.

The test for the non-constant type could be dropped. That is no longer a part of the 'API' of `ArraysSupport`. It looks like the intrinsic bails out when the basic type is not constant anyway: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L6401-L6404

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1617778741
PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1617778493
PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1617777798
PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1617784996

From mdoerr at openjdk.org Tue May 28 20:03:05 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Tue, 28 May 2024 20:03:05 GMT
Subject: RFR: 8332228: TypePollution.java: Unrecognized VM option 'UseSecondarySuperCache'

On Tue, 28 May 2024 14:01:44 GMT, Martin Doerr wrote:

> Fix obvious typo in micro benchmark.

Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19427#issuecomment-2136008570

From mdoerr at openjdk.org Tue May 28 20:03:05 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Tue, 28 May 2024 20:03:05 GMT
Subject: Integrated: 8332228: TypePollution.java: Unrecognized VM option 'UseSecondarySuperCache'

On Tue, 28 May 2024 14:01:44 GMT, Martin Doerr wrote:

> Fix obvious typo in micro benchmark.

This pull request has now been integrated.
Changeset: 9ac8d05a
Author:    Martin Doerr
URL:       https://git.openjdk.org/jdk/commit/9ac8d05a2567fbf65b944660739e5f8ad1fc2020
Stats:     8 lines in 1 file changed: 0 ins; 0 del; 8 mod

8332228: TypePollution.java: Unrecognized VM option 'UseSecondarySuperCache'

Reviewed-by: chagedorn, kvn

-------------

PR: https://git.openjdk.org/jdk/pull/19427

From duke at openjdk.org Tue May 28 20:11:35 2024
From: duke at openjdk.org (Tobias Hotz)
Date: Tue, 28 May 2024 20:11:35 GMT
Subject: RFR: 8332856: C2: Add new transform for bool eq/ne (cmp (and (urshift X const1) const2) 0) [v2]

> This PR adds a new ideal optimization for the following pattern:
>
>     public boolean testFunc(int a) {
>         int mask = 0b101;
>         int shift = 12;
>         return ((a >> shift) & mask) == 0;
>     }
>
> where the mask and shift are constant values and `a` is a variable. For this optimization to work, the right shift has to be idealized to an unsigned right shift earlier in the pipeline, which happens here: https://github.com/openjdk/jdk/blob/b92bd671835c37cff58e2cdcecd0fe4277557d7f/src/hotspot/share/opto/mulnode.cpp#L731
> If the shift is already an unsigned bit shift, it works as well.
> On AMD64 CPUs, this means that this whole line of computation can be reduced to a simple `test` instruction.

Tobias Hotz has updated the pull request incrementally with two additional commits since the last revision:

 - LF endings...
 - Add a benchmark to measure effect of new ideal transformation

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19310/files
  - new: https://git.openjdk.org/jdk/pull/19310/files/d18c9467..3491ceb3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19310&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19310&range=00-01

  Stats: 57 lines in 1 file changed: 57 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/19310.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19310/head:pull/19310

PR: https://git.openjdk.org/jdk/pull/19310

From duke at openjdk.org Tue May 28 20:11:35 2024
From: duke at openjdk.org (Tobias Hotz)
Date: Tue, 28 May 2024 20:11:35 GMT
Subject: RFR: 8332856: C2: Add new transform for bool eq/ne (cmp (and (urshift X const1) const2) 0)

On Mon, 20 May 2024 14:15:46 GMT, Tobias Hotz wrote:

> This PR adds a new ideal optimization for the following pattern:
>
>     public boolean testFunc(int a) {
>         int mask = 0b101;
>         int shift = 12;
>         return ((a >> shift) & mask) == 0;
>     }
>
> where the mask and shift are constant values and `a` is a variable. For this optimization to work, the right shift has to be idealized to an unsigned right shift earlier in the pipeline, which happens here: https://github.com/openjdk/jdk/blob/b92bd671835c37cff58e2cdcecd0fe4277557d7f/src/hotspot/share/opto/mulnode.cpp#L731
> If the shift is already an unsigned bit shift, it works as well.
> On AMD64 CPUs, this means that this whole line of computation can be reduced to a simple `test` instruction.
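The eq/ne-zero transform described above relies on a few plain bit identities, which can be checked in ordinary Java. The sketch below uses hypothetical names that are not part of the PR. It checks the rewrite `((a >>> shift) & mask) == 0` against `(a & (mask << shift)) == 0`, reproduces the 4-bit counterexample showing that the value-level rewrite is invalid, and restates the sign-bits condition from the quoted `AndINode::Ideal` snippet, with `-1 >>> shift` standing in for `right_n_bits(32 - shift)`.

```java
import java.util.Random;

// Hypothetical demo class (not JDK or PR code): checks, in plain Java, the bit
// identities behind the eq/ne-zero transform discussed in this thread.
final class ShiftMaskDemo {

    // Before the transform: shift the variable, mask, compare with zero.
    static boolean shiftThenMask(int a, int shift, int mask) {
        return ((a >>> shift) & mask) == 0;
    }

    // After the transform: shift the constant mask instead, so the variable
    // needs no shift at all (a single `test` on x86-64).
    static boolean maskOnly(int a, int shift, int mask) {
        return (a & (mask << shift)) == 0;
    }

    // Mirrors the quoted AndINode::Ideal condition for turning >> into >>>:
    // (-1 >>> shift) has the same bits as right_n_bits(32 - shift).
    static boolean noSignBitsSurvive(int shift, int mask) {
        int signBitsMask = ~(-1 >>> shift);
        return (signBitsMask & mask) == 0;
    }

    public static void main(String[] args) {
        // The value-level rewrite is wrong: prints "1 vs 2" ...
        System.out.println(((0b0010 >>> 1) & 0b1) + " vs " + (0b0010 & (0b1 << 1)));

        // ... but the eq/ne-zero outcome always agrees (shift in 0..31).
        Random rnd = new Random(42);
        for (int i = 0; i < 1_000_000; i++) {
            int a = rnd.nextInt(), shift = rnd.nextInt(32), mask = rnd.nextInt();
            if (shiftThenMask(a, shift, mask) != maskOnly(a, shift, mask)) {
                throw new AssertionError("mismatch: a=" + a + " shift=" + shift + " mask=" + mask);
            }
        }

        // Color.getGreen()-style check: ((rgb >> 8) & 0xFF) != 0.
        int rgb = 0xFF00_3400;
        System.out.println(noSignBitsSurvive(8, 0xFF)); // true: >> may become >>>
        System.out.println(((rgb >> 8) & 0xFF) != 0);  // true
        System.out.println((rgb & (0xFF << 8)) != 0);   // true, no shift of rgb
    }
}
```

Both zero tests ask the same question: is there a set bit of `a` at some position `j + shift` where bit `j` of `mask` is set? The shift only changes where that bit ends up, not whether it exists. That is why the rewrite is safe for eq/ne against zero but not as a general `AndI` transformation.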
Here are some numbers from the newly added benchmark:

Baseline:   (min, avg, max) = (20.223, 20.302, 20.411), stdev = 0.058
With patch: (min, avg, max) = (29.668, 29.765, 29.941), stdev = 0.098

As you can see, the time per iteration was reduced by roughly 30% in this microbenchmark (higher scores are better).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19310#issuecomment-2136022313

From redestad at openjdk.org Tue May 28 20:24:01 2024
From: redestad at openjdk.org (Claes Redestad)
Date: Tue, 28 May 2024 20:24:01 GMT
Subject: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2]

On Tue, 28 May 2024 19:19:51 GMT, Jorn Vernee wrote:

>> Pavel Rappo has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Fix incorrect utf16 hashCode adaptation
>
> test/hotspot/jtreg/compiler/intrinsics/TestArraysHashCode.java line 88:
>
>> 86: private static int testIntrinsic(byte[] bytes, int type)
>> 87: throws InvocationTargetException, IllegalAccessException {
>> 88: return (int) vectorizedHashCode.invoke(null, bytes, 0, 256, 1, type);
>
> Better to just call `hashCodeOfUnsigned` here, I think.
>
> The test for the non-constant type could be dropped. That is no longer a part of the 'API' of `ArraysSupport`. It looks like the intrinsic bails out when the basic type is not constant anyway: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L6401-L6404

The non-constant test was added because that very bailout caused a crash. The other test is actually less interesting, since it'll likely be covered indirectly by regular use. But as we are hiding these away, this gets ever more obscure, and perhaps the test could be dropped entirely.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1617848032

From sviswanathan at openjdk.org Tue May 28 20:28:16 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Tue, 28 May 2024 20:28:16 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

On Tue, 28 May 2024 17:59:49 GMT, Scott Gibbons wrote:

>> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 578:
>>
>>> 576: // helper jumps to L_checkRangeAndReturn with a (-1) return value.
>>> 577: big_case_loop_helper(false, 0, L_checkRangeAndReturn, L_loopTop, mask, hsPtrRet, needleLen,
>>> 578: needle, haystack, hsLength, tmp1, tmp2, tmp3, rScratch, ae, _masm);
>>
>> If we run out of haystack, instead of jumping to L_checkRangeAndReturn we could jump directly to L_returnError.
>
> Again, I think we ought to leave this in. Although it executes ~3 instructions that may not be necessary in some cases, I think it's best to perform the check. Once we have a good enough test to check reading past the end of the haystack, we can change it.

In this particular case, we are returning -1 (NoMatch), so there is no need to go through L_checkRangeAndReturn here; we could jump directly to L_returnError.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617853337

From sviswanathan at openjdk.org Tue May 28 20:32:16 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Tue, 28 May 2024 20:32:16 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

On Tue, 28 May 2024 17:30:24 GMT, Scott Gibbons wrote:

>> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 278:
>>
>>> 276: __ bind(L_nextCheck);
>>> 277: __ testq(haystack_len_p, haystack_len_p);
>>> 278: __ je(L_zeroCheckFailed);
>>
>> This check could be removed as the next check covers this one.
>
> No. This is checking for a zero-length haystack. The following compare checks for a needle length longer than the haystack, regardless of the value in each. The comparison is signed, so a haystack length of 0 with a needle length of -1 will pass the following test and assume validity.

But we have already checked that the needle length is greater than 0 in the following lines:

    __ cmpq(needle_len_p, 0);
    __ jg_b(L_nextCheck);

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617857240

From sviswanathan at openjdk.org Tue May 28 20:40:17 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Tue, 28 May 2024 20:40:17 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

On Tue, 28 May 2024 18:11:13 GMT, Scott Gibbons wrote:

>> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1333:
>>
>>> 1331:
>>> 1332: __ cmpq(nMinusK, 32);
>>> 1333: __ jae_b(L_greaterThan32);
>>
>> Should this check be (n-k+1) >= 32?
>> And so accordingly (n-k) >= 31:
>> __ cmpq(nMinusK, 31);
>> __ jae_b(L_greaterThan32);
>
> No. For (n-k)==32 we can do full reads. I'll clarify by changing the label name.

We can also do full reads for (n-k) == 31, as we also compare the kth byte.

>> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1750:
>>
>>> 1748: // r15 = unused
>>> 1749: // XMM_BYTE_0 - first element of needle, broadcast
>>> 1750: // XMM_BYTE_K - last element of needle, broadcast
>>
>> This comment is duplicated for both the small-haystack and big-haystack cases and could be made a common comment.
>> Also, the only registers that are used as input in the switch case are:
>> r14 = needle
>> rbx = haystack
>> rsi = haystack length (n)
>> r12 = needle length (k)
>> r10 = n - k (where k is needle length)
>> XMM_BYTE_0 = first element of needle, broadcast
>> XMM_BYTE_K = last element of needle, broadcast
>> So we could list only these, making it easier to comprehend.
>
> I listed all registers for clarity. This ensures that we know what can be used as values or as scratch registers with no ambiguity.

Sounds good. We could keep only one of the two comments, as it is the same for both the small-haystack and big-haystack cases.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617862799
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617865049

From redestad at openjdk.org Tue May 28 20:43:02 2024
From: redestad at openjdk.org (Claes Redestad)
Date: Tue, 28 May 2024 20:43:02 GMT
Subject: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2]

On Tue, 28 May 2024 19:13:30 GMT, Jorn Vernee wrote:

>> Pavel Rappo has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Fix incorrect utf16 hashCode adaptation
>
> src/java.base/share/classes/jdk/internal/util/ArraysSupport.java line 275:
>
>> 273: return switch (length) {
>> 274: case 0 -> initialValue;
>> 275: case 1 -> 31 * initialValue + (a[fromIndex] & 0xff);
>
> For clarity, if you think it helps:
> Suggestion:
>
>     case 1 -> 31 * initialValue + Byte.toUnsignedInt(a[fromIndex]);

I don't care as long as microbenchmarks don't get a hiccup.

> src/java.base/share/classes/jdk/internal/util/ArraysSupport.java line 301:
>
>> 299: return switch (length) {
>> 300: case 0 -> initialValue;
>> 301: case 1 -> 31 * initialValue + JLA.getUTF16Char(a, fromIndex);
>
> There seems to be a mismatch here with the original code in StringUTF16, where the length that is tested for is `2` instead of `1`.

Yes, it should be `2` (`a` is semantically a `char[]`). This typo likely passes functional testing, since `1` can never happen in practice and the default case should work for any value. There might be a String microbenchmark out there that might be slightly unhappy, though.
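To make the `1` versus `2` point concrete: when the switch is over a byte length, one UTF-16 char occupies two bytes, so the single-char fast path has to match length `2`, and a byte length of `1` never occurs. The sketch below is hypothetical. Its names and the big-endian decoding are assumptions; the JDK's internal UTF-16 `byte[]` uses the platform's native byte order. It checks itself against `String.hashCode`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch (not the JDK's StringUTF16/ArraysSupport code): hashes a
// UTF-16 byte[] where every char occupies two bytes. Big-endian decoding is an
// assumption made for determinism.
final class Utf16HashSketch {
    static char charAt(byte[] a, int charIndex) {
        int hi = a[2 * charIndex] & 0xff;
        int lo = a[2 * charIndex + 1] & 0xff;
        return (char) ((hi << 8) | lo);
    }

    // a.length is a byte count, so a single char means length 2; a byte
    // length of 1 cannot occur for well-formed UTF-16 data.
    static int utf16Hash(byte[] a) {
        return switch (a.length) {
            case 0 -> 0;
            case 2 -> charAt(a, 0); // one char == two bytes
            default -> {
                int h = 0;
                for (int i = 0; i < a.length >> 1; i++) {
                    h = 31 * h + charAt(a, i);
                }
                yield h;
            }
        };
    }

    public static void main(String[] args) {
        byte[] zhe = "\u0416".getBytes(StandardCharsets.UTF_16BE); // one char
        System.out.println(zhe.length);                            // 2
        System.out.println(utf16Hash(zhe) == "\u0416".hashCode()); // true
    }
}
```

With a `case 1` label instead, the single-char input would simply fall through to the loop and still produce the right hash. That matches the remark that the typo passes functional tests and could only show up as a small performance difference.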
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1617867797
PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1617865658

From sgibbons at openjdk.org Tue May 28 20:54:15 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Tue, 28 May 2024 20:54:15 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

On Tue, 28 May 2024 20:29:38 GMT, Sandhya Viswanathan wrote:

>> No. This is checking for a zero-length haystack. The following compare checks for a needle length longer than the haystack, regardless of the value in each. The comparison is signed, so a haystack length of 0 with a needle length of -1 will pass the following test and assume validity.
>
> But we have already checked that the needle length is greater than 0 in the following lines:
>     __ cmpq(needle_len_p, 0);
>     __ jg_b(L_nextCheck);

OK

>> Again, I think we ought to leave this in. Although it executes ~3 instructions that may not be necessary in some cases, I think it's best to perform the check. Once we have a good enough test to check reading past the end of the haystack, we can change it.
>
> In this particular case, we are returning -1 (NoMatch), so there is no need to go through L_checkRangeAndReturn here; we could jump directly to L_returnError.

OK.
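The redundancy argument settled above can be checked exhaustively on a small model. The class below is hypothetical. It encodes only the three checks quoted in this thread, in the order quoted, with `true` meaning "falls through to the search loop". The non-positive-needle path is modeled as simply returning `false`, because the thread does not show how the stub handles it. Once `needle_len > 0` holds, the separate zero-haystack test never changes the outcome, including for Scott's example of a zero-length haystack with needle length -1.

```java
// Hypothetical model of the argument checks quoted in the review thread;
// NOT the stub itself. `true` means "falls through to the search loop".
final class ArgChecks {
    static boolean withZeroCheck(long hsLen, long needleLen) {
        if (needleLen <= 0) return false; // cmpq(needle_len_p, 0); jg_b(L_nextCheck)
        if (hsLen == 0) return false;     // testq(haystack_len_p, ...); je(L_zeroCheckFailed)
        return needleLen <= hsLen;        // signed "needle longer than haystack?" compare
    }

    static boolean withoutZeroCheck(long hsLen, long needleLen) {
        if (needleLen <= 0) return false;
        return needleLen <= hsLen;        // needleLen >= 1 already rules out hsLen == 0
    }

    public static void main(String[] args) {
        for (long hs = -4; hs <= 64; hs++) {
            for (long nd = -4; nd <= 64; nd++) {
                if (withZeroCheck(hs, nd) != withoutZeroCheck(hs, nd)) {
                    throw new AssertionError("differs at hs=" + hs + ", needle=" + nd);
                }
            }
        }
        System.out.println("zero-haystack check is redundant in this model");
    }
}
```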
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617876757
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617874637

From sgibbons at openjdk.org Tue May 28 20:59:16 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Tue, 28 May 2024 20:59:16 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

On Tue, 28 May 2024 20:35:26 GMT, Sandhya Viswanathan wrote:

>> No. For (n-k)==32 we can do full reads. I'll clarify by changing the label name.
>
> We can also do full reads for (n-k) == 31, as we also compare the kth byte.

I'll change and test.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617883225

From duke at openjdk.org Tue May 28 21:06:14 2024
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Tue, 28 May 2024 21:06:14 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

On Tue, 28 May 2024 17:36:03 GMT, Scott Gibbons wrote:

>> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 488:
>>
>>> 486: __ cmpq(r11, nMinusK);
>>> 487: __ ja_b(L_return);
>>> 488: __ movq(rax, r11);
>>
>> At places where we know that the return value in r11 is correct, we don't need to checkRange, so this could have its own label.
>
> I don't want to change this because its reason for existence is to ensure we don't return a value that's beyond the end of the haystack. We don't yet have a good enough test to validate whether we're reading past the end of the haystack, so I like this as insurance.

I would recommend an experiment. Disable the range check and run the String/IndexOf.java test. In particular, run test4(), which is designed exactly to test reads beyond the end.
It won't find all the bad reads, but right now, if there are any failures, they are 'hidden' by this range check.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617888680

From sgibbons at openjdk.org Tue May 28 21:06:14 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Tue, 28 May 2024 21:06:14 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

On Tue, 28 May 2024 20:56:42 GMT, Scott Gibbons wrote:

>> We can also do full reads for (n-k) == 31, as we also compare the kth byte.
>
> I'll change and test.

Passes tests, so I'll change.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617886613

From sgibbons at openjdk.org Tue May 28 21:06:15 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Tue, 28 May 2024 21:06:15 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

On Tue, 28 May 2024 20:37:43 GMT, Sandhya Viswanathan wrote:

>> I listed all registers for clarity. This ensures that we know what can be used as values or as scratch registers with no ambiguity.
>
> Sounds good. We could keep only one of the two comments, as it is the same for both the small-haystack and big-haystack cases.

OK
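The (n-k) == 31 threshold can be checked with a short bounds derivation. It assumes that each 32-lane step compares the first needle byte against haystack bytes $h[i], \dots, h[i+31]$ and the last needle byte against $h[i+k-1], \dots, h[i+k+30]$. That load pattern is an assumption based on the remark that the kth byte is also compared, not something read from the stub. Here $n$ is the haystack length and $k$ the needle length:

```latex
\begin{aligned}
&\text{candidate start positions: } i \in \{0, 1, \dots, n-k\} \quad (n-k+1 \text{ positions})\\
&\text{one full 32-lane pass covers } i = 0, \dots, 31, \text{ which needs } n-k+1 \ge 32 \iff n-k \ge 31\\
&\text{last-byte load at } i = 31 \text{ reads } h[31 + (k-1)] = h[k+30];\ \ k+30 \le n-1 \iff n-k \ge 31\\
&\text{first-byte load reads up to } h[31];\ \ 31 \le n-1 \iff n \ge 32, \text{ implied by } n-k \ge 31 \text{ and } k \ge 1
\end{aligned}
```

Under this assumption, `cmpq(nMinusK, 31)` followed by `jae` is the tightest threshold for an unpadded full read. That is the change Sandhya suggested and Scott adopted once the tests passed.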
OK ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617889756 From sgibbons at openjdk.org Tue May 28 21:12:16 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 28 May 2024 21:12:16 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v37] In-Reply-To: References: <4xYUBsOJ_eDSuj6w9AjUo_6gFN_9piWR-ChLrHQoXl4=.88756684-8e9c-48e3-8b59-f5f684b81cde@github.com> Message-ID: On Fri, 24 May 2024 20:42:12 GMT, Scott Gibbons wrote: >> test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 185: >> >>> 183: } >>> 184: >>> 185: private static int indexOfKernel(String haystack, String needle) { >> >> Is the intention of kernels not to be inlined so that it would be part of separate compilation? >> >> If so, you probably want to annotate it with `@CompilerControl(CompilerControl.Mode.DONT_INLINE)` >> >> i.e. https://github.com/openjdk/jmh/blob/master/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_16_CompilerControl.java > > Fixed. CompilerControl is unavailable here. Added a runtime option instead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617894475 From sgibbons at openjdk.org Tue May 28 21:12:16 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 28 May 2024 21:12:16 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v38] In-Reply-To: <7jqyfDXW_EbstH_s90Fp4O7a214ZaejdM0CyAffzOHs=.544c7a91-c66b-4487-a2bf-0b8e300a94c0@github.com> References: <7jqyfDXW_EbstH_s90Fp4O7a214ZaejdM0CyAffzOHs=.544c7a91-c66b-4487-a2bf-0b8e300a94c0@github.com> Message-ID: On Tue, 28 May 2024 16:57:54 GMT, Vladimir Kozlov wrote: >> @vnkozlov I'm getting an error in CI tests with this line added. Can you please advise? >> >> `TEST RESULT: Error. Parse Exception: Syntax error in @requires expression: invalid name: vm.cpu.features` > > You need to add `vm.cpu.features ` line to `test/jdk/TEST.ROOT` file. Similar to what we have in `test/hotspot/jtreg/TEST.ROOT` Fixed. Thanks. 
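Sandhya's remark earlier in the thread, that full reads are also safe for (n-k) == 31 because the kth byte is compared too, comes down to a bounds calculation. The following is a minimal Java sketch that assumes the stub issues one 32-byte load at the candidate start (for the needle's first byte) and one at candidate + k - 1 (for its kth byte). `FullReadBounds` and its methods are hypothetical names, not code from the PR.

```java
public class FullReadBounds {
    static final int VEC = 32; // AVX2 vector width in bytes

    // Highest haystack byte index touched by the two 32-byte loads for the
    // first block of candidates (i = 0): one load at 0 for needle[0] and one
    // at k - 1 for needle[k - 1]. (Assumed load pattern, see lead-in.)
    static int lastByteTouched(int k) {
        int firstLoadEnd = VEC - 1;
        int kthLoadEnd = (k - 1) + VEC - 1;
        return Math.max(firstLoadEnd, kthLoadEnd);
    }

    // Both loads stay inside a haystack of length n iff every touched index is <= n - 1.
    static boolean fullReadsInBounds(int n, int k) {
        return lastByteTouched(k) <= n - 1;
    }

    public static void main(String[] args) {
        for (int k = 1; k <= 8; k++) {
            int n31 = k + 31, n30 = k + 30;
            System.out.printf("k=%d: n-k=31 -> %b, n-k=30 -> %b%n",
                    k, fullReadsInBounds(n31, k), fullReadsInBounds(n30, k));
        }
    }
}
```

For k >= 1 the kth-byte load always ends later, so the condition reduces to n - k >= 31: exactly the (n-k) == 31 case that was changed and retested.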
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617893462 From sgibbons at openjdk.org Tue May 28 21:20:15 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 28 May 2024 21:20:15 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v43] In-Reply-To: <8iJLAJvIzHgXpr5P1hWOLTj-bfO6THNUJkQf7Ki2P9Y=.43680b7c-c3e1-41f8-b065-7955e6237613@github.com> References: <8iJLAJvIzHgXpr5P1hWOLTj-bfO6THNUJkQf7Ki2P9Y=.43680b7c-c3e1-41f8-b065-7955e6237613@github.com> Message-ID: On Tue, 28 May 2024 16:37:23 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix tests > > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 488: > >> 486: __ cmpq(r11, nMinusK); >> 487: __ ja_b(L_return); >> 488: __ movq(rax, r11); > > At places where we know that return value in r11 is correct, we dont need to checkRange so this could have its own label. Disabling causes the test to succeed, so we're not finding matches beyond the end of the string, correct? Are we confident that this test passing can warrant removing the range check? @sviswa7 ? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617901070 From prappo at openjdk.org Tue May 28 22:13:01 2024 From: prappo at openjdk.org (Pavel Rappo) Date: Tue, 28 May 2024 22:13:01 GMT Subject: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2] In-Reply-To: References: <1tx_tl3PV2W5NCEXXawQY5V2ndnSOHPfjisypuhKdhA=.79840096-bac0-4da4-8102-c7ecea7cb5f0@github.com> Message-ID: On Tue, 28 May 2024 20:38:21 GMT, Claes Redestad wrote: >> src/java.base/share/classes/jdk/internal/util/ArraysSupport.java line 301: >> >>> 299: return switch (length) { >>> 300: case 0 -> initialValue; >>> 301: case 1 -> 31 * initialValue + JLA.getUTF16Char(a, fromIndex); >> >> There seems to be a mismatch here with the original code in StringUTF16, where the length that is tested for is `2` instead of `1`. > > Yes, should be `2` (`a` is semantically a `char[]`). This typo likely pass functional testing since `1` can never happen in practice, and the default case should work for any value. There might be a String microbenchmark out there that might be slightly unhappy, though. I believe, it should be `1`. Hear me out. In this method, the `length` is scaled down, whereas in `StringUTF16` it is not. In this method, it's `length`, in `StringUTF16` it's `((byte[]) value).length`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1617941436 From prappo at openjdk.org Tue May 28 22:23:01 2024 From: prappo at openjdk.org (Pavel Rappo) Date: Tue, 28 May 2024 22:23:01 GMT Subject: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2] In-Reply-To: References: <1tx_tl3PV2W5NCEXXawQY5V2ndnSOHPfjisypuhKdhA=.79840096-bac0-4da4-8102-c7ecea7cb5f0@github.com> Message-ID: On Tue, 28 May 2024 22:08:06 GMT, Pavel Rappo wrote: >> Yes, should be `2` (`a` is semantically a `char[]`). This typo likely pass functional testing since `1` can never happen in practice, and the default case should work for any value. 
There might be a String microbenchmark out there that might be slightly unhappy, though. > > I believe, it should be `1`. Hear me out. In this method, the `length` is scaled down, whereas in `StringUTF16` it is not. In this method, it's `length`, in `StringUTF16` it's `((byte[]) value).length`. In fact, if I change it to `2`, the following tests will fail: - `jdk/jdk/classfile/Utf8EntryTest.java` - `jdk/java/util/zip/ZipCoding.java` - `jdk/java/text/Format/MessageFormat/MessageRegression.java` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1617950633 From sgibbons at openjdk.org Tue May 28 22:33:18 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 28 May 2024 22:33:18 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v43] In-Reply-To: References: <8iJLAJvIzHgXpr5P1hWOLTj-bfO6THNUJkQf7Ki2P9Y=.43680b7c-c3e1-41f8-b065-7955e6237613@github.com> Message-ID: <2OsgJsQtfArLRfrVbwvYJKpx3ljhT2fU3UUdWJsUiCY=.91914663-2fef-4696-b1d8-4f7b0c951205@github.com> On Tue, 28 May 2024 21:17:07 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 488: >> >>> 486: __ cmpq(r11, nMinusK); >>> 487: __ ja_b(L_return); >>> 488: __ movq(rax, r11); >> >> At places where we know that return value in r11 is correct, we dont need to checkRange so this could have its own label. > > Disabling causes the test to succeed, so we're not finding matches beyond the end of the string, correct? Are we confident that this test passing can warrant removing the range check? @sviswa7 ? Removed. >> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 621: >> >>> 619: __ addq(hsPtrRet, index); >>> 620: __ movq(r11, hsPtrRet); >>> 621: __ jmp(L_checkRangeAndReturn); >> >> Why do we have to checkRange here, would it not be always correct? It so we could return r11 directly (by moving into rax). > > There are cases where r11 could have a value that, when added to (k - 1) would go past the end of the haystack. 
I did all in my power to ensure that it doesn't but there's no test I know of to ensure that condition. I would recommend leaving this in for now. Removed checkRangeAndReturn ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617956870 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617956635 From sgibbons at openjdk.org Tue May 28 22:33:19 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 28 May 2024 22:33:19 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: <8Y-nIHc8vfB1X_hp3tpqqqgpCzu6dAt6BBIP_zc4Q70=.c9a48c68-8c14-4af9-8357-ab50e62a5fd3@github.com> Message-ID: On Thu, 16 May 2024 18:09:04 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 418: >> >>> 416: __ cmpq(haystack_len, 0x10); >>> 417: __ ja_b(L_moreThan16); >>> 418: >> >> An assert here to check for header size >= 16 would be good. >> Also a comment here would he good, something like: >> // Copy 16 or 32 bytes prior to haystack end onto stack >> // This will possibly including some object header bytes when haystack length is less than 16 or 32 bytes // Set the new haystack address to beginning of copied haystack on stack adjusting for extra bytes copied > > I don't know how to assert header size >= 16 bytes, so I'll add a comment stating such. If you can tell me how to assert, I'll add that code in place of the comment. 
Fixed in library_call.cpp ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617955173 From kvn at openjdk.org Tue May 28 22:40:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 28 May 2024 22:40:05 GMT Subject: RFR: 8325083: jdk/incubator/vector/Double512VectorTests.java crashes in Assembler::vex_prefix_and_encode [v4] In-Reply-To: References: Message-ID: On Tue, 28 May 2024 17:18:15 GMT, Jatin Bhateja wrote: >> This bugfix patch limits the register class for operands of byte to double cast pattern to prevent reported assertion failure on Knights family CPUs. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Removing unrelated commit > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8325083 > - Review suggestion incorporated. > - Removing redundant assertions > - 8325083: jdk/incubator/vector/Double512VectorTests.java crashes in Assembler::vex_prefix_and_encode My testing passed without new failures. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19407#pullrequestreview-2083956184 From sgibbons at openjdk.org Tue May 28 22:47:42 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 28 May 2024 22:47:42 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v46] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Final review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/751aace8..355325d0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=45 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=44-45 Stats: 95 lines in 3 files changed: 23 ins; 51 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Tue May 28 23:52:27 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 28 May 2024 23:52:27 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 
[v47] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Move assert to where it's actually important. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/355325d0..db0ab75a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=46 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=45-46 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sviswanathan at openjdk.org Wed May 29 00:07:02 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 29 May 2024 00:07:02 GMT Subject: RFR: 8325083: jdk/incubator/vector/Double512VectorTests.java crashes in Assembler::vex_prefix_and_encode [v4] In-Reply-To: References: Message-ID: On Tue, 28 May 2024 17:18:15 GMT, Jatin Bhateja wrote: >> This bugfix patch limits the register class for operands of byte to double cast pattern to prevent reported assertion failure on Knights family CPUs. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Removing unrelated commit > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8325083 > - Review suggestion incorporated. > - Removing redundant assertions > - 8325083: jdk/incubator/vector/Double512VectorTests.java crashes in Assembler::vex_prefix_and_encode Looks good to me as well. ------------- Marked as reviewed by sviswanathan (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19407#pullrequestreview-2084025584 From dlong at openjdk.org Wed May 29 00:45:02 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 29 May 2024 00:45:02 GMT Subject: RFR: 8332856: C2: Add new transform for bool eq/ne (cmp (and (urshift X const1) const2) 0) [v2] In-Reply-To: References: Message-ID: <8eDJ94rgbRqcNVmXRpjph1MjnVTG6NjTHzDsISubWD0=.c972efef-277c-452b-b3d8-47d33a50cd36@github.com> On Tue, 28 May 2024 20:11:35 GMT, Tobias Hotz wrote: >> This PR adds a new ideal optimization for the following pattern: >> >> public boolean testFunc(int a) { >> int mask = 0b101; >> int shift = 12; >> return ((a >> shift) & mask) == 0; >> } >> >> Where the mask and shift are constant values and a is a variable. For this optimization to work, the right shift has to be idealized to a unsinged right shift earlier in the pipeline, which here: https://github.com/openjdk/jdk/blob/b92bd671835c37cff58e2cdcecd0fe4277557d7f/src/hotspot/share/opto/mulnode.cpp#L731 >> If the shift is already an unsiged bit shift, it works as well. >> On AMD64 CPUs, this means that this whole line computation can be reduced to a simple `test` instruction. > > Tobias Hotz has updated the pull request incrementally with two additional commits since the last revision: > > - LF endings... > - Add a benchmark to measure effect of new ideal transformation It seems like this could hurt some platforms (or at least require an additional temp register), if the shifted value we AND with can no longer be encoded as an immediate (RISC CPUs can't encode arbitrary immediates in one instruction). Also, I'm wondering if this could be implemented in the backend with a appropriate match rule pattern in the .ad file. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19310#issuecomment-2136320901 From jbhateja at openjdk.org Wed May 29 02:21:05 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 29 May 2024 02:21:05 GMT Subject: Integrated: 8325083: jdk/incubator/vector/Double512VectorTests.java crashes in Assembler::vex_prefix_and_encode In-Reply-To: References: Message-ID: On Mon, 27 May 2024 06:06:47 GMT, Jatin Bhateja wrote: > This bugfix patch limits the register class for operands of byte to double cast pattern to prevent reported assertion failure on Knights family CPUs. > > Kindly review and share your feedback. > > Best Regards, > Jatin This pull request has now been integrated. Changeset: 01060ad4 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/01060ad4ab18581aa46bc16e64c7f12a591a682b Stats: 14 lines in 1 file changed: 12 ins; 2 del; 0 mod 8325083: jdk/incubator/vector/Double512VectorTests.java crashes in Assembler::vex_prefix_and_encode Reviewed-by: kvn, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/19407 From fyang at openjdk.org Wed May 29 02:27:02 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 29 May 2024 02:27:02 GMT Subject: RFR: 8333006: RISC-V: C2: Support vector-scalar and vector-immediate arithmetic instructions [v2] In-Reply-To: <-0PMAvAivEpLgl0qFHHn21m3MrU7Xa7QmO6g2qHgfRQ=.120a7ee2-5525-4bfb-887b-da860feca3d0@github.com> References: <-0PMAvAivEpLgl0qFHHn21m3MrU7Xa7QmO6g2qHgfRQ=.120a7ee2-5525-4bfb-887b-da860feca3d0@github.com> Message-ID: <1UGjH0ulqBYXt8Jeq7CCGUdQ7-mxZSBIRzk5MY8Wyus=.17aee7cc-bc08-4cfc-8272-6c0f10249e5c@github.com> On Tue, 28 May 2024 10:49:26 GMT, Gui Cao wrote: >> Hi, We want to support vector-scalar and vector-immediate arithmetic instructions, It was implemented by referring to RVV v1.0 [1]. please take a look and have some reviews. Thanks a lot. >> We can use the Byte256VectorTests.java[2] to print the Opto JIT Code, verify and observe the generation of nodes. 
>> >> For example, we can use the following command to print the Opto JIT Code of a jtreg test case: >> >> >> /home/zifeihan/jtreg/bin/jtreg \ >> -v:default \ >> -concurrency:16 -timeout:50 \ >> -javaoption:-XX:+UnlockExperimentalVMOptions \ >> -javaoption:-XX:+UseRVV \ >> -javaoption:-XX:+PrintOptoAssembly \ >> -javaoption:-XX:LogFile=/home/zifeihan/jdk/Byte256VectorTests_PrintOptoAssembly.log \ >> -jdk:/home/zifeihan/jdk/build/linux-riscv64-server-fastdebug/jdk \ >> /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/Byte256VectorTests.java >> >> >> >> we can observe the specified compilation log `Byte256VectorTests_PrintOptoAssembly.log`, which contains the vector-scalar and vector-immediate arithmetic instructions for the PR implementation. >> >> vadd_immI Node >> >> 16c addw R11, R10, zr #@convI2L_reg_reg >> 170 add R9, R31, R11 # ptr, #@addP_reg_reg >> 174 addi R9, R9, #16 # ptr, #@addP_reg_imm >> 176 loadV V1, [R9] # vector (rvv) >> 17e vadd_immI V1, V1, #7 >> 186 add R11, R15, R11 # ptr, #@addP_reg_reg >> 188 addi R11, R11, #16 # ptr, #@addP_reg_imm >> 18a storeV [R11], V1 # vector (rvv) >> >> >> vadd_immI_masked Node >> >> 1e8 B31: # out( B37 B32 ) <- in( B30 ) Freq: 76.2281 >> 1e8 loadV V2, [R31] # vector (rvv) >> 1f0 vloadmask V0, V1 >> 1f8 vadd_immI_masked V2, V2, #7 >> 200 addi R31, R10, #48 # ptr, #@addP_reg_imm >> 204 bgeu R30, R7, B37 #@cmpU_branch P=0.000001 C=-1.000000 >> >> >> vadd_regI Node >> >> 0c4 B4: # out( B9 B5 ) <- in( B8 B3 ) Freq: 1 >> 0c4 vloadcon V1 # generate iota indices >> 0cc spill [sp, #4] -> R30 # spill size = 32 >> 0ce vmul_regI V1, V1, R30 >> 0d6 spill [sp, #0] -> R29 # spill size = 32 >> 0d8 vadd_regI V1, V1, R29 >> >> >> vadd_regI_masked Node >> >> 244 B36: # out( B33 B37 ) <- in( B35 ) Freq: 7427.81 >> 244 # castII of R30, #@castII >> 244 addw R31, R30, zr #@convI2L_reg_reg >> 248 spill [sp, #32] -> R10 # spill size = 64 >> 24a add R10, R10, R31 # ptr, #@addP_reg_reg >> 24c addi R10, R10, #16 # ptr, #@addP_reg_imm >> 24e loadV 
V2, [R10] # vector (rvv) >> 256 vloadmask V0, V1 >> 25e vadd_regI_masked V2, V2, R29 >> >> >> ... > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Code Format Updated change looks good. Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19415#pullrequestreview-2084137672 From sviswanathan at openjdk.org Wed May 29 03:05:18 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 29 May 2024 03:05:18 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v47] In-Reply-To: References: Message-ID: On Tue, 28 May 2024 23:52:27 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> 
StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Move assert to where it's actually important. Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16753#pullrequestreview-2084177134 From liach at openjdk.org Wed May 29 03:24:01 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 29 May 2024 03:24:01 GMT Subject: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2] In-Reply-To: References: <1tx_tl3PV2W5NCEXXawQY5V2ndnSOHPfjisypuhKdhA=.79840096-bac0-4da4-8102-c7ecea7cb5f0@github.com> Message-ID: On Tue, 28 May 2024 22:20:39 GMT, Pavel Rappo wrote: >> I believe, it should be `1`. Hear me out. In this method, the `length` is scaled down, whereas in `StringUTF16` it is not. In this method, it's `length`, in `StringUTF16` it's `((byte[]) value).length`. > > In fact, if I change it to `2`, the following tests will fail: > > - `jdk/jdk/classfile/Utf8EntryTest.java` > - `jdk/java/util/zip/ZipCoding.java` > - `jdk/java/text/Format/MessageFormat/MessageRegression.java` Indeed, the actual length passed at call site is `value.length >> 1` instead of `value.length`; this adjusted char-length carries on to `vectorizedHashCode` call. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1618126401 From liach at openjdk.org Wed May 29 03:24:02 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 29 May 2024 03:24:02 GMT Subject: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2] In-Reply-To: <1tx_tl3PV2W5NCEXXawQY5V2ndnSOHPfjisypuhKdhA=.79840096-bac0-4da4-8102-c7ecea7cb5f0@github.com> References: <1tx_tl3PV2W5NCEXXawQY5V2ndnSOHPfjisypuhKdhA=.79840096-bac0-4da4-8102-c7ecea7cb5f0@github.com> Message-ID: On Mon, 27 May 2024 20:55:29 GMT, Pavel Rappo wrote: >> Please review this PR, which supersedes a now withdrawn https://github.com/openjdk/jdk/pull/14831. >> >> This PR replaces `ArraysSupport.vectorizedHashCode` with a set of more user-friendly methods. Here's a summary: >> >> - Made the operand constants (i.e. `T_BOOLEAN` and friends) and the `vectorizedHashCode` method private >> >> - Made the `vectorizedHashCode` method private, but didn't rename it. Renaming would dramatically increase this PR review cost, because that method's name is used by a lot of VM code. On a bright side, since the method is now private, it's no longer callable by clients of `ArraysSupport`, thus a problem of an inaccurate name is less severe. >> >> - Made the `ArraysSupport.utf16HashCode` method private >> >> - Moved tiny cases (i.e. 0, 1, 2) to `ArraysSupport` > > Pavel Rappo has updated the pull request incrementally with one additional commit since the last revision: > > Fix incorrect utf16 hashCode adaptation src/java.base/share/classes/jdk/internal/util/ArraysSupport.java line 320: > 318: * @return the calculated hash value > 319: */ > 320: public static int hashCode(Object[] a, int fromIndex, int length, int initialValue) { Is the object variant necessary here? The object version is hard for JIT to profile as it's quite polymorphic compared to other arrays, and the initial value is always 1. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1618129002 From luhenry at openjdk.org Wed May 29 03:30:08 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 29 May 2024 03:30:08 GMT Subject: RFR: 8320999: RISC-V: C2 RotateLeftV [v4] In-Reply-To: References: Message-ID: On Tue, 28 May 2024 09:06:14 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> More detailed description is inline in the code. >> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > add reg version Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19325#pullrequestreview-2084194386 From jbhateja at openjdk.org Wed May 29 06:15:21 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 29 May 2024 06:15:21 GMT Subject: RFR: 8332119: Incorrect IllegalArgumentException for C2 compiled permute kernel Message-ID: <6IxHpLmCr2e1fKOcbdG38uhJEOsmVUpgVbcGoH4uMnQ=.ac6c99bd-a222-4dbc-a2b2-fdaf1f94a155@github.com> Currently inline expansion of vector to shuffle conversion simply type casts the vector holding indexes to byte vector[1] where as fallback implementation[2] also wraps the indexes to a valid index range [0, VEC_LEN-1) or generates a -ve index for exceptional / OOB indices. This patch extends the conversion inline expander to match the fall back implementation. This imposes around 20% performance tax on Vector.toShuffle() intrinsic but fixes this functional bug. Kindly review and share your feedback. 
Best Regards, Jatin PS: Patch also fixes an incorrectness issue reported with [JDK-8332118](https://bugs.openjdk.org/browse/JDK-8332118) [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2352 [2] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractShuffle.java#L58 ------------- Commit messages: - 8332119: Incorrect IllegalArgumentException for C2 compiled permute kernel Changes: https://git.openjdk.org/jdk/pull/19442/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19442&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332119 Stats: 147 lines in 3 files changed: 137 ins; 9 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19442.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19442/head:pull/19442 PR: https://git.openjdk.org/jdk/pull/19442 From epeter at openjdk.org Wed May 29 06:34:02 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 May 2024 06:34:02 GMT Subject: RFR: 8325155: C2 SuperWord: remove alignment boundaries [v5] In-Reply-To: References: Message-ID: > I have tried for a very long time to get rid of all the `alignment(n)` code that is all over the SuperWord code. With lots of previous work, I am now finally ready to remove it. > > I was able to remove lots of VM code, about 300 lines. And the removed code is I think much more complicated than the new code. > > This is what I did in this PR: > - Removal of `_node_info`: used to have many fields, which I refactored out to the `VLoopAnalyzer` modules. `alignment` is the last component, which I now remove. > - Changed the implementation of `SuperWord::find_adjacent_refs`, now `SuperWord::find_adjacent_memop_pairs`, completely: > - It used to be an algorithm that would scan over all `memops` repeatedly, try to find some `mem_ref` and see which other memops were comparable, and then pack pairs for all of those, by comparing all-vs-all memops. 
This algorithm is at least quadratic, if not much worse. > - I now add all `memops` into a single array, sort them by groups (those that are comparable with each other and could be packed into vectors), and inside the groups by ascending offset. This allows me to split off the groups much more efficiently, and also the sorting by offset allows me finding adjacent pairs much more efficiently. In the most cases this reduces the cost to `O(n log n)` for sort, and a linear scan for finding adjacent memops. > - I removed the "alignment boundaries" created in `SuperWord::memory_alignment` by `int off_rem = offset % vw;`. > - This used to have the effect that all offsets were computed modulo the vector width. Hence, pairs could not be packed across this boundary (e.g. we have nodes with offsets `31, 32`, which are adjacent in theory, but if we have a `vw = 32`, then the modulo-offsets are `31, 0`, and they are not detected as adjacent). > - These "alignment boundaries" used to be required for correctness about a year ago, before I fixed and relaxed much of the alignment code. > - The `alignment` used to have another important task: Ensuring compatibility of the input-size of a use node, with the output-size of the def-node. > - This was done by giving all nodes an `alignment`, even the non-memop nodes. This `alignment` was then scaled up and down at type casts (e.g. int `0, 4, 8, 12` -> long `0, 8, 16, 24`). If the output-size of the def-node did not match the input-size of the use-node, then the `alignment` would not match up, and we would not pack. > - This is why we used to have checks like `alignment(s1) + data_size(s1) == alignment(s2)` ... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: find_adjacent_memop_pairs -> create_adjacent_memop_pairs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18822/files - new: https://git.openjdk.org/jdk/pull/18822/files/41fc1eb3..cd65ca05 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18822&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18822&range=03-04 Stats: 15 lines in 2 files changed: 2 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/18822.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18822/head:pull/18822 PR: https://git.openjdk.org/jdk/pull/18822 From dlong at openjdk.org Wed May 29 06:34:02 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 29 May 2024 06:34:02 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer In-Reply-To: References: Message-ID: On Tue, 28 May 2024 12:36:40 GMT, Matthias Baesken wrote: > When running on macOS with ubsan enabled, we see some issues in relocInfo (hpp and cpp); those already occur in the build quite early. > > /jdk/src/hotspot/share/code/relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer > > Similar happens when we add to the _current pointer > _current++; > this gives : > relocInfo.hpp:606:13: runtime error: applying non-zero offset to non-null pointer 0xfffffffffffffffe produced null pointer > > Seems the pointer subtraction/addition worked so far, so it might be an option to disable ubsan for those 2 functions. We could consider using a sentinel value instead of nullptr, but it would require more changes. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19424#issuecomment-2136638858 From mbaesken at openjdk.org Wed May 29 06:52:01 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 29 May 2024 06:52:01 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer In-Reply-To: References: Message-ID: On Tue, 28 May 2024 12:36:40 GMT, Matthias Baesken wrote: > When running on macOS with ubsan enabled, we see some issues in relocInfo (hpp and cpp); those already occur in the build quite early. > > /jdk/src/hotspot/share/code/relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer > > Similar happens when we add to the _current pointer > _current++; > this gives : > relocInfo.hpp:606:13: runtime error: applying non-zero offset to non-null pointer 0xfffffffffffffffe produced null pointer > > Seems the pointer subtraction/addition worked so far, so it might be an option to disable ubsan for those 2 functions.

What do you think about using some helper templates or macros like this, doing what Martin suggested?

// helper templates to avoid undefined addition/subtraction from nullptr
template <typename T> T* add_to_ptr(T* ptr, int val) { return (T*)((uintptr_t)ptr + val * sizeof(T)); }
template <typename T> T* sub_from_ptr(T* ptr, int val) { return (T*)((uintptr_t)ptr - val * sizeof(T)); }

------------- PR Comment: https://git.openjdk.org/jdk/pull/19424#issuecomment-2136662735 From chagedorn at openjdk.org Wed May 29 06:54:34 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 29 May 2024 06:54:34 GMT Subject: RFR: 8325155: C2 SuperWord: remove alignment boundaries [v5] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 06:34:02 GMT, Emanuel Peter wrote: >> I have tried for a very long time to get rid of all the `alignment(n)` code that is all over the SuperWord code. With lots of previous work, I am now finally ready to remove it.
>> >> I was able to remove lots of VM code, about 300 lines. And the removed code is I think much more complicated than the new code. >> >> This is what I did in this PR: >> - Removal of `_node_info`: used to have many fields, which I refactored out to the `VLoopAnalyzer` modules. `alignment` is the last component, which I now remove. >> - Changed the implementation of `SuperWord::find_adjacent_refs`, now `SuperWord::find_adjacent_memop_pairs`, completely: >> - It used to be an algorithm that would scan over all `memops` repeatedly, try to find some `mem_ref` and see which other memops were comparable, and then pack pairs for all of those, by comparing all-vs-all memops. This algorithm is at least quadratic, if not much worse. >> - I now add all `memops` into a single array, sort them by groups (those that are comparable with each other and could be packed into vectors), and inside the groups by ascending offset. This allows me to split off the groups much more efficiently, and sorting by offset also allows me to find adjacent pairs much more efficiently. In most cases this reduces the cost to `O(n log n)` for the sort, plus a linear scan for finding adjacent memops. >> - I removed the "alignment boundaries" created in `SuperWord::memory_alignment` by `int off_rem = offset % vw;`. >> - This used to have the effect that all offsets were computed modulo the vector width. Hence, pairs could not be packed across this boundary (e.g. we have nodes with offsets `31, 32`, which are adjacent in theory, but if we have a `vw = 32`, then the modulo-offsets are `31, 0`, and they are not detected as adjacent). >> - These "alignment boundaries" used to be required for correctness about a year ago, before I fixed and relaxed much of the alignment code. >> - The `alignment` used to have another important task: Ensuring compatibility of the input-size of a use node, with the output-size of the def-node.
>> - This was done by giving all nodes an `alignment`, even the non-memop nodes. This `alignment` was then scaled up and down at type casts (e.g. int `0, 4, 8, 12` -> long `0, 8, 16, 24`). If the output-size of the def-node did not match the input-size of the use-node, then the `alignment` would not match up, and we would not pack. >> - This is why we used to have checks like `alignment(s1) + da... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > find_adjacent_memop_pairs -> create_adjacent_memop_pairs Thanks for doing the updates, looks good now! `find` -> `create` is a good solution :-) src/hotspot/share/opto/superword.cpp line 559: > 557: // Find adjacent memops for a single group, e.g. for all LoadI of the same base, invar, etc. > 558: // Create pairs and add them to the pairset. > 559: void SuperWord::create_adjacent_memop_pairs_in_one_group(const GrowableArray& vpointers, const int group_start, int group_end) { `group_end` can also be made const Suggestion: void SuperWord::create_adjacent_memop_pairs_in_one_group(const GrowableArray& vpointers, const int group_start, const int group_end) { ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18822#pullrequestreview-2084449713 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1618297201 From epeter at openjdk.org Wed May 29 07:16:44 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 May 2024 07:16:44 GMT Subject: RFR: 8325155: C2 SuperWord: remove alignment boundaries [v6] In-Reply-To: References: Message-ID: > I have tried for a very long time to get rid of all the `alignment(n)` code that is all over the SuperWord code. With lots of previous work, I am now finally ready to remove it. > > I was able to remove lots of VM code, about 300 lines. And the removed code is I think much more complicated than the new code. 
> > This is what I did in this PR: > - Removal of `_node_info`: used to have many fields, which I refactored out to the `VLoopAnalyzer` modules. `alignment` is the last component, which I now remove. > - Changed the implementation of `SuperWord::find_adjacent_refs`, now `SuperWord::find_adjacent_memop_pairs`, completely: > - It used to be an algorithm that would scan over all `memops` repeatedly, try to find some `mem_ref` and see which other memops were comparable, and then pack pairs for all of those, by comparing all-vs-all memops. This algorithm is at least quadratic, if not much worse. > - I now add all `memops` into a single array, sort them by groups (those that are comparable with each other and could be packed into vectors), and inside the groups by ascending offset. This allows me to split off the groups much more efficiently, and sorting by offset also allows me to find adjacent pairs much more efficiently. In most cases this reduces the cost to `O(n log n)` for the sort, plus a linear scan for finding adjacent memops. > - I removed the "alignment boundaries" created in `SuperWord::memory_alignment` by `int off_rem = offset % vw;`. > - This used to have the effect that all offsets were computed modulo the vector width. Hence, pairs could not be packed across this boundary (e.g. we have nodes with offsets `31, 32`, which are adjacent in theory, but if we have a `vw = 32`, then the modulo-offsets are `31, 0`, and they are not detected as adjacent). > - These "alignment boundaries" used to be required for correctness about a year ago, before I fixed and relaxed much of the alignment code. > - The `alignment` used to have another important task: Ensuring compatibility of the input-size of a use node, with the output-size of the def-node. > - This was done by giving all nodes an `alignment`, even the non-memop nodes. This `alignment` was then scaled up and down at type casts (e.g. int `0, 4, 8, 12` -> long `0, 8, 16, 24`).
If the output-size of the def-node did not match the input-size of the use-node, then the `alignment` would not match up, and we would not pack. > - This is why we used to have checks like `alignment(s1) + data_size(s1) == alignment(s2)` ... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/superword.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18822/files - new: https://git.openjdk.org/jdk/pull/18822/files/cd65ca05..d18219b5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18822&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18822&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18822.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18822/head:pull/18822 PR: https://git.openjdk.org/jdk/pull/18822 From aph at openjdk.org Wed May 29 07:39:08 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 29 May 2024 07:39:08 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v2] In-Reply-To: References: Message-ID: On Tue, 28 May 2024 14:04:13 GMT, Martin Doerr wrote: > Performance seems to be not affected by that bug. That is extremely suspicious. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19368#issuecomment-2136738622 From aph at openjdk.org Wed May 29 07:43:06 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 29 May 2024 07:43:06 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v3] In-Reply-To: <0YQF8jE_JFiy_K34aIy6cybUwnpp47-6jrnmZ3jbcAI=.c6663758-17f6-40f8-a738-4e4bf7e9ddaf@github.com> References: <0YQF8jE_JFiy_K34aIy6cybUwnpp47-6jrnmZ3jbcAI=.c6663758-17f6-40f8-a738-4e4bf7e9ddaf@github.com> Message-ID: On Tue, 28 May 2024 14:59:14 GMT, Martin Doerr wrote: >> PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review! 
>> I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea?
>> How can we verify it? By comparing the performance using the micro benchmarks?
>>
>> Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads):
>>
>> Original
>> SecondarySuperCacheHits: 13.033 ±(99.9%) 0.058 ns/op [Average]
>> SecondarySuperCacheInterContention.test avgt 15 432.366 ± 8.364 ns/op
>> SecondarySuperCacheInterContention.test:t1 avgt 15 432.310 ± 8.460 ns/op
>> SecondarySuperCacheInterContention.test:t2 avgt 15 432.422 ± 10.819 ns/op
>> SecondarySuperCacheIntraContention.test avgt 15 355.192 ± 3.597 ns/op
>> SecondarySupersLookup.testNegative00 avgt 15 12.274 ± 0.026 ns/op
>> SecondarySupersLookup.testNegative01 avgt 15 12.300 ± 0.039 ns/op
>> SecondarySupersLookup.testNegative02 avgt 15 12.304 ± 0.034 ns/op
>> SecondarySupersLookup.testNegative03 avgt 15 12.276 ± 0.050 ns/op
>> SecondarySupersLookup.testNegative04 avgt 15 12.235 ± 0.044 ns/op
>> SecondarySupersLookup.testNegative05 avgt 15 12.308 ± 0.156 ns/op
>> SecondarySupersLookup.testNegative06 avgt 15 12.291 ± 0.048 ns/op
>> SecondarySupersLookup.testNegative07 avgt 15 12.307 ± 0.052 ns/op
>> SecondarySupersLookup.testNegative08 avgt 15 12.398 ± 0.075 ns/op
>> SecondarySupersLookup.testNegative09 avgt 15 12.552 ± 0.122 ns/op
>> SecondarySupersLookup.testNegative10 avgt 15 12.490 ± 0.083 ns/op
>> SecondarySupersLookup.testNegative16 avgt 15 12.565 ± 0.092 ns/op
>> SecondarySupersLookup.testNegative20 avgt 15 19.059 ± 0.958 ns/op
>> SecondarySupersLookup.testNegative30 avgt 15 19.268 ± 0.124 ns/op
>> SecondarySupersLookup.testNegative32 avgt 15 20.059 ± 0.114 ns/op
>> SecondarySupersLookup.testNegative40 avgt 15 25.117 ± 0.368 ns/op
>> SecondarySupersLookup.testNegative50 avgt 15 32.735 ± 0.359 ns/op
>> SecondarySupersLookup.testNegative55 avgt 15 34.866 ± 0.152 ns/op
>> SecondarySupersLookup.testNegative56 avgt 15 35.492 ±
0.276 ns/op
>> SecondarySupersLookup.testNegative57 avgt 15 36.620 ± 0.334 ns/op
>> SecondarySupersLookup.testNegative58 avgt 15 37.226 ± 0.180 ns/op
>> SecondarySupersLookup.testNegative59 avgt 15 37.774 ± 0.241 ns/op
>> SecondarySupersLookup.testNegative60 avgt 15 38.627 ± 1.451 ns/op
>> SecondarySupersLookup.testNegative61 avgt 15 ...
>
> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>
> Adapt assertion. We sometimes have only 1 element in the secondary supers array.

src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2201:

> 2199: li(result, 1); // failure
> 2200: // We test the MSB of r_array_index, i.e. its sign bit
> 2201: bgt(CCR0, L_fallthrough);

This looks wrong. Should be greater or equal.

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19368#discussion_r1618364435 From fyang at openjdk.org Wed May 29 07:43:06 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 29 May 2024 07:43:06 GMT Subject: RFR: 8320999: RISC-V: C2 RotateLeftV [v4] In-Reply-To: References: Message-ID: <0vYGBwpVIOa8vxa5UtU7OYUd2yEB7jG1uUZvYTB-_40=.7745648b-1e74-40c0-9815-c809a79b8b9d@github.com> On Tue, 28 May 2024 09:06:14 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> More detailed description is inline in the code. >> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > add reg version Two minor comments remain. Otherwise looks good to me. BTW: You didn't mention the testing performed. Are these newly-added instructs properly test covered? Thanks.

src/hotspot/cpu/riscv/riscv_v.ad line 3082:

> 3080: //
> 3081: // NOTE: for Long, its valid rotation value is 6 bits, although basic vector instruction only support 5 bit vector-immediate,
> 3082: // in Zvbb, vror.vi support 6 bits vector-immediate, so the imm implementation of Long and other types can be unified.
Maybe simply: `As vror.vi encodes 6-bits immediate rotate amount, which is different from other vector-immediate instructions, implementation of vector rotation for long and other types can be unified.` src/hotspot/cpu/riscv/riscv_v.ad line 3130: > 3128: instruct vrotate_right_masked(vReg dst_src, vReg shift, vRegMask_V0 v0) %{ > 3129: match(Set dst_src (RotateRightV (Binary dst_src shift) v0)); > 3130: effect(TEMP_DEF dst_src); Is the `TEMP_DEF dst_src` needed for these newly-added masked versions? ------------- PR Review: https://git.openjdk.org/jdk/pull/19325#pullrequestreview-2084533121 PR Review Comment: https://git.openjdk.org/jdk/pull/19325#discussion_r1618349791 PR Review Comment: https://git.openjdk.org/jdk/pull/19325#discussion_r1618361073 From stefank at openjdk.org Wed May 29 07:49:04 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 29 May 2024 07:49:04 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v10] In-Reply-To: References: <-RRlrDdRqiN1sxsQF7RYJIl8W6Z62LcAq8quEalrzjc=.f6ae63e5-92d9-41be-962b-e2741c676b32@github.com> Message-ID: On Tue, 28 May 2024 15:52:26 GMT, Andrew Haley wrote: >> src/hotspot/share/asm/register.hpp line 263: >> >>> 261: template >>> 262: inline constexpr bool different_registers(AbstractRegSet allocated_regs, R first_register, Rx... more_registers) { >>> 263: if (allocated_regs.contains(first_register)) { >> >> FWIW, while first reading this I was looking for the base case of the recursion (the previous versions had some extra specializations). To me it looks like the base case is written in both this function and the function above. I would prefer to have the implementation inside one function only and change this function to use: >> >> if (!different_registers(allocated_regs, first_register)) { >> >> I think this could make it a bit clearer, but if you prefer the current style, I think that's fine as well. 
> > I'd prefer to stick with what I have, because it's a bit more direct and slightly simpler runtime code. Sure. We can't all have the same style preferences. :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1618376956 From jbhateja at openjdk.org Wed May 29 07:54:09 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 29 May 2024 07:54:09 GMT Subject: RFR: 8332487: Regression in Crypto-AESGCMBench.encrypt (and others) after JDK-8328181 Message-ID: This reinstates the ClearArray opcode check in match_rule_supported_vector. Its removal caused performance regressions in some Renaissance benchmark workloads, because it prevented small-sized instances from being initialized with quadword stores, which performed better on non-AVX512 targets. Our intent was to avoid the code bloat caused by long sequences of quadword stores when InitArrayShortSize is large, so as to prevent side effects on inlining decisions. An existing [Benchmark](https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/vm/compiler/ClearMemory.java) does not show much performance variation.
Baseline with -XX:InitArrayShortSize=100000000

Benchmark Mode Cnt Score Error Units
ClearMemory.testClearMemory16K thrpt 2 2695259.360 ops/s
ClearMemory.testClearMemory1K thrpt 2 48622330.474 ops/s
ClearMemory.testClearMemory1M thrpt 2 79546.779 ops/s
ClearMemory.testClearMemory24B thrpt 2 252740278.617 ops/s
ClearMemory.testClearMemory2K thrpt 2 24781443.547 ops/s
ClearMemory.testClearMemory32B thrpt 2 251588987.342 ops/s
ClearMemory.testClearMemory32K thrpt 2 1487427.378 ops/s
ClearMemory.testClearMemory40B thrpt 2 213856093.091 ops/s
ClearMemory.testClearMemory48B thrpt 2 193701317.101 ops/s
ClearMemory.testClearMemory4K thrpt 2 11961450.919 ops/s
ClearMemory.testClearMemory56B thrpt 2 169003238.018 ops/s
ClearMemory.testClearMemory8K thrpt 2 5871416.239 ops/s
ClearMemory.testClearMemory8M thrpt 2 10663.044 ops/s

With patch and -XX:InitArrayShortSize=100000000

Benchmark Mode Cnt Score Error Units
ClearMemory.testClearMemory16K thrpt 2 3147203.987 ops/s
ClearMemory.testClearMemory1K thrpt 2 48225184.981 ops/s
ClearMemory.testClearMemory1M thrpt 2 80016.400 ops/s
ClearMemory.testClearMemory24B thrpt 2 253904943.981 ops/s
ClearMemory.testClearMemory2K thrpt 2 24664594.490 ops/s
ClearMemory.testClearMemory32B thrpt 2 255507231.954 ops/s
ClearMemory.testClearMemory32K thrpt 2 1636220.531 ops/s
ClearMemory.testClearMemory40B thrpt 2 220718255.832 ops/s
ClearMemory.testClearMemory48B thrpt 2 196294911.715 ops/s
ClearMemory.testClearMemory4K thrpt 2 12182133.488 ops/s
ClearMemory.testClearMemory56B thrpt 2 168341797.370 ops/s
ClearMemory.testClearMemory8K thrpt 2 5952488.407 ops/s
ClearMemory.testClearMemory8M thrpt 2 10604.212 ops/s

java -jar renaissance-jmh-0.15.0-18-g65d596e-SNAPSHOT.jar -jvmArgs "-Xms10g -Xmx10g -Xlog:alloc* -XX:+UnlockDiagnosticVMOptions" -f 1 -i 5 -wi 2 -w 30 org.renaissance.jdk.streams.JmhMnemonics.run

Baseline:-
Benchmark Mode Cnt Score Error Units
JmhMnemonics.run ss 5 3202.478 ±
20.474 ms/op

With Patch:-
Benchmark Mode Cnt Score Error Units
JmhMnemonics.run ss 5 3241.956 ± 17.290 ms/op

Kindly review and share feedback.

Best Regards,
Jatin

------------- Commit messages: - 8332487: Regression in Crypto-AESGCMBench.encrypt (and others) after JDK-8328181 Changes: https://git.openjdk.org/jdk/pull/19447/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19447&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332487 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19447.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19447/head:pull/19447 PR: https://git.openjdk.org/jdk/pull/19447 From mdoerr at openjdk.org Wed May 29 08:14:29 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 29 May 2024 08:14:29 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v4] In-Reply-To: References: Message-ID:
> PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review!
> I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea?
> How can we verify it? By comparing the performance using the micro benchmarks?
>
> Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads):
>
> Original
> SecondarySuperCacheHits: 13.033 ±(99.9%) 0.058 ns/op [Average]
> SecondarySuperCacheInterContention.test avgt 15 432.366 ± 8.364 ns/op
> SecondarySuperCacheInterContention.test:t1 avgt 15 432.310 ± 8.460 ns/op
> SecondarySuperCacheInterContention.test:t2 avgt 15 432.422 ± 10.819 ns/op
> SecondarySuperCacheIntraContention.test avgt 15 355.192 ± 3.597 ns/op
> SecondarySupersLookup.testNegative00 avgt 15 12.274 ± 0.026 ns/op
> SecondarySupersLookup.testNegative01 avgt 15 12.300 ± 0.039 ns/op
> SecondarySupersLookup.testNegative02 avgt 15 12.304 ± 0.034 ns/op
> SecondarySupersLookup.testNegative03 avgt 15 12.276 ± 0.050 ns/op
> SecondarySupersLookup.testNegative04 avgt 15 12.235 ±
0.044 ns/op
> SecondarySupersLookup.testNegative05 avgt 15 12.308 ± 0.156 ns/op
> SecondarySupersLookup.testNegative06 avgt 15 12.291 ± 0.048 ns/op
> SecondarySupersLookup.testNegative07 avgt 15 12.307 ± 0.052 ns/op
> SecondarySupersLookup.testNegative08 avgt 15 12.398 ± 0.075 ns/op
> SecondarySupersLookup.testNegative09 avgt 15 12.552 ± 0.122 ns/op
> SecondarySupersLookup.testNegative10 avgt 15 12.490 ± 0.083 ns/op
> SecondarySupersLookup.testNegative16 avgt 15 12.565 ± 0.092 ns/op
> SecondarySupersLookup.testNegative20 avgt 15 19.059 ± 0.958 ns/op
> SecondarySupersLookup.testNegative30 avgt 15 19.268 ± 0.124 ns/op
> SecondarySupersLookup.testNegative32 avgt 15 20.059 ± 0.114 ns/op
> SecondarySupersLookup.testNegative40 avgt 15 25.117 ± 0.368 ns/op
> SecondarySupersLookup.testNegative50 avgt 15 32.735 ± 0.359 ns/op
> SecondarySupersLookup.testNegative55 avgt 15 34.866 ± 0.152 ns/op
> SecondarySupersLookup.testNegative56 avgt 15 35.492 ± 0.276 ns/op
> SecondarySupersLookup.testNegative57 avgt 15 36.620 ± 0.334 ns/op
> SecondarySupersLookup.testNegative58 avgt 15 37.226 ± 0.180 ns/op
> SecondarySupersLookup.testNegative59 avgt 15 37.774 ± 0.241 ns/op
> SecondarySupersLookup.testNegative60 avgt 15 38.627 ± 1.451 ns/op
> SecondarySupersLookup.testNegative61 avgt 15 39.395 ± 0.249 ns/op
> SecondarySupersLookup.testNegative62 avgt 15 ...

Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:

Fix check for sign bit.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/19368/files - new: https://git.openjdk.org/jdk/pull/19368/files/c1840719..14fc650f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19368&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19368&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19368.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19368/head:pull/19368 PR: https://git.openjdk.org/jdk/pull/19368 From mdoerr at openjdk.org Wed May 29 08:14:29 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 29 May 2024 08:14:29 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v3] In-Reply-To: References: <0YQF8jE_JFiy_K34aIy6cybUwnpp47-6jrnmZ3jbcAI=.c6663758-17f6-40f8-a738-4e4bf7e9ddaf@github.com> Message-ID: On Wed, 29 May 2024 07:40:21 GMT, Andrew Haley wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Adapt assertion. We sometimes have only 1 element in the secondary supers array. > > src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2201: > >> 2199: li(result, 1); // failure >> 2200: // We test the MSB of r_array_index, i.e. its sign bit >> 2201: bgt(CCR0, L_fallthrough); > > This looks wrong. Should be greater or equal. Right. Fixed. Thank you! 
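The sign-bit test being fixed here can be modelled in C++. This sketch rests on an assumption that is not spelled out in this thread: the 64-bit secondary-supers bitmap is shifted left so that the bit for the hash slot under test lands in the MSB. `shift_slot_to_msb`, `absent_ge` and `absent_gt` are hypothetical names. The point of the sketch is why "greater than" is wrong: a shifted value of exactly 0 also has a clear sign bit and must be treated as a miss.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: bit k of the bitmap says whether hash slot k is occupied.
// Shifting left by (63 - k) moves bit k into the sign position.
// (Unsigned-to-signed conversion is modular in C++20 and on mainstream
// compilers before that.)
int64_t shift_slot_to_msb(uint64_t bitmap, unsigned slot) {
  assert(slot < 64);
  return static_cast<int64_t>(bitmap << (63 - slot));
}

// Correct "definitely not present" test: sign bit clear, i.e. value >= 0.
bool absent_ge(int64_t shifted) { return shifted >= 0; }

// Buggy variant: "value > 0" misses a shifted value of exactly 0, which
// also has a clear sign bit.
bool absent_gt(int64_t shifted) { return shifted > 0; }
```

For example, with only slots 5 and 40 occupied, testing slot 3 shifts every set bit out and leaves 0, which `absent_ge` correctly reports as absent and `absent_gt` does not.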
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19368#discussion_r1618416597 From mdoerr at openjdk.org Wed May 29 08:17:01 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 29 May 2024 08:17:01 GMT Subject: RFR: 8332904: ubsan ppc64le: c1_LIRGenerator_ppc.cpp:581:21: runtime error: signed integer overflow: 9223372036854775807 + 1 cannot be represented in type 'long int' In-Reply-To: <48iXq_MgcuQTyGGANYnJQlEJwBhmRsiX9SCtRNtQHbQ=.baf4713c-fbcf-4bb1-bb18-5af9b3bab57f@github.com> References: <48iXq_MgcuQTyGGANYnJQlEJwBhmRsiX9SCtRNtQHbQ=.baf4713c-fbcf-4bb1-bb18-5af9b3bab57f@github.com> Message-ID: <9R25QOEVTwQFDqyF8FZMA6UPmgZGkacw2HhgNC3UzYQ=.2bdd8542-53eb-4ab1-a664-545743676c4f@github.com> On Mon, 27 May 2024 14:48:45 GMT, Matthias Baesken wrote: > When using ubsan on Linux ppc64le we run into some overflows like this one > > c1_LIRGenerator_ppc.cpp:581:21: runtime error: signed integer overflow: 9223372036854775807 + 1 cannot be represented in type 'long int' > > Seems we have to add casts to get defined behavior. > There are similar places in the coding as well. Thank you for fixing it! ------------- Marked as reviewed by mdoerr (Reviewer). 
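The ubsan report above concerns `9223372036854775807 + 1` (that is, `INT64_MAX + 1`) evaluated on a signed type, which is undefined behaviour in C++. The "add casts" idea can be sketched as follows, assuming two's-complement wraparound (Java `long` semantics) is the intended result: perform the addition on the unsigned type, where it wraps, and convert back. `wrapping_add` is a hypothetical name, not the helper used in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Signed overflow is UB; unsigned arithmetic wraps modulo 2^64. Converting
// back to int64_t gives the two's-complement result (guaranteed since C++20,
// and what mainstream compilers do before that).
int64_t wrapping_add(int64_t a, int64_t b) {
  return static_cast<int64_t>(static_cast<uint64_t>(a) + static_cast<uint64_t>(b));
}
```

With this, `wrapping_add(INT64_MAX, 1)` yields `INT64_MIN`, matching what Java computes for `Long.MAX_VALUE + 1`, without triggering ubsan.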
PR Review: https://git.openjdk.org/jdk/pull/19413#pullrequestreview-2084643839 From mbaesken at openjdk.org Wed May 29 08:24:02 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 29 May 2024 08:24:02 GMT Subject: RFR: 8332904: ubsan ppc64le: c1_LIRGenerator_ppc.cpp:581:21: runtime error: signed integer overflow: 9223372036854775807 + 1 cannot be represented in type 'long int' In-Reply-To: <48iXq_MgcuQTyGGANYnJQlEJwBhmRsiX9SCtRNtQHbQ=.baf4713c-fbcf-4bb1-bb18-5af9b3bab57f@github.com> References: <48iXq_MgcuQTyGGANYnJQlEJwBhmRsiX9SCtRNtQHbQ=.baf4713c-fbcf-4bb1-bb18-5af9b3bab57f@github.com> Message-ID: On Mon, 27 May 2024 14:48:45 GMT, Matthias Baesken wrote: > When using ubsan on Linux ppc64le we run into some overflows like this one > > c1_LIRGenerator_ppc.cpp:581:21: runtime error: signed integer overflow: 9223372036854775807 + 1 cannot be represented in type 'long int' > > Seems we have to add casts to get defined behavior. > There are similar places in the coding as well.

Hi Martin, thanks for the review! May I get a second review?

------------- PR Comment: https://git.openjdk.org/jdk/pull/19413#issuecomment-2136826402 From gcao at openjdk.org Wed May 29 08:28:09 2024 From: gcao at openjdk.org (Gui Cao) Date: Wed, 29 May 2024 08:28:09 GMT Subject: RFR: 8333154: RISC-V: Add support for primitive array C1 clone intrinsic Message-ID: Implementation of primitive array C1 clone intrinsic (https://bugs.openjdk.org/browse/JDK-8333154) for linux-riscv64.

### Correctness testing:

- [x] Run `make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" `
- [ ] Run tier1-3 tests on SOPHON SG2042 (release)

### Performance testing:

Without Patch:

make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"

Benchmark (size) Mode Cnt Score Error Units
ArrayClone.byteArraycopy 0 avgt 15 90.089 ± 7.122 ns/op
ArrayClone.byteArraycopy 10 avgt 15 146.000 ± 11.761 ns/op
ArrayClone.byteArraycopy 100 avgt 15 289.382 ±
23.903 ns/op
ArrayClone.byteArraycopy 1000 avgt 15 767.864 ± 56.721 ns/op
ArrayClone.byteClone 0 avgt 15 735.692 ± 26.641 ns/op
ArrayClone.byteClone 10 avgt 15 810.810 ± 34.563 ns/op
ArrayClone.byteClone 100 avgt 15 1055.917 ± 93.574 ns/op
ArrayClone.byteClone 1000 avgt 15 1564.465 ± 140.941 ns/op
ArrayClone.intArraycopy 0 avgt 15 93.732 ± 8.468 ns/op
ArrayClone.intArraycopy 10 avgt 15 214.168 ± 34.526 ns/op
ArrayClone.intArraycopy 100 avgt 15 613.363 ± 45.415 ns/op
ArrayClone.intArraycopy 1000 avgt 15 1759.611 ± 59.010 ns/op
ArrayClone.intClone 0 avgt 15 680.100 ± 24.375 ns/op
ArrayClone.intClone 10 avgt 15 835.979 ± 75.154 ns/op
ArrayClone.intClone 100 avgt 15 1337.354 ± 86.182 ns/op
ArrayClone.intClone 1000 avgt 15 2696.280 ± 207.418 ns/op
Finished running test 'micro:java.lang.ArrayClone'

With Patch:

make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"

Benchmark (size) Mode Cnt Score Error Units
ArrayClone.byteArraycopy 0 avgt 15 89.410 ± 5.112 ns/op
ArrayClone.byteArraycopy 10 avgt 15 141.125 ± 8.711 ns/op
ArrayClone.byteArraycopy 100 avgt 15 277.098 ± 12.925 ns/op
ArrayClone.byteArraycopy 1000 avgt 15 770.188 ± 83.034 ns/op
ArrayClone.byteClone 0 avgt 15 94.367 ± 7.088 ns/op
ArrayClone.byteClone 10 avgt 15 151.804 ± 16.497 ns/op
ArrayClone.byteClone 100 avgt 15 296.284 ± 17.893 ns/op
ArrayClone.byteClone 1000 avgt 15 790.517 ± 28.765 ns/op
ArrayClone.intArraycopy 0 avgt 15 93.688 ± 7.050 ns/op
ArrayClone.intArraycopy 10 avgt 15 213.070 ± 12.299 ns/op
ArrayClone.intArraycopy 100 avgt 15 610.022 ± 57.880 ns/op
ArrayClone.intArraycopy 1000 avgt 15 1774.649 ± 89.373 ns/op
ArrayClone.intClone 0 avgt 15 96.396 ± 7.751 ns/op
ArrayClone.intClone 10 avgt 15 216.224 ± 19.219 ns/op
ArrayClone.intClone 100 avgt 15 585.263 ± 31.196 ns/op
ArrayClone.intClone 1000 avgt 15 1559.673 ±
108.939 ns/op Finished running test 'micro:java.lang.ArrayClone' ------------- Commit messages: - 8333154: RISC-V: Add support for primitive array C1 clone intrinsic Changes: https://git.openjdk.org/jdk/pull/19448/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19448&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333154 Stats: 31 lines in 7 files changed: 21 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/19448.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19448/head:pull/19448 PR: https://git.openjdk.org/jdk/pull/19448 From redestad at openjdk.org Wed May 29 08:39:01 2024 From: redestad at openjdk.org (Claes Redestad) Date: Wed, 29 May 2024 08:39:01 GMT Subject: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2] In-Reply-To: References: <1tx_tl3PV2W5NCEXXawQY5V2ndnSOHPfjisypuhKdhA=.79840096-bac0-4da4-8102-c7ecea7cb5f0@github.com> Message-ID: On Wed, 29 May 2024 03:16:02 GMT, Chen Liang wrote: >> In fact, if I change it to `2`, the following tests will fail: >> >> - `jdk/jdk/classfile/Utf8EntryTest.java` >> - `jdk/java/util/zip/ZipCoding.java` >> - `jdk/java/text/Format/MessageFormat/MessageRegression.java` > > Indeed, the actual length passed at call site is `value.length >> 1` instead of `value.length`; this adjusted char-length carries on to `vectorizedHashCode` call. Ah, sneaky. Might affect scores for the zero and one-char cases since the shift now happens unconditionally, but probably doesn't matter in practice. 
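Why the char length (`value.length >> 1`), rather than the byte length, must be passed on can be seen in a small C++ model of Java's `String.hashCode` recurrence `h = 31 * h + c` over a UTF-16 string stored as bytes. This is an illustrative sketch only: `utf16_hash` is hypothetical, and it assumes big-endian byte order (high byte first), whereas the JDK stores UTF-16 bytes in the platform's native order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hash of a UTF-16 string held in a byte array, two bytes per char.
// The loop bound is the length in chars (byte length >> 1); using the byte
// length would read past the end and hash the wrong number of chars.
int32_t utf16_hash(const std::vector<uint8_t>& value) {
  const std::size_t len = value.size() >> 1;  // chars, not bytes
  uint32_t h = 0;  // unsigned so that overflow wraps like Java int
  for (std::size_t i = 0; i < len; i++) {
    // Assumption: big-endian layout, high byte first.
    uint32_t c = (static_cast<uint32_t>(value[2 * i]) << 8) | value[2 * i + 1];
    h = 31 * h + c;
  }
  return static_cast<int32_t>(h);
}
```

For the bytes of `"ab"` (`0, 'a', 0, 'b'`) this gives 3105, the same value as `"ab".hashCode()` in Java; an empty array hashes to 0.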
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1618456668 From redestad at openjdk.org Wed May 29 08:43:01 2024 From: redestad at openjdk.org (Claes Redestad) Date: Wed, 29 May 2024 08:43:01 GMT Subject: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2] In-Reply-To: <1tx_tl3PV2W5NCEXXawQY5V2ndnSOHPfjisypuhKdhA=.79840096-bac0-4da4-8102-c7ecea7cb5f0@github.com> References: <1tx_tl3PV2W5NCEXXawQY5V2ndnSOHPfjisypuhKdhA=.79840096-bac0-4da4-8102-c7ecea7cb5f0@github.com> Message-ID: On Mon, 27 May 2024 20:55:29 GMT, Pavel Rappo wrote: >> Please review this PR, which supersedes a now withdrawn https://github.com/openjdk/jdk/pull/14831. >> >> This PR replaces `ArraysSupport.vectorizedHashCode` with a set of more user-friendly methods. Here's a summary: >> >> - Made the operand constants (i.e. `T_BOOLEAN` and friends) and the `vectorizedHashCode` method private >> >> - Made the `vectorizedHashCode` method private, but didn't rename it. Renaming would dramatically increase this PR review cost, because that method's name is used by a lot of VM code. On a bright side, since the method is now private, it's no longer callable by clients of `ArraysSupport`, thus a problem of an inaccurate name is less severe. >> >> - Made the `ArraysSupport.utf16HashCode` method private >> >> - Moved tiny cases (i.e. 0, 1, 2) to `ArraysSupport` > > Pavel Rappo has updated the pull request incrementally with one additional commit since the last revision: > > Fix incorrect utf16 hashCode adaptation Marked as reviewed by redestad (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/19414#pullrequestreview-2084714609 From fgao at openjdk.org Wed May 29 08:46:51 2024 From: fgao at openjdk.org (Fei Gao) Date: Wed, 29 May 2024 08:46:51 GMT Subject: RFR: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" [v3] In-Reply-To: <16J-lJ2AceGTVcRWBcP15yKcwO-1IA1XsngyOuNjf7k=.0776f081-ae2c-4279-87cf-d909806c2bc4@github.com> References: <16J-lJ2AceGTVcRWBcP15yKcwO-1IA1XsngyOuNjf7k=.0776f081-ae2c-4279-87cf-d909806c2bc4@github.com> Message-ID: > On LP64 systems, if the heap can be moved into low virtual address space (below 4GB) and the heap size is smaller than the interesting threshold of 4 GB, we can use unscaled decoding pattern for narrow klass decoding. It means that a generic field reference can be decoded by: > > cast<64> (32-bit compressed reference) + field_offset > > > When the `field_offset` is an immediate, on aarch64 platform, the unscaled decoding pattern can match perfectly with a direct addressing mode, i.e., `base_plus_offset`, supported by `LDR/STR` instructions. But for certain data width, not all immediates can be encoded in the instruction field of `LDR/STR` [[1]](https://github.com/openjdk/jdk/blob/8db7bad992a0f31de9c7e00c2657c18670539102/src/hotspot/cpu/aarch64/assembler_aarch64.inline.hpp#L33). The ranges are different as data widths vary. > > For example, when we try to load a value of long type at offset of `1030`, the address expression is `(AddP (DecodeN base) 1030)`. Before the patch, the expression was matching with `operand indOffIN()`. But, for 64-bit `LDR/STR`, signed immediate byte offset must be in the range -256 to 255 or positive immediate byte offset must be a multiple of 8 in the range 0 to 32760 [[2]](https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/LDR--immediate---Load-Register--immediate--?lang=en). `1030` can't be encoded in the instruction field. 
So, after matching, when we do checking for instruction encoding, the assertion would fail. > > In this patch, we're going to filter out invalid immediates when deciding if current addressing mode can be matched as `base_plus_offset`. We introduce `indOffIN4/indOffLN4` and `indOffIN8/indOffLN8` for 32-bit data type and 64-bit data type separately in the patch. E.g., for `memory4`, we remove the generic `indOffIN/indOffLN`, which matches wrong unscaled immediate range, and replace them with `indOffIN4/indOffLN4` instead. > > Since 8-bit and 16-bit `LDR/STR` instructions also support the unscaled decoding pattern, we add the addressing mode in the lists of `memory1` and `memory2` by introducing `indOffIN1/indOffLN1` and `indOffIN2/indOffLN2`. > > We also remove unused operands `indOffI/indOffl/indOffIN/indOffLN` to avoid misuse. > > Tier 1-3 passed on aarch64. Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Add the assertion back and merge matchrules with a better predicate - Merge branch 'master' into fg8319690 - Remove unused immIOffset/immLOffset - Merge branch 'master' into fg8319690 - 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" On LP64 systems, if the heap can be moved into low virtual address space (below 4GB) and the heap size is smaller than the interesting threshold of 4 GB, we can use unscaled decoding pattern for narrow klass decoding. It means that a generic field reference can be decoded by: ``` cast<64> (32-bit compressed reference) + field_offset ``` When the `field_offset` is an immediate, on aarch64 platform, the unscaled decoding pattern can match perfectly with a direct addressing mode, i.e., `base_plus_offset`, supported by LDR/STR instructions. 
But for certain data width, not all immediates can be encoded in the instruction field of LDR/STR[1]. The ranges are different as data widths vary. For example, when we try to load a value of long type at offset of `1030`, the address expression is `(AddP (DecodeN base) 1030)`. Before the patch, the expression was matching with `operand indOffIN()`. But, for 64-bit LDR/STR, signed immediate byte offset must be in the range -256 to 255 or positive immediate byte offset must be a multiple of 8 in the range 0 to 32760[2]. `1030` can't be encoded in the instruction field. So, after matching, when we do checking for instruction encoding, the assertion would fail. In this patch, we're going to filter out invalid immediates when deciding if current addressing mode can be matched as `base_plus_offset`. We introduce `indOffIN4/indOffLN4` and `indOffIN8/indOffLN8` for 32-bit data type and 64-bit data type separately in the patch. E.g., for `memory4`, we remove the generic `indOffIN/indOffLN`, which matches wrong unscaled immediate range, and replace them with `indOffIN4/indOffLN4` instead. Since 8-bit and 16-bit LDR/STR instructions also support the unscaled decoding pattern, we add the addressing mode in the lists of `memory1` and `memory2` by introducing `indOffIN1/indOffLN1` and `indOffIN2/indOffLN2`. We also remove unused operands `indOffI/indOffl/indOffIN/indOffLN` to avoid misuse. Tier 1-3 passed on aarch64. 
[1] https://github.com/openjdk/jdk/blob/8db7bad992a0f31de9c7e00c2657c18670539102/src/hotspot/cpu/aarch64/assembler_aarch64.inline.hpp#L33 [2] https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/LDR--immediate---Load-Register--immediate--?lang=en ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16991/files - new: https://git.openjdk.org/jdk/pull/16991/files/a7bfe267..0d8a4c12 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16991&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16991&range=01-02 Stats: 794132 lines in 10868 files changed: 224722 ins; 193799 del; 375611 mod Patch: https://git.openjdk.org/jdk/pull/16991.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16991/head:pull/16991 PR: https://git.openjdk.org/jdk/pull/16991 From fgao at openjdk.org Wed May 29 09:01:03 2024 From: fgao at openjdk.org (Fei Gao) Date: Wed, 29 May 2024 09:01:03 GMT Subject: RFR: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" [v2] In-Reply-To: References: <16J-lJ2AceGTVcRWBcP15yKcwO-1IA1XsngyOuNjf7k=.0776f081-ae2c-4279-87cf-d909806c2bc4@github.com> Message-ID: On Sat, 24 Feb 2024 09:17:04 GMT, Andrew Haley wrote: >> @theRealAph thanks for your kind review. All comments inspired me a lot and helped me think more. Also, thanks for your help on fixing it! > >> @fg1417 are you still working on this? > > The best thing to do now is to remove the failing assertion. Thanks for your comments, @theRealAph @dean-long @eme64 . In the new commit, I add the assertion back, which was removed temporarily in https://github.com/openjdk/jdk/commit/98f0b86641d. Also, as @dean-long suggested before, I fold the new `indOffIN` into the existing `indOffI` with a predicate to reduce duplicate code. To make the predicate work well, the new commit also applies some changes to the shared "adlc" part to insert necessary `()` for predicate in an `Operand`. 
After that, multiple arguments can be chained with logical `&&`, like we did for an `Instruction`. Tier 1 - 3 passed on `x86` and `aarch64`. Please help review. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16991#issuecomment-2136896213 From jkern at openjdk.org Wed May 29 09:06:02 2024 From: jkern at openjdk.org (Joachim Kern) Date: Wed, 29 May 2024 09:06:02 GMT Subject: RFR: 8332904: ubsan ppc64le: c1_LIRGenerator_ppc.cpp:581:21: runtime error: signed integer overflow: 9223372036854775807 + 1 cannot be represented in type 'long int' In-Reply-To: <48iXq_MgcuQTyGGANYnJQlEJwBhmRsiX9SCtRNtQHbQ=.baf4713c-fbcf-4bb1-bb18-5af9b3bab57f@github.com> References: <48iXq_MgcuQTyGGANYnJQlEJwBhmRsiX9SCtRNtQHbQ=.baf4713c-fbcf-4bb1-bb18-5af9b3bab57f@github.com> Message-ID: On Mon, 27 May 2024 14:48:45 GMT, Matthias Baesken wrote: > When using ubsan on Linux ppc64le we run into some overflows like this one > > c1_LIRGenerator_ppc.cpp:581:21: runtime error: signed integer overflow: 9223372036854775807 + 1 cannot be represented in type 'long int' > > Seems we have to add casts to get defined behavior. > There are similar places in the coding as well. LGTM Marked as reviewed by jkern (Author). ------------- PR Review: https://git.openjdk.org/jdk/pull/19413#pullrequestreview-2084783932 PR Review: https://git.openjdk.org/jdk/pull/19413#pullrequestreview-2084785696 From dfenacci at openjdk.org Wed May 29 09:06:09 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 29 May 2024 09:06:09 GMT Subject: RFR: 8333099: Missing check for is_LoadVector in StoreNode::Identity Message-ID: [JDK-8325520](https://bugs.openjdk.org/browse/JDK-8325520) introduced a check for type equality in `StoreNode::Identity` in the specific case of a load vector followed by a store vector. Unfortunately the memory node operand might actually not be of type `LoadVector`. So, before retrieving its type, a check for `is_LoadVector` is necessary. 
------------- Commit messages: - JDK-833099: missing check for is_LoadVector in StoreNode::Identity Changes: https://git.openjdk.org/jdk/pull/19449/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19449&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333099 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19449.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19449/head:pull/19449 PR: https://git.openjdk.org/jdk/pull/19449 From chagedorn at openjdk.org Wed May 29 09:13:10 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 29 May 2024 09:13:10 GMT Subject: RFR: 8333099: Missing check for is_LoadVector in StoreNode::Identity In-Reply-To: References: Message-ID: On Wed, 29 May 2024 08:54:35 GMT, Damon Fenacci wrote: > [JDK-8325520](https://bugs.openjdk.org/browse/JDK-8325520) introduced a check for type equality in `StoreNode::Identity` in the specific case of a load vector followed by a store vector. > Unfortunately the memory node operand might actually not be of type `LoadVector`. So, before retrieving its type, a check for `is_LoadVector` is necessary. Looks good! Can you also add a regression test? ------------- Marked as reviewed by chagedorn (Reviewer). 
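The failure described for 8333099 is the usual "downcast before checking the node kind" pattern. The sketch below uses hypothetical, heavily simplified stand-ins (`Node`, `LoadVectorNode`, `TypeVect`, `as_LoadVector`, `store_matches_load`); it is neither C2's class hierarchy nor the actual `StoreNode::Identity` code. It only shows why the `is_LoadVector()` test has to come first in the condition: `&&` short-circuits, so the checked downcast runs only on nodes that really are vector loads.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for C2 IR nodes (not HotSpot's classes).
struct TypeVect {
  int length;
};

struct Node {
  virtual ~Node() = default;
  virtual bool is_LoadVector() const { return false; }
};

struct LoadVectorNode : Node {
  TypeVect vt;
  explicit LoadVectorNode(int len) : vt{len} {}
  bool is_LoadVector() const override { return true; }
  const TypeVect& vect_type() const { return vt; }
};

// Checked downcast in the style of C2's as_Xxx() helpers: only valid once
// is_Xxx() has returned true, otherwise the assert fires.
const LoadVectorNode* as_LoadVector(const Node* n) {
  assert(n->is_LoadVector());
  return static_cast<const LoadVectorNode*>(n);
}

// Sketch of the guarded condition: compare vector types only after
// confirming that the inspected operand is a LoadVector at all.
bool store_matches_load(const Node* in, const TypeVect& store_type) {
  return in->is_LoadVector() &&  // the kind check must come first
         as_LoadVector(in)->vect_type().length == store_type.length;
}
```

Without the first conjunct, calling `store_matches_load` on a non-vector node would reach `as_LoadVector` and trip its assert (or, in a release build, read a type from the wrong kind of object).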
PR Review: https://git.openjdk.org/jdk/pull/19449#pullrequestreview-2084803593 From mbaesken at openjdk.org Wed May 29 09:13:13 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 29 May 2024 09:13:13 GMT Subject: RFR: 8332904: ubsan ppc64le: c1_LIRGenerator_ppc.cpp:581:21: runtime error: signed integer overflow: 9223372036854775807 + 1 cannot be represented in type 'long int' In-Reply-To: <48iXq_MgcuQTyGGANYnJQlEJwBhmRsiX9SCtRNtQHbQ=.baf4713c-fbcf-4bb1-bb18-5af9b3bab57f@github.com> References: <48iXq_MgcuQTyGGANYnJQlEJwBhmRsiX9SCtRNtQHbQ=.baf4713c-fbcf-4bb1-bb18-5af9b3bab57f@github.com> Message-ID: On Mon, 27 May 2024 14:48:45 GMT, Matthias Baesken wrote: > When using ubsan on Linux ppc64le we run into some overflows like this one > > c1_LIRGenerator_ppc.cpp:581:21: runtime error: signed integer overflow: 9223372036854775807 + 1 cannot be represented in type 'long int' > > Seems we have to add casts to get defined behavior. > There are similar places in the coding as well. Thanks for the reviews ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19413#issuecomment-2136925496 From mbaesken at openjdk.org Wed May 29 09:13:13 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 29 May 2024 09:13:13 GMT Subject: Integrated: 8332904: ubsan ppc64le: c1_LIRGenerator_ppc.cpp:581:21: runtime error: signed integer overflow: 9223372036854775807 + 1 cannot be represented in type 'long int' In-Reply-To: <48iXq_MgcuQTyGGANYnJQlEJwBhmRsiX9SCtRNtQHbQ=.baf4713c-fbcf-4bb1-bb18-5af9b3bab57f@github.com> References: <48iXq_MgcuQTyGGANYnJQlEJwBhmRsiX9SCtRNtQHbQ=.baf4713c-fbcf-4bb1-bb18-5af9b3bab57f@github.com> Message-ID: On Mon, 27 May 2024 14:48:45 GMT, Matthias Baesken wrote: > When using ubsan on Linux ppc64le we run into some overflows like this one > > c1_LIRGenerator_ppc.cpp:581:21: runtime error: signed integer overflow: 9223372036854775807 + 1 cannot be represented in type 'long int' > > Seems we have to add casts to get defined behavior. 
> There are similar places in the coding as well. This pull request has now been integrated. Changeset: 9b64ece5 Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/9b64ece514cf941ebc727991d97c43453d8a488d Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod 8332904: ubsan ppc64le: c1_LIRGenerator_ppc.cpp:581:21: runtime error: signed integer overflow: 9223372036854775807 + 1 cannot be represented in type 'long int' Reviewed-by: mdoerr, jkern ------------- PR: https://git.openjdk.org/jdk/pull/19413 From prappo at openjdk.org Wed May 29 09:21:02 2024 From: prappo at openjdk.org (Pavel Rappo) Date: Wed, 29 May 2024 09:21:02 GMT Subject: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2] In-Reply-To: References: <1tx_tl3PV2W5NCEXXawQY5V2ndnSOHPfjisypuhKdhA=.79840096-bac0-4da4-8102-c7ecea7cb5f0@github.com> Message-ID: On Tue, 28 May 2024 20:21:34 GMT, Claes Redestad wrote: >> test/hotspot/jtreg/compiler/intrinsics/TestArraysHashCode.java line 88: >> >>> 86: private static int testIntrinsic(byte[] bytes, int type) >>> 87: throws InvocationTargetException, IllegalAccessException { >>> 88: return (int) vectorizedHashCode.invoke(null, bytes, 0, 256, 1, type); >> >> Better to just call `hashCodeOfUnsigned` here I think. >> >> The test for the non-constant type could be dropped. That is no longer a part of the 'API' of `ArraySupport`. It looks like the intrinsic bails out when the basic type is not constant any ways: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L6401-L6404 > > The non-constant test was added because that very bailout caused a crash. The other test is actually less interesting since it'll likely be covered indirectly by regular use. But as we are hiding these away this gets ever more obscure and perhaps the test could be dropped entirely. @cl4es, do you want me to delete that test file altogether? 
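For 8332904, the reported expression `9223372036854775807 + 1` is `INT64_MAX + 1`, which is undefined behaviour for a signed C++ type even though the hardware would simply wrap. One common way to get Java-style wrap-around with defined behaviour is to do the arithmetic in the corresponding unsigned type and convert back. This is a sketch of that idea only: `wrapping_add` is a hypothetical helper, and the exact casts in changeset 9b64ece5 are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Adds two 64-bit values with two's-complement wrap-around (Java long
// semantics) without signed overflow. Unsigned arithmetic wraps modulo 2^64
// by definition; the conversion back to int64_t is modular (guaranteed since
// C++20, implementation-defined but modular on mainstream compilers before).
int64_t wrapping_add(int64_t a, int64_t b) {
  return static_cast<int64_t>(static_cast<uint64_t>(a) + static_cast<uint64_t>(b));
}
```

Compiled with `-fsanitize=signed-integer-overflow`, `INT64_MAX + 1` written directly is reported, while `wrapping_add(INT64_MAX, 1)` is not, and it returns `INT64_MIN`.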
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1618536122 From chagedorn at openjdk.org Wed May 29 09:21:34 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 29 May 2024 09:21:34 GMT Subject: RFR: 8325155: C2 SuperWord: remove alignment boundaries [v6] In-Reply-To: References: Message-ID: <_usou5aJb--Azf87Jzu9pATR_JDc7JeSe4PhlwnVtHw=.e6d64f25-0e81-4d75-9351-813c0dc9c647@github.com> On Wed, 29 May 2024 07:16:44 GMT, Emanuel Peter wrote: >> I have tried for a very long time to get rid of all the `alignment(n)` code that is all over the SuperWord code. With lots of previous work, I am now finally ready to remove it. >> >> I was able to remove lots of VM code, about 300 lines. And the removed code is I think much more complicated than the new code. >> >> This is what I did in this PR: >> - Removal of `_node_info`: used to have many fields, which I refactored out to the `VLoopAnalyzer` modules. `alignment` is the last component, which I now remove. >> - Changed the implementation of `SuperWord::find_adjacent_refs`, now `SuperWord::find_adjacent_memop_pairs`, completely: >> - It used to be an algorithm that would scan over all `memops` repeatedly, try to find some `mem_ref` and see which other memops were comparable, and then pack pairs for all of those, by comparing all-vs-all memops. This algorithm is at least quadratic, if not much worse. >> - I now add all `memops` into a single array, sort them by groups (those that are comparable with each other and could be packed into vectors), and inside the groups by ascending offset. This allows me to split off the groups much more efficiently, and also the sorting by offset allows me finding adjacent pairs much more efficiently. In the most cases this reduces the cost to `O(n log n)` for sort, and a linear scan for finding adjacent memops. >> - I removed the "alignment boundaries" created in `SuperWord::memory_alignment` by `int off_rem = offset % vw;`. 
>> - This used to have the effect that all offsets were computed modulo the vector width. Hence, pairs could not be packed across this boundary (e.g. we have nodes with offsets `31, 32`, which are adjacent in theory, but if we have a `vw = 32`, then the modulo-offsets are `31, 0`, and they are not detected as adjacent). >> - These "alignment boundaries" used to be required for correctness about a year ago, before I fixed and relaxed much of the alignment code. >> - The `alignment` used to have another important task: Ensuring compatibility of the input-size of a use node, with the output-size of the def-node. >> - This was done by giving all nodes an `alignment`, even the non-memop nodes. This `alignment` was then scaled up and down at type casts (e.g. int `0, 4, 8, 12` -> long `0, 8, 16, 24`). If the output-size of the def-node did not match the input-size of the use-node, then the `alignment` would not match up, and we would not pack. >> - This is why we used to have checks like `alignment(s1) + da... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/superword.cpp > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/18822#pullrequestreview-2084819601 From mdoerr at openjdk.org Wed May 29 09:22:01 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 29 May 2024 09:22:01 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer In-Reply-To: References: Message-ID: On Tue, 28 May 2024 12:36:40 GMT, Matthias Baesken wrote: > When running on macOS with ubsan enabled, we see some issues in relocInfo (hpp and cpp); those already occur in the build quite early. 
> > /jdk/src/hotspot/share/code/relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer > > Similar happens when we add to the _current pointer > _current++; > this gives : > relocInfo.hpp:606:13: runtime error: applying non-zero offset to non-null pointer 0xfffffffffffffffe produced null pointer > > Seems the pointer subtraction/addition worked so far, so it might be an option to disable ubsan for those 2 functions. `val` needs an unsigned type to avoid undefined behavior because of signed integer overflow. I'd use `uintptr_t`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19424#issuecomment-2136944166 From mli at openjdk.org Wed May 29 09:27:31 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 29 May 2024 09:27:31 GMT Subject: RFR: 8320999: RISC-V: C2 RotateLeftV [v5] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > More detailed description is inline in the code. > Thanks Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: misc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19325/files - new: https://git.openjdk.org/jdk/pull/19325/files/474e0720..7340c81a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19325&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19325&range=03-04 Stats: 8 lines in 1 file changed: 0 ins; 6 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19325.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19325/head:pull/19325 PR: https://git.openjdk.org/jdk/pull/19325 From mli at openjdk.org Wed May 29 09:27:32 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 29 May 2024 09:27:32 GMT Subject: RFR: 8320999: RISC-V: C2 RotateLeftV [v4] In-Reply-To: <0vYGBwpVIOa8vxa5UtU7OYUd2yEB7jG1uUZvYTB-_40=.7745648b-1e74-40c0-9815-c809a79b8b9d@github.com> References: <0vYGBwpVIOa8vxa5UtU7OYUd2yEB7jG1uUZvYTB-_40=.7745648b-1e74-40c0-9815-c809a79b8b9d@github.com> Message-ID: 
On Wed, 29 May 2024 07:40:11 GMT, Fei Yang wrote: > Two minor comments remain. Otherwise looks good to me. BTW: You didn't mention the testing performed. Are these newly-added instructs properly test covered? Thanks. Yes, I've checked the instructs are matched and invoked during tests running. > src/hotspot/cpu/riscv/riscv_v.ad line 3082: > >> 3080: // >> 3081: // NOTE: for Long, its valid rotation value is 6 bits, although basic vector instruction only support 5 bit vector-immediate, >> 3082: // in Zvbb, vror.vi support 6 bits vector-immediate, so the imm implementation of Long and other types can be unified. > > Maybe simply: `As vror.vi encodes 6-bits immediate rotate amount, which is different from other vector-immediate instructions, implementation of vector rotation for long and other types can be unified.` modified > src/hotspot/cpu/riscv/riscv_v.ad line 3130: > >> 3128: instruct vrotate_right_masked(vReg dst_src, vReg shift, vRegMask_V0 v0) %{ >> 3129: match(Set dst_src (RotateRightV (Binary dst_src shift) v0)); >> 3130: effect(TEMP_DEF dst_src); > > Is the `TEMP_DEF dst_src` needed for these newly-added masked versions? Thanks for catching, removed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19325#issuecomment-2136952218 PR Review Comment: https://git.openjdk.org/jdk/pull/19325#discussion_r1618547353 PR Review Comment: https://git.openjdk.org/jdk/pull/19325#discussion_r1618546816 From aph at openjdk.org Wed May 29 09:32:41 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 29 May 2024 09:32:41 GMT Subject: RFR: 8331658: secondary_super_cache does not scale well: C1 [v2] In-Reply-To: References: Message-ID: <0rowz1jcBwDwG5peFhEj6CFKFUiPZcCgV3MGAEKH55Q=.35e88694-e170-4bf2-8cd7-1f309d0ab156@github.com> > This is the C1 version of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). > > The new logic in this PR is as simple as I can make it. 
It is a somewhat-simplified version of the C2 change in [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). In order to reduce risk I haven't touched the existing slow subtype stub. > The register allocation logic in the existing code is pretty gnarly, and I have no desire to break anything at this point in the release cycle, so I have allocated just one register more than the existing code does. > > Performance is pretty good. Before and after: > > x64, AMD 2950X, 8 cores:
>
> Benchmark                                   Mode  Cnt     Score    Error  Units
> SecondarySuperCacheHits.test                avgt    5     0.959 ±  0.091  ns/op
> SecondarySuperCacheInterContention.test     avgt    5    42.931 ±  6.951  ns/op
> SecondarySuperCacheInterContention.test:t1  avgt    5    42.397 ±  7.708  ns/op
> SecondarySuperCacheInterContention.test:t2  avgt    5    43.466 ±  8.238  ns/op
> SecondarySuperCacheIntraContention.test     avgt    5    74.660 ±  0.127  ns/op
>
> SecondarySuperCacheHits.test                avgt    5     1.480 ±  0.077  ns/op
> SecondarySuperCacheInterContention.test     avgt    5     1.461 ±  0.063  ns/op
> SecondarySuperCacheInterContention.test:t1  avgt    5     1.767 ±  0.078  ns/op
> SecondarySuperCacheInterContention.test:t2  avgt    5     1.155 ±  0.052  ns/op
> SecondarySuperCacheIntraContention.test     avgt    5     1.421 ±  0.002  ns/op
>
> AArch64, Mac M3, 8 cores:
>
> Benchmark                                   Mode  Cnt     Score    Error  Units
> SecondarySuperCacheHits.test                avgt    5     0.835 ±  0.021  ns/op
> SecondarySuperCacheInterContention.test     avgt    5    74.078 ± 18.095  ns/op
> SecondarySuperCacheInterContention.test:t1  avgt    5    81.863 ± 42.492  ns/op
> SecondarySuperCacheInterContention.test:t2  avgt    5    66.293 ± 11.254  ns/op
> SecondarySuperCacheIntraContention.test     avgt    5   335.563 ±  6.171  ns/op
>
> SecondarySuperCacheHits.test                avgt    5     1.212 ±  0.004  ns/op
> SecondarySuperCacheInterContention.test     avgt    5     0.871 ±  0.002  ns/op
> SecondarySuperCacheInterContention.test:t1  avgt    5     0.626 ±  0.003  ns/op
> SecondarySuperCacheInterContention.test:t2  avgt    5     1.115 ±  0.006  ns/op
> SecondarySuperCacheIntraContention.test     avgt    5     0.696 ±  0.001  ns/op
>
> The first test, `SecondarySuperCacheHits`, shows a small regression. It's the "happy path" which simply checks the same subclass again and again in a loop, i... Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: JDK-8331658: secondary_super_cache does not scale well: C1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19426/files - new: https://git.openjdk.org/jdk/pull/19426/files/8c05732c..7e948eb6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19426&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19426&range=00-01 Stats: 13 lines in 4 files changed: 4 ins; 4 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/19426.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19426/head:pull/19426 PR: https://git.openjdk.org/jdk/pull/19426 From mbaesken at openjdk.org Wed May 29 09:37:07 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 29 May 2024 09:37:07 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer In-Reply-To: References: Message-ID: On Wed, 29 May 2024 09:19:34 GMT, Martin Doerr wrote: > val needs an unsigned type to avoid undefined behavior because of signed integer overflow. I'd use uintptr_t. Makes sense to use something unsigned. Any good place(s) where to put those templates? For now I would just simply put them into relocInfo.hpp (we can move them if we need to reuse them somewhere else).
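Martin Doerr's `uintptr_t` suggestion for 8331731 can be sketched as a small template that moves a pointer by a signed number of elements using integer arithmetic. The name `wrapping_ptr_add` is hypothetical and not necessarily what the patch uses. The 18446744073709551614 in the ubsan message is 2^64 - 2, which is consistent with stepping one 2-byte element back from a null pointer; `int16_t` stands in for such an element below. Unsigned integer arithmetic wraps by definition, so ubsan's pointer-overflow check has nothing to flag, but the resulting pointer is only meaningful as a position marker and must not be dereferenced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: advance p by n elements (n may be negative) using
// uintptr_t arithmetic. Unsigned arithmetic wraps modulo 2^N, so stepping
// before address 0, or back up to it, is well-defined at the integer level,
// unlike the equivalent pointer arithmetic on a null pointer.
template <typename T>
T* wrapping_ptr_add(T* p, ptrdiff_t n) {
  uintptr_t addr = reinterpret_cast<uintptr_t>(p);
  // Converting a negative n to uintptr_t is modular, so the same expression
  // also handles moving backwards.
  addr += static_cast<uintptr_t>(n) * sizeof(T);
  // Mapping the integer back to a pointer is implementation-defined; in
  // practice it is the identity on common flat-address-space platforms.
  return reinterpret_cast<T*>(addr);
}
```

An iterator that starts one element before its first record and pre-increments (the `_current++` pattern from the report) could then use `wrapping_ptr_add(begin, -1)` and `wrapping_ptr_add(cur, 1)` instead of raw `-` and `++`.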
------------- PR Comment: https://git.openjdk.org/jdk/pull/19424#issuecomment-2136973805 From aph at openjdk.org Wed May 29 09:47:10 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 29 May 2024 09:47:10 GMT Subject: RFR: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" [v3] In-Reply-To: References: <16J-lJ2AceGTVcRWBcP15yKcwO-1IA1XsngyOuNjf7k=.0776f081-ae2c-4279-87cf-d909806c2bc4@github.com> Message-ID: On Wed, 29 May 2024 08:46:51 GMT, Fei Gao wrote: >> On LP64 systems, if the heap can be moved into low virtual address space (below 4GB) and the heap size is smaller than the interesting threshold of 4 GB, we can use unscaled decoding pattern for narrow klass decoding. It means that a generic field reference can be decoded by: >> >> cast<64> (32-bit compressed reference) + field_offset >> >> >> When the `field_offset` is an immediate, on aarch64 platform, the unscaled decoding pattern can match perfectly with a direct addressing mode, i.e., `base_plus_offset`, supported by `LDR/STR` instructions. But for certain data width, not all immediates can be encoded in the instruction field of `LDR/STR` [[1]](https://github.com/openjdk/jdk/blob/8db7bad992a0f31de9c7e00c2657c18670539102/src/hotspot/cpu/aarch64/assembler_aarch64.inline.hpp#L33). The ranges are different as data widths vary. >> >> For example, when we try to load a value of long type at offset of `1030`, the address expression is `(AddP (DecodeN base) 1030)`. Before the patch, the expression was matching with `operand indOffIN()`. But, for 64-bit `LDR/STR`, signed immediate byte offset must be in the range -256 to 255 or positive immediate byte offset must be a multiple of 8 in the range 0 to 32760 [[2]](https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/LDR--immediate---Load-Register--immediate--?lang=en). `1030` can't be encoded in the instruction field. So, after matching, when we do checking for instruction encoding, the assertion would fail. 
>> >> In this patch, we're going to filter out invalid immediates when deciding if current addressing mode can be matched as `base_plus_offset`. We introduce `indOffIN4/indOffLN4` and `indOffIN8/indOffLN8` for 32-bit data type and 64-bit data type separately in the patch. E.g., for `memory4`, we remove the generic `indOffIN/indOffLN`, which matches wrong unscaled immediate range, and replace them with `indOffIN4/indOffLN4` instead. >> >> Since 8-bit and 16-bit `LDR/STR` instructions also support the unscaled decoding pattern, we add the addressing mode in the lists of `memory1` and `memory2` by introducing `indOffIN1/indOffLN1` and `indOffIN2/indOffLN2`. >> >> We also remove unused operands `indOffI/indOffl/indOffIN/indOffLN` to avoid misuse. >> >> Tier 1-3 passed on aarch64. > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Add the assertion back and merge matchrules with a better predicate > - Merge branch 'master' into fg8319690 > - Remove unused immIOffset/immLOffset > - Merge branch 'master' into fg8319690 > - 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" > > On LP64 systems, if the heap can be moved into low virtual > address space (below 4GB) and the heap size is smaller than the > interesting threshold of 4 GB, we can use unscaled decoding > pattern for narrow klass decoding. It means that a generic field > reference can be decoded by: > ``` > cast<64> (32-bit compressed reference) + field_offset > ``` > > When the `field_offset` is an immediate, on aarch64 platform, the > unscaled decoding pattern can match perfectly with a direct > addressing mode, i.e., `base_plus_offset`, supported by LDR/STR > instructions. But for certain data width, not all immediates can > be encoded in the instruction field of LDR/STR[1]. 
The ranges are > different as data widths vary. > > For example, when we try to load a value of long type at offset of > `1030`, the address expression is `(AddP (DecodeN base) 1030)`. > Before the patch, the expression was matching with > `operand indOffIN()`. But, for 64-bit LDR/STR, signed immediate > byte offset must be in the range -256 to 255 or positive immediate > byte offset must be a multiple of 8 in the range 0 to 32760[2]. > `1030` can't be encoded in the instruction field. So, after > matching, when we do checking for instruction encoding, the > assertion would fail. > > In this patch, we're going to filter out invalid immediates > when deciding if current addressing mode can be matched as > `base_plus_offset`. We introduce `indOffIN4/indOffLN4` and > `indOffIN8/indOffLN8` for 32-bit data type and 64-bit data > type separately in the patch. E.g., for `memory4`, we remove > the generic `indOffIN/indOffLN`, which matches wrong unscaled > immediate range, and replace them with `indOffIN4/indOffLN4` > instead. > > Since 8-bit and 16-bit LDR/STR instructions also support the > unscaled decoding pattern, we add the addressing mode in the > lists of `memory1` and `memory2` by introducing > `indOffIN1/indOffLN1` and `indOffIN2/... src/hotspot/cpu/aarch64/aarch64.ad line 2726: > 2724: // so we assert here. > 2725: assert(Address::offset_ok_for_immed(addr.offset(), exact_log2(size_in_memory)), > 2726: "c2 compiler bug"); Please don't re-introduce this assertion. It was a mistake. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16991#discussion_r1618582009 From aph at openjdk.org Wed May 29 09:54:12 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 29 May 2024 09:54:12 GMT Subject: RFR: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" [v3] In-Reply-To: References: <16J-lJ2AceGTVcRWBcP15yKcwO-1IA1XsngyOuNjf7k=.0776f081-ae2c-4279-87cf-d909806c2bc4@github.com> Message-ID: On Wed, 29 May 2024 08:46:51 GMT, Fei Gao wrote: >> On LP64 systems, if the heap can be moved into low virtual address space (below 4GB) and the heap size is smaller than the interesting threshold of 4 GB, we can use unscaled decoding pattern for narrow klass decoding. It means that a generic field reference can be decoded by: >> >> cast<64> (32-bit compressed reference) + field_offset >> >> >> When the `field_offset` is an immediate, on aarch64 platform, the unscaled decoding pattern can match perfectly with a direct addressing mode, i.e., `base_plus_offset`, supported by `LDR/STR` instructions. But for certain data width, not all immediates can be encoded in the instruction field of `LDR/STR` [[1]](https://github.com/openjdk/jdk/blob/8db7bad992a0f31de9c7e00c2657c18670539102/src/hotspot/cpu/aarch64/assembler_aarch64.inline.hpp#L33). The ranges are different as data widths vary. >> >> For example, when we try to load a value of long type at offset of `1030`, the address expression is `(AddP (DecodeN base) 1030)`. Before the patch, the expression was matching with `operand indOffIN()`. But, for 64-bit `LDR/STR`, signed immediate byte offset must be in the range -256 to 255 or positive immediate byte offset must be a multiple of 8 in the range 0 to 32760 [[2]](https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/LDR--immediate---Load-Register--immediate--?lang=en). `1030` can't be encoded in the instruction field. 
So, after matching, when we do checking for instruction encoding, the assertion would fail. >> >> In this patch, we're going to filter out invalid immediates when deciding if current addressing mode can be matched as `base_plus_offset`. We introduce `indOffIN4/indOffLN4` and `indOffIN8/indOffLN8` for 32-bit data type and 64-bit data type separately in the patch. E.g., for `memory4`, we remove the generic `indOffIN/indOffLN`, which matches wrong unscaled immediate range, and replace them with `indOffIN4/indOffLN4` instead. >> >> Since 8-bit and 16-bit `LDR/STR` instructions also support the unscaled decoding pattern, we add the addressing mode in the lists of `memory1` and `memory2` by introducing `indOffIN1/indOffLN1` and `indOffIN2/indOffLN2`. >> >> We also remove unused operands `indOffI/indOffl/indOffIN/indOffLN` to avoid misuse. >> >> Tier 1-3 passed on aarch64. > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Add the assertion back and merge matchrules with a better predicate > - Merge branch 'master' into fg8319690 > - Remove unused immIOffset/immLOffset > - Merge branch 'master' into fg8319690 > - 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" > > On LP64 systems, if the heap can be moved into low virtual > address space (below 4GB) and the heap size is smaller than the > interesting threshold of 4 GB, we can use unscaled decoding > pattern for narrow klass decoding. It means that a generic field > reference can be decoded by: > ``` > cast<64> (32-bit compressed reference) + field_offset > ``` > > When the `field_offset` is an immediate, on aarch64 platform, the > unscaled decoding pattern can match perfectly with a direct > addressing mode, i.e., `base_plus_offset`, supported by LDR/STR > instructions. 
But for certain data width, not all immediates can > be encoded in the instruction field of LDR/STR[1]. The ranges are > different as data widths vary. > > For example, when we try to load a value of long type at offset of > `1030`, the address expression is `(AddP (DecodeN base) 1030)`. > Before the patch, the expression was matching with > `operand indOffIN()`. But, for 64-bit LDR/STR, signed immediate > byte offset must be in the range -256 to 255 or positive immediate > byte offset must be a multiple of 8 in the range 0 to 32760[2]. > `1030` can't be encoded in the instruction field. So, after > matching, when we do checking for instruction encoding, the > assertion would fail. > > In this patch, we're going to filter out invalid immediates > when deciding if current addressing mode can be matched as > `base_plus_offset`. We introduce `indOffIN4/indOffLN4` and > `indOffIN8/indOffLN8` for 32-bit data type and 64-bit data > type separately in the patch. E.g., for `memory4`, we remove > the generic `indOffIN/indOffLN`, which matches wrong unscaled > immediate range, and replace them with `indOffIN4/indOffLN4` > instead. > > Since 8-bit and 16-bit LDR/STR instructions also support the > unscaled decoding pattern, we add the addressing mode in the > lists of `memory1` and `memory2` by introducing > `indOffIN1/indOffLN1` and `indOffIN2/... This is much better. However, I don't think that all the IndOffXX types do us any good. It would be simpler and faster to match a general-purpose IndOff type then let `legitimize_address()` fix any out-of-range operands. That'd reduce the size of the match rules and the time it takes to run them. 
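The range rule quoted in the description above can be made concrete with a small check. The following Java sketch is a hypothetical illustration and not HotSpot's `offset_ok_for_immed`; the class and method names are made up. It assumes the two forms described in the PR. The first is a signed 9-bit unscaled byte offset (-256 to 255). The second is a non-negative offset that is a multiple of the access size and at most 4095 times that size, which gives 0 to 32760 for 8-byte accesses.

```java
/**
 * Hypothetical sketch, not HotSpot code: can a byte offset be encoded directly
 * in an AArch64 LDR/STR (immediate) for an access of the given width?
 */
final class LdrStrOffsets {
    private LdrStrOffsets() {}

    /**
     * @param offset      byte offset added to the base register
     * @param sizeInBytes access width: 1, 2, 4 or 8
     */
    static boolean offsetOkForImmediate(long offset, int sizeInBytes) {
        if (sizeInBytes != 1 && sizeInBytes != 2 && sizeInBytes != 4 && sizeInBytes != 8) {
            throw new IllegalArgumentException("unsupported access size: " + sizeInBytes);
        }
        // Unscaled form: signed 9-bit byte offset, any alignment.
        if (offset >= -256 && offset <= 255) {
            return true;
        }
        // Scaled form: unsigned 12-bit immediate, implicitly multiplied by the access size.
        return offset >= 0
                && offset % sizeInBytes == 0
                && offset / sizeInBytes <= 4095;
    }
}
```

Under these assumptions, the offset `1030` from the example is accepted for a 2-byte access (1030 = 2 × 515) but rejected for 4- and 8-byte accesses. This mismatch is what the size-specific `indOffIN4/indOffLN4` and `indOffIN8/indOffLN8` operands are meant to filter out at match time.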
------------- PR Comment: https://git.openjdk.org/jdk/pull/16991#issuecomment-2137007650 From mbaesken at openjdk.org Wed May 29 10:04:14 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 29 May 2024 10:04:14 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v2] In-Reply-To: References: Message-ID: > When running on macOS with ubsan enabled, we see some issues in relocInfo (hpp and cpp); those already occur in the build quite early. > > /jdk/src/hotspot/share/code/relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer > > Similar happens when we add to the _current pointer > _current++; > this gives : > relocInfo.hpp:606:13: runtime error: applying non-zero offset to non-null pointer 0xfffffffffffffffe produced null pointer > > Seems the pointer subtraction/addition worked so far, so it might be an option to disable ubsan for those 2 functions. Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: use template functions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19424/files - new: https://git.openjdk.org/jdk/pull/19424/files/ea8ecba9..bbb0c96f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19424&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19424&range=00-01 Stats: 19 lines in 2 files changed: 11 ins; 6 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19424.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19424/head:pull/19424 PR: https://git.openjdk.org/jdk/pull/19424 From prappo at openjdk.org Wed May 29 10:28:06 2024 From: prappo at openjdk.org (Pavel Rappo) Date: Wed, 29 May 2024 10:28:06 GMT Subject: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2] In-Reply-To: References: <1tx_tl3PV2W5NCEXXawQY5V2ndnSOHPfjisypuhKdhA=.79840096-bac0-4da4-8102-c7ecea7cb5f0@github.com> Message-ID: 
<5hNYOD-Vz4Pi9bww3h1BCCvSwHEs_D8LcRqafU8wqtw=.977f0923-b9c9-4490-bf37-fa393225e54e@github.com> On Wed, 29 May 2024 03:21:27 GMT, Chen Liang wrote: >> Pavel Rappo has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix incorrect utf16 hashCode adaptation > > src/java.base/share/classes/jdk/internal/util/ArraysSupport.java line 320: > >> 318: * @return the calculated hash value >> 319: */ >> 320: public static int hashCode(Object[] a, int fromIndex, int length, int initialValue) { > > Is the object variant necessary here? The object version is hard for JIT to profile as it's quite polymorphic compared to other arrays, and the initial value is always 1. This is a cleanup/refactoring PR, so none of this is necessary. My motivation for the `Object[]` variant was to provide reusable functionality for methods like these: - https://github.com/openjdk/jdk/blob/0ef03f122866f010ebf50683097e9b92e41cdaad/src/java.base/share/classes/java/util/concurrent/CopyOnWriteArrayList.java#L1076-L1083 - https://github.com/openjdk/jdk/blob/0ef03f122866f010ebf50683097e9b92e41cdaad/src/java.base/share/classes/java/util/ArrayList.java#L669-L680 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1618644177 From fyang at openjdk.org Wed May 29 11:03:03 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 29 May 2024 11:03:03 GMT Subject: RFR: 8320999: RISC-V: C2 RotateLeftV [v5] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 09:27:31 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> More detailed description is inline in the code. >> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > misc Marked as reviewed by fyang (Reviewer). 
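In the 8332826 review above, Pavel's motivation for the `Object[]` variant of `ArraysSupport.hashCode` is to reuse one range-hash loop for `List.hashCode`-style methods such as those in `ArrayList` and `CopyOnWriteArrayList`. The following is a minimal sketch of such a loop. The class name is hypothetical, and it uses a plain scalar loop rather than the PR's code. With `initialValue == 1` it should agree with `List.hashCode()` for the corresponding sublist, because a `null` element contributes 0 in both.

```java
import java.util.Objects;

/** Hypothetical sketch, not the PR's code: List.hashCode()-style hash over a[fromIndex, fromIndex + length). */
final class RangeHash {
    private RangeHash() {}

    static int hashCode(Object[] a, int fromIndex, int length, int initialValue) {
        // Throws IndexOutOfBoundsException if the range does not fit in the array.
        Objects.checkFromIndexSize(fromIndex, length, a.length);
        int h = initialValue;
        int end = fromIndex + length;
        for (int i = fromIndex; i < end; i++) {
            // Objects.hashCode(null) == 0, matching the List.hashCode() contract.
            h = 31 * h + Objects.hashCode(a[i]);
        }
        return h;
    }
}
```

In this sketch, passing a running hash back in as `initialValue` also lets a caller chain several ranges into a single hash.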
------------- PR Review: https://git.openjdk.org/jdk/pull/19325#pullrequestreview-2085075207 From chagedorn at openjdk.org Wed May 29 11:12:30 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 29 May 2024 11:12:30 GMT Subject: RFR: 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag Message-ID: With this patch I propose to remove the diagnostic product flag `ExpandSubTypeCheckAtParseTime` for the following reasons: - Expanding sub type checks eagerly during parse time has a maintenance cost. We've had to make special fixes due to skipping `SubTypeCheckNodes` in the past (recent example: [JDK-8328702](https://bugs.openjdk.org/browse/JDK-8328702), where the idea of removing this flag was first discussed). - This stress option has not helped much to find bugs. Going through JBS, maybe only 1 or 2 bugs can be attributed to this flag over the last 4 years - and even for those, it could very well be that the flag was not required because it was often accompanied by other stress flags such as `StressReflectiveCode`. - We currently have a bug in Valhalla ([JDK-8331912](https://bugs.openjdk.org/browse/JDK-8331912)) which only happens with `ExpandSubTypeCheckAtParseTime`. The reason is that we lose flatness information due to the eager sub type expansion. Later, data becomes top and the corresponding (already expanded) sub type check fails to fold control as well, leading to a broken graph. The simplest solution is to remove `ExpandSubTypeCheckAtParseTime`.
Thanks, Christian ------------- Commit messages: - 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag Changes: https://git.openjdk.org/jdk/pull/19430/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19430&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332032 Stats: 18 lines in 4 files changed: 1 ins; 13 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19430.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19430/head:pull/19430 PR: https://git.openjdk.org/jdk/pull/19430 From dfenacci at openjdk.org Wed May 29 11:21:19 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 29 May 2024 11:21:19 GMT Subject: RFR: 8333099: Missing check for is_LoadVector in StoreNode::Identity [v2] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 09:10:52 GMT, Christian Hagedorn wrote: > Looks good! Can you also add a regression test? Yep, added. Thanks @chhagedorn! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19449#issuecomment-2137156480 From chagedorn at openjdk.org Wed May 29 11:21:18 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 29 May 2024 11:21:18 GMT Subject: RFR: 8333099: Missing check for is_LoadVector in StoreNode::Identity [v2] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 11:18:49 GMT, Damon Fenacci wrote: >> [JDK-8325520](https://bugs.openjdk.org/browse/JDK-8325520) introduced a check for type equality in `StoreNode::Identity` in the specific case of a load vector followed by a store vector. >> Unfortunately the memory node operand might actually not be of type `LoadVector`. So, before retrieving its type, a check for `is_LoadVector` is necessary. 
> > Damon Fenacci has updated the pull request incrementally with three additional commits since the last revision: > > - Update test/hotspot/jtreg/compiler/vectorapi/TestIsLoadVector.java > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/vectorapi/TestIsLoadVector.java > > Co-authored-by: Christian Hagedorn > - JDK-8333099: add regression test Thanks for adding a test! test/hotspot/jtreg/compiler/vectorapi/TestIsLoadVector.java line 33: > 31: */ > 32: > 33: public class TestIsLoadVector { Suggestion: public class TestIsLoadVector { test/hotspot/jtreg/compiler/vectorapi/TestIsLoadVector.java line 61: > 59: for (int i = 0; i < 100; i++) { > 60: test(); > 61: } Suggestion: for (int i = 0; i < 100; i++) { test(); } ------------- PR Review: https://git.openjdk.org/jdk/pull/19449#pullrequestreview-2085102706 PR Review Comment: https://git.openjdk.org/jdk/pull/19449#discussion_r1618701854 PR Review Comment: https://git.openjdk.org/jdk/pull/19449#discussion_r1618702425 From dfenacci at openjdk.org Wed May 29 11:21:18 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 29 May 2024 11:21:18 GMT Subject: RFR: 8333099: Missing check for is_LoadVector in StoreNode::Identity [v2] In-Reply-To: References: Message-ID: > [JDK-8325520](https://bugs.openjdk.org/browse/JDK-8325520) introduced a check for type equality in `StoreNode::Identity` in the specific case of a load vector followed by a store vector. > Unfortunately the memory node operand might actually not be of type `LoadVector`. So, before retrieving its type, a check for `is_LoadVector` is necessary. 
Damon Fenacci has updated the pull request incrementally with three additional commits since the last revision: - Update test/hotspot/jtreg/compiler/vectorapi/TestIsLoadVector.java Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/vectorapi/TestIsLoadVector.java Co-authored-by: Christian Hagedorn - JDK-8333099: add regression test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19449/files - new: https://git.openjdk.org/jdk/pull/19449/files/9fb317a3..f4150d37 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19449&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19449&range=00-01 Stats: 63 lines in 1 file changed: 63 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19449.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19449/head:pull/19449 PR: https://git.openjdk.org/jdk/pull/19449 From thartmann at openjdk.org Wed May 29 11:26:00 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 29 May 2024 11:26:00 GMT Subject: RFR: 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag In-Reply-To: References: Message-ID: On Tue, 28 May 2024 15:01:14 GMT, Christian Hagedorn wrote: > With this patch I propose to remove the diagnostic product flag `ExpandSubTypeCheckAtParseTime` for the following reasons: > - Expanding sub type checks eagerly during parse time has a maintenance cost. We've had to make special fixes due to skipping `SubTypeCheckNodes` in the past (recent example: [JDK-8328702](https://bugs.openjdk.org/browse/JDK-8328702), where the idea of removing this flag was first discussed). > - This stress option has not helped much to find bugs. Going through JBS, maybe only 1 or 2 bugs can be attributed to this flag over the last 4 years - and even for those, it could very well be that the flag was not required because it was often accompanied by other stress flags such as `StressReflectiveCode`.
> - We currently have a bug in Valhalla ([JDK-8331912](https://bugs.openjdk.org/browse/JDK-8331912)) which only happens with `ExpandSubTypeCheckAtParseTime`. The reason is that we lose flatness information due to the eager sub type expansion. Later, data becomes top and the corresponding (already expanded) sub type check fails to fold control as well, leading to a broken graph. The simplest solution is to remove `ExpandSubTypeCheckAtParseTime`. > > Thanks, > Christian Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19430#pullrequestreview-2085116889 From chagedorn at openjdk.org Wed May 29 11:26:00 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 29 May 2024 11:26:00 GMT Subject: RFR: 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag In-Reply-To: References: Message-ID: On Tue, 28 May 2024 15:01:14 GMT, Christian Hagedorn wrote: > With this patch I propose to remove the diagnostic product flag `ExpandSubTypeCheckAtParseTime` for the following reasons: > - Expanding sub type checks eagerly during parse time has a maintenance cost. We've had to make special fixes due to skipping `SubTypeCheckNodes` in the past (recent example: [JDK-8328702](https://bugs.openjdk.org/browse/JDK-8328702), where the idea of removing this flag was first discussed). > - This stress option has not helped much to find bugs. Going through JBS, maybe only 1 or 2 bugs can be attributed to this flag over the last 4 years - and even for those, it could very well be that the flag was not required because it was often accompanied by other stress flags such as `StressReflectiveCode`. > - We currently have a bug in Valhalla ([JDK-8331912](https://bugs.openjdk.org/browse/JDK-8331912)) which only happens with `ExpandSubTypeCheckAtParseTime`. The reason is that we lose flatness information due to the eager sub type expansion.
Later, data becomes top and the corresponding (already expanded) sub type check fails to fold control as well, leading to a broken graph. The simplest solution is to remove `ExpandSubTypeCheckAtParseTime`. > > Thanks, > Christian Thanks Tobias for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19430#issuecomment-2137176211 From dfenacci at openjdk.org Wed May 29 11:29:35 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 29 May 2024 11:29:35 GMT Subject: RFR: 8333099: Missing check for is_LoadVector in StoreNode::Identity [v3] In-Reply-To: References: Message-ID: > [JDK-8325520](https://bugs.openjdk.org/browse/JDK-8325520) introduced a check for type equality in `StoreNode::Identity` in the specific case of a load vector followed by a store vector. > Unfortunately the memory node operand might actually not be of type `LoadVector`. So, before retrieving its type, a check for `is_LoadVector` is necessary. Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8333099: fix indentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19449/files - new: https://git.openjdk.org/jdk/pull/19449/files/f4150d37..fd7aaac6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19449&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19449&range=01-02 Stats: 27 lines in 1 file changed: 0 ins; 0 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/19449.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19449/head:pull/19449 PR: https://git.openjdk.org/jdk/pull/19449 From chagedorn at openjdk.org Wed May 29 11:29:35 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 29 May 2024 11:29:35 GMT Subject: RFR: 8333099: Missing check for is_LoadVector in StoreNode::Identity [v3] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 11:25:55 GMT, Damon Fenacci wrote: >> [JDK-8325520](https://bugs.openjdk.org/browse/JDK-8325520) introduced a
check for type equality in `StoreNode::Identity` in the specific case of a load vector followed by a store vector. >> Unfortunately the memory node operand might actually not be of type `LoadVector`. So, before retrieving its type, a check for `is_LoadVector` is necessary. > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8333099: fix indentation Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19449#pullrequestreview-2085118887 From thartmann at openjdk.org Wed May 29 11:29:35 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 29 May 2024 11:29:35 GMT Subject: RFR: 8333099: Missing check for is_LoadVector in StoreNode::Identity [v3] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 11:25:55 GMT, Damon Fenacci wrote: >> [JDK-8325520](https://bugs.openjdk.org/browse/JDK-8325520) introduced a check for type equality in `StoreNode::Identity` in the specific case of a load vector followed by a store vector. >> Unfortunately the memory node operand might actually not be of type `LoadVector`. So, before retrieving its type, a check for `is_LoadVector` is necessary. > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8333099: fix indentation Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19449#pullrequestreview-2085125372 From rcastanedalo at openjdk.org Wed May 29 11:37:03 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 29 May 2024 11:37:03 GMT Subject: RFR: 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag In-Reply-To: References: Message-ID: <5f4Loq_8DVYQPKFa3DPbStsmRLs9LhpblcgowzATOsc=.465eb065-4414-4071-b20a-c36afa97e88c@github.com> On Tue, 28 May 2024 15:01:14 GMT, Christian Hagedorn wrote: > With this patch I propose to remove the diagnostic product flag `ExpandSubTypeCheckAtParseTime` for the following reasons: > - Expanding sub type checks eagerly during parse time has a maintenance cost. We've had to make special fixes due to skipping `SubTypeCheckNodes` in the past (recent example: [JDK-8328702](https://bugs.openjdk.org/browse/JDK-8328702), where the idea of removing this flag was first discussed). > - This stress option has not helped much to find bugs. Going through JBS, maybe only 1 or 2 bugs can be attributed to this flag over the last 4 years - and even for those, it could very well be that the flag was not required because it was often accompanied by other stress flags such as `StressReflectiveCode`. > - We currently have a bug in Valhalla ([JDK-8331912](https://bugs.openjdk.org/browse/JDK-8331912)) which only happens with `ExpandSubTypeCheckAtParseTime`. The reason is that we lose flatness information due to the eager sub type expansion. Later, data becomes top and the corresponding (already expanded) sub type check fails to fold control as well, leading to a broken graph. The simplest solution is to remove `ExpandSubTypeCheckAtParseTime`. > > Thanks, > Christian Looks good! ------------- Marked as reviewed by rcastanedalo (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19430#pullrequestreview-2085145243 From dfenacci at openjdk.org Wed May 29 11:39:05 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 29 May 2024 11:39:05 GMT Subject: RFR: 8333099: Missing check for is_LoadVector in StoreNode::Identity [v3] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 11:22:24 GMT, Christian Hagedorn wrote: >> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8333099: fix indentation > > Marked as reviewed by chagedorn (Reviewer). Thanks for reviewing @chhagedorn @TobiHartmann. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19449#issuecomment-2137198505 From liach at openjdk.org Wed May 29 11:45:07 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 29 May 2024 11:45:07 GMT Subject: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2] In-Reply-To: <1tx_tl3PV2W5NCEXXawQY5V2ndnSOHPfjisypuhKdhA=.79840096-bac0-4da4-8102-c7ecea7cb5f0@github.com> References: <1tx_tl3PV2W5NCEXXawQY5V2ndnSOHPfjisypuhKdhA=.79840096-bac0-4da4-8102-c7ecea7cb5f0@github.com> Message-ID: On Mon, 27 May 2024 20:55:29 GMT, Pavel Rappo wrote: >> Please review this PR, which supersedes a now withdrawn https://github.com/openjdk/jdk/pull/14831. >> >> This PR replaces `ArraysSupport.vectorizedHashCode` with a set of more user-friendly methods. Here's a summary: >> >> - Made the operand constants (i.e. `T_BOOLEAN` and friends) and the `vectorizedHashCode` method private >> >> - Made the `vectorizedHashCode` method private, but didn't rename it. Renaming would dramatically increase this PR review cost, because that method's name is used by a lot of VM code. On a bright side, since the method is now private, it's no longer callable by clients of `ArraysSupport`, thus a problem of an inaccurate name is less severe. >> >> - Made the `ArraysSupport.utf16HashCode` method private >> >> - Moved tiny cases (i.e. 
0, 1, 2) to `ArraysSupport` > > Pavel Rappo has updated the pull request incrementally with one additional commit since the last revision: > > Fix incorrect utf16 hashCode adaptation Marked as reviewed by liach (Author). ------------- PR Review: https://git.openjdk.org/jdk/pull/19414#pullrequestreview-2085162416 From chagedorn at openjdk.org Wed May 29 11:52:03 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 29 May 2024 11:52:03 GMT Subject: RFR: 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag In-Reply-To: References: Message-ID: <2WxO4p89q175oZB700HTP9bJhJOqjze1OzPr-fV_DcA=.526dac84-fbcb-4bca-b878-7bc22b474320@github.com> On Tue, 28 May 2024 15:01:14 GMT, Christian Hagedorn wrote: > With this patch I propose to remove the diagnostic product flag `ExpandSubTypeCheckAtParseTime` for the following reasons: > - Expanding sub type checks eagerly during parse time has a maintenance cost. We've had to make special fixes due to skipping `SubTypeCheckNodes` in the past (recent example: [JDK-8328702](https://bugs.openjdk.org/browse/JDK-8328702), where the idea of removing this flag was first discussed). > - This stress option has not helped much to find bugs. Going through JBS, maybe only 1 or 2 bugs can be attributed to this flag over the last 4 years - and even for those, it could very well be that the flag was not required because it was often accompanied by other stress flags such as `StressReflectiveCode`. > - We currently have a bug in Valhalla ([JDK-8331912](https://bugs.openjdk.org/browse/JDK-8331912)) which only happens with `ExpandSubTypeCheckAtParseTime`. The reason is that we lose flatness information due to the eager sub type expansion. Later, data becomes top and the corresponding (already expanded) sub type check fails to fold control as well, leading to a broken graph. The simplest solution is to remove `ExpandSubTypeCheckAtParseTime`. > > Thanks, > Christian Thanks Roberto for your review!
------------- PR Comment: https://git.openjdk.org/jdk/pull/19430#issuecomment-2137219762 From mli at openjdk.org Wed May 29 12:28:06 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 29 May 2024 12:28:06 GMT Subject: RFR: 8320999: RISC-V: C2 RotateLeftV [v4] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 03:27:15 GMT, Ludovic Henry wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> add reg version > > Marked as reviewed by luhenry (Committer). Thanks @luhenry @RealFYang for your reviewing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19325#issuecomment-2137285161 From mli at openjdk.org Wed May 29 12:28:07 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 29 May 2024 12:28:07 GMT Subject: Integrated: 8320999: RISC-V: C2 RotateLeftV In-Reply-To: References: Message-ID: On Tue, 21 May 2024 11:51:00 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > More detailed description is inline in the code. > Thanks This pull request has now been integrated. 
Changeset: fed2b560 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/fed2b56017ae454082d320513b77518e624fb03c Stats: 251 lines in 4 files changed: 243 ins; 3 del; 5 mod 8320999: RISC-V: C2 RotateLeftV 8321000: RISC-V: C2 RotateRightV Reviewed-by: luhenry, fyang ------------- PR: https://git.openjdk.org/jdk/pull/19325 From prappo at openjdk.org Wed May 29 12:47:02 2024 From: prappo at openjdk.org (Pavel Rappo) Date: Wed, 29 May 2024 12:47:02 GMT Subject: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2] In-Reply-To: References: <1tx_tl3PV2W5NCEXXawQY5V2ndnSOHPfjisypuhKdhA=.79840096-bac0-4da4-8102-c7ecea7cb5f0@github.com> Message-ID: On Tue, 28 May 2024 20:40:30 GMT, Claes Redestad wrote: >> src/java.base/share/classes/jdk/internal/util/ArraysSupport.java line 275: >> >>> 273: return switch (length) { >>> 274: case 0 -> initialValue; >>> 275: case 1 -> 31 * initialValue + (a[fromIndex] & 0xff); >> >> For clarity, if you think it helps: >> Suggestion: >> >> case 1 -> 31 * initialValue + Byte.toUnsignedInt(a[fromIndex]); > > I don't care as long as microbenchmarks don't get a hiccup. @cl4es, here are some results from my machine (macosx-aarch64):

Name                       (size)  Cnt      Base     Error      Test     Error   Unit  Change
ArraysHashCode.bytes            1   15     0.715 ±   0.004     0.725 ±   0.029  ns/op  0.99x (p = 0.182 )
ArraysHashCode.bytes           10   15     3.753 ±   0.024     3.747 ±   0.011  ns/op  1.00x (p = 0.322 )
ArraysHashCode.bytes          100   15    69.731 ±   0.157    69.737 ±   0.092  ns/op  1.00x (p = 0.891 )
ArraysHashCode.bytes        10000   15  9369.386 ±   1.449  9372.008 ±   6.678  ns/op  1.00x (p = 0.133 )
ArraysHashCode.chars            1   15     0.719 ±   0.024     0.734 ±   0.024  ns/op  0.98x (p = 0.076 )
ArraysHashCode.chars           10   15     3.744 ±   0.005     3.746 ±   0.004  ns/op  1.00x (p = 0.308 )
ArraysHashCode.chars          100   15    69.741 ±   0.112    69.714 ±   0.044  ns/op  1.00x (p = 0.365 )
ArraysHashCode.chars        10000   15  9367.123 ±   5.320  9371.325 ±   6.407  ns/op  1.00x (p = 0.046 )
ArraysHashCode.ints             1   15     0.711 ±   0.013     0.706 ±   0.006  ns/op  1.01x (p = 0.137 )
ArraysHashCode.ints            10   15     3.750 ±   0.002     3.752 ±   0.004  ns/op  1.00x (p = 0.283 )
ArraysHashCode.ints           100   15    69.753 ±   0.086    69.711 ±   0.016  ns/op  1.00x (p = 0.065 )
ArraysHashCode.ints         10000   15  9376.225 ±   5.845  9376.218 ±  12.181  ns/op  1.00x (p = 0.999 )
ArraysHashCode.multibytes       1   15     0.741 ±   0.001     0.740 ±   0.001  ns/op  1.00x (p = 0.038 )
ArraysHashCode.multibytes      10   15     2.737 ±   0.001     2.826 ±   0.136  ns/op  0.97x (p = 0.017 )
ArraysHashCode.multibytes     100   15    32.202 ±   0.059    32.153 ±   0.006  ns/op  1.00x (p = 0.004*)
ArraysHashCode.multibytes   10000   15  4922.740 ±  25.590  4921.468 ±   7.372  ns/op  1.00x (p = 0.846 )
ArraysHashCode.multichars       1   15     0.740 ±   0.005     0.740 ±   0.000  ns/op  1.00x (p = 0.996 )
ArraysHashCode.multichars      10   15     2.732 ±   0.002     2.737 ±   0.003  ns/op  1.00x (p = 0.000*)
ArraysHashCode.multichars     100   15    32.109 ±   0.017    32.182 ±   0.028  ns/op  1.00x (p = 0.000*)
ArraysHashCode.multichars   10000   15  4925.750 ±  46.366  4930.684 ±  26.001  ns/op  1.00x (p = 0.704 )
ArraysHashCode.multiints        1   15     0.740 ±   0.000     0.739 ±   0.000  ns/op  1.00x (p = 0.000*)
ArraysHashCode.multiints       10   15     2.919 ±   0.002     2.953 ±   0.059  ns/op  0.99x (p = 0.033 )
ArraysHashCode.multiints      100   15    32.140 ±   0.011    32.094 ±   0.004  ns/op  1.00x (p = 0.000*)
ArraysHashCode.multiints    10000   15  4918.911 ±   3.512  4913.884 ±  11.618  ns/op  1.00x (p = 0.105 )
ArraysHashCode.multishorts      1   15     0.740 ±   0.001     0.739 ±   0.000  ns/op  1.00x (p = 0.000*)
ArraysHashCode.multishorts     10   15     2.736 ±   0.002     2.733 ±   0.008  ns/op  1.00x (p = 0.159 )
ArraysHashCode.multishorts    100   15    32.162 ±   0.033    32.105 ±   0.008  ns/op  1.00x (p = 0.000*)
ArraysHashCode.multishorts  10000   15  4916.984 ±   3.276  4912.000 ±  11.479  ns/op  1.00x (p = 0.103 )
ArraysHashCode.shorts           1   15     0.711 ±   0.023     0.709 ±   0.016  ns/op  1.00x (p = 0.818 )
ArraysHashCode.shorts          10   15     3.745 ±   0.003     3.739 ±   0.010  ns/op  1.00x (p = 0.049 )
ArraysHashCode.shorts         100   15    69.725 ±   0.082    69.620 ±   0.051  ns/op  1.00x (p = 0.000*)
ArraysHashCode.shorts       10000   15  9370.882 ±   8.306  9356.215 ±   3.996  ns/op  1.00x (p = 0.000*)

* = significant

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1618821363 From prappo at openjdk.org Wed May 29 12:56:02 2024 From: prappo at openjdk.org (Pavel Rappo) Date: Wed, 29 May 2024 12:56:02 GMT Subject: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2] In-Reply-To: References: <1tx_tl3PV2W5NCEXXawQY5V2ndnSOHPfjisypuhKdhA=.79840096-bac0-4da4-8102-c7ecea7cb5f0@github.com> Message-ID: On Tue, 28 May 2024 19:13:50 GMT, Jorn Vernee wrote: >> Pavel Rappo has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix incorrect utf16 hashCode adaptation > > src/java.base/share/classes/jdk/internal/util/ArraysSupport.java line 252: > >> 250: return switch (length) { >> 251: case 0 -> initialValue; >> 252: case 1 -> 31 * initialValue + (int) a[fromIndex]; > > Suggestion: > > case 1 -> 31 * initialValue + (int) a[fromIndex]; // sign extension To be honest, I don't think that this cast is needed. A better solution than adding a comment would be to delete all `(int)` casts from the new `hashCode*` methods of `ArraysSupport`. Those `(int)` casts migrated from the `hashCode` methods of `Arrays`, where they were used if neither of the two `+` operands was of type `int`. But in `ArraysSupport` that is no longer the case: `31 * initialValue` is always `int` because `initialValue` is. So, `a[fromIndex]` is promoted to `int` by virtue of https://docs.oracle.com/javase/specs/jls/se22/html/jls-5.html#jls-5.6. For more confidence, consider that the `private static int hashCode` methods (implementation) of `ArraysSupport` do not have those casts.
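Pavel's argument relies on binary numeric promotion (JLS §5.6). Once `31 * initialValue` is an `int`, a `byte` operand of `+` is widened to `int` with sign extension, so an explicit `(int)` adds nothing. The Java sketch below mirrors the `case 0`/`case 1`/`case 2` shape without any casts. It uses hypothetical names and a plain scalar loop in place of the vectorized intrinsic, so it is not the PR's code. It also includes an unsigned variant that uses `Byte.toUnsignedInt`, as suggested in the review.

```java
/**
 * Hypothetical sketch, not the PR's code: byte[] range hashes with the tiny
 * lengths special-cased. No bounds checks are performed here.
 */
final class ByteRangeHash {
    private ByteRangeHash() {}

    /** Signed: each byte is sign-extended, as in Arrays.hashCode(byte[]). */
    static int hashSigned(byte[] a, int fromIndex, int length, int initialValue) {
        return switch (length) {
            case 0 -> initialValue;
            // 31 * initialValue is an int, so a[fromIndex] is promoted to int
            // (with sign extension) by binary numeric promotion; no cast needed.
            case 1 -> 31 * initialValue + a[fromIndex];
            case 2 -> 31 * (31 * initialValue + a[fromIndex]) + a[fromIndex + 1];
            default -> {
                int h = initialValue;
                for (int i = fromIndex; i < fromIndex + length; i++) {
                    h = 31 * h + a[i];
                }
                yield h;
            }
        };
    }

    /** Unsigned: each byte is zero-extended, as for Latin-1 string contents. */
    static int hashUnsigned(byte[] a, int fromIndex, int length, int initialValue) {
        int h = initialValue;
        for (int i = fromIndex; i < fromIndex + length; i++) {
            // Byte.toUnsignedInt(b) yields the same value as (b & 0xff).
            h = 31 * h + Byte.toUnsignedInt(a[i]);
        }
        return h;
    }
}
```

With `initialValue` 1, the signed variant should agree with `Arrays.hashCode(byte[])`. With `initialValue` 0, the unsigned variant should agree with `String.hashCode()` for an ISO-8859-1 string decoded from the same bytes.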
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1618835934 From epeter at openjdk.org Wed May 29 15:47:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 May 2024 15:47:27 GMT Subject: RFR: 8332905: C2 SuperWord: bad AD file, with RotateRightV and first operand not a pack Message-ID: I just discovered this bug by manual code inspection, and found a reproducer. It seems to be a regression of [JDK-8248830](https://bugs.openjdk.org/browse/JDK-8248830), which is when RotateRightV was added to SuperWord. The problem is that we take the input node directly, rather than calling `vector_opd`. This fails if that input is not already a vector but, for example, a `PopulateIndex` pattern that is only vectorized when `vector_opd` is called.

Before this patch, the code looks like this:

    } else if (VectorNode::is_scalar_rotate(n)) {
      Node* in1 = first->in(1);
      Node* in2 = first->in(2);

But at least `in1` should be using `vector_opd`, like most other ops:

    Node* in1 = vector_opd(p, 1);

When the input is a `PopulateIndex` pattern, `first->in(1)` gives us the iv-phi, which is a scalar, whereas `vector_opd` would produce a `PopulateIndex` vector. In the AD file we then get an error, because we expect the first operand of the RotateRightV to be a vector, not a scalar.
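Emanuel's description can be made concrete with a loop whose rotated operand is the loop index itself. The Java sketch below is a hypothetical illustration of that general shape, not the reproducer from the PR. If SuperWord vectorizes such a loop, the first input of the rotate has to come from `vector_opd` (a `PopulateIndex` vector) rather than the scalar iv-phi. Whether C2 actually vectorizes this particular loop depends on the platform and flags; one of the PR's commits is titled "fix populate_index, only on sve". The sketch does not check vectorization. It only checks the scalar result.

```java
/** Hypothetical illustration: the first operand of the rotate is the induction variable. */
final class RotateIndexLoop {
    private RotateIndexLoop() {}

    static void fill(int[] out, int shift) {
        for (int i = 0; i < out.length; i++) {
            // A vectorized form would need the index as a vector (PopulateIndex),
            // not the scalar loop phi, as the first input of RotateRightV.
            out[i] = Integer.rotateRight(i, shift);
        }
    }
}
```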
------------- Commit messages: - fix populate_index, only on sve - 8332905 Changes: https://git.openjdk.org/jdk/pull/19445/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19445&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332905 Stats: 21 lines in 2 files changed: 18 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19445.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19445/head:pull/19445 PR: https://git.openjdk.org/jdk/pull/19445 From prappo at openjdk.org Wed May 29 15:50:32 2024 From: prappo at openjdk.org (Pavel Rappo) Date: Wed, 29 May 2024 15:50:32 GMT Subject: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v3] In-Reply-To: References: Message-ID: > Please review this PR, which supersedes a now withdrawn https://github.com/openjdk/jdk/pull/14831. > > This PR replaces `ArraysSupport.vectorizedHashCode` with a set of more user-friendly methods. Here's a summary: > > - Made the operand constants (i.e. `T_BOOLEAN` and friends) and the `vectorizedHashCode` method private > > - Made the `vectorizedHashCode` method private, but didn't rename it. Renaming would dramatically increase this PR review cost, because that method's name is used by a lot of VM code. On a bright side, since the method is now private, it's no longer callable by clients of `ArraysSupport`, thus a problem of an inaccurate name is less severe. > > - Made the `ArraysSupport.utf16HashCode` method private > > - Moved tiny cases (i.e. 0, 1, 2) to `ArraysSupport` Pavel Rappo has updated the pull request incrementally with three additional commits since the last revision: - Update copyright years Note: any commit hashes below might be outdated due to subsequent history rewriting (e.g. git rebase). 
+ update src/java.base/share/classes/java/lang/CharacterName.java due to 4ed451d691c + update src/java.base/share/classes/java/lang/StringLatin1.java due to 4ed451d691c + update src/java.base/share/classes/java/nio/Heap-X-Buffer.java.template due to 4ed451d691c + update src/java.base/share/classes/jdk/internal/classfile/impl/AbstractPoolEntry.java due to 4ed451d691c + update src/java.base/share/classes/sun/security/util/DerValue.java due to 4ed451d691c + update src/java.base/unix/classes/sun/nio/fs/UnixPath.java due to 4ed451d691c + update test/hotspot/jtreg/compiler/intrinsics/TestArraysHashCode.java due to 4ed451d691c - Drop redundant (int) cast - Use Byte.toUnsignedInt not & 0xff ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19414/files - new: https://git.openjdk.org/jdk/pull/19414/files/adc7557d..53d4ed09 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19414&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19414&range=01-02 Stats: 12 lines in 8 files changed: 0 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/19414.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19414/head:pull/19414 PR: https://git.openjdk.org/jdk/pull/19414 From prappo at openjdk.org Wed May 29 15:54:02 2024 From: prappo at openjdk.org (Pavel Rappo) Date: Wed, 29 May 2024 15:54:02 GMT Subject: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2] In-Reply-To: References: <1tx_tl3PV2W5NCEXXawQY5V2ndnSOHPfjisypuhKdhA=.79840096-bac0-4da4-8102-c7ecea7cb5f0@github.com> Message-ID: On Wed, 29 May 2024 12:53:42 GMT, Pavel Rappo wrote: >> src/java.base/share/classes/jdk/internal/util/ArraysSupport.java line 252: >> >>> 250: return switch (length) { >>> 251: case 0 -> initialValue; >>> 252: case 1 -> 31 * initialValue + (int) a[fromIndex]; >> >> Suggestion: >> >> case 1 -> 31 * initialValue + (int) a[fromIndex]; // sign extension > > To be honest, I don't think that this cast is needed. 
A better solution than adding a comment would be to delete all `(int)` casts from the new `hashCode*` methods of `ArraysSupport`. > > Those `(int)` casts migrated from the `hashCode` methods of `Arrays`, where they were used when neither of the two `+` operands was of type `int`. But in `ArraysSupport` that is no longer the case: `31 * initialValue` is always `int` because `initialValue` is. So `a[fromIndex]` is promoted to `int` by virtue of https://docs.oracle.com/javase/specs/jls/se22/html/jls-5.html#jls-5.6. > > For more confidence, consider that the `private static int hashCode` methods (implementation) of `ArraysSupport` do not have those casts. Removed casts in https://github.com/openjdk/jdk/pull/19414/commits/d8dabc68c4264520fd6c57c949049f2ab5c8e0ec. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1619127224 From prappo at openjdk.org Wed May 29 15:54:03 2024 From: prappo at openjdk.org (Pavel Rappo) Date: Wed, 29 May 2024 15:54:03 GMT Subject: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2] In-Reply-To: References: <1tx_tl3PV2W5NCEXXawQY5V2ndnSOHPfjisypuhKdhA=.79840096-bac0-4da4-8102-c7ecea7cb5f0@github.com> Message-ID: On Wed, 29 May 2024 12:44:45 GMT, Pavel Rappo wrote: >> I don't care as long as microbenchmarks don't get a hiccup. > > @cl4es, here are some results from my machine (macosx-aarch64): > > Name (size) Cnt Base Error Test Error Unit Change > ArraysHashCode.bytes 1 15 0.715 ± 0.004 0.725 ± 0.029 ns/op 0.99x (p = 0.182 ) > ArraysHashCode.bytes 10 15 3.753 ± 0.024 3.747 ± 0.011 ns/op 1.00x (p = 0.322 ) > ArraysHashCode.bytes 100 15 69.731 ± 0.157 69.737 ± 0.092 ns/op 1.00x (p = 0.891 ) > ArraysHashCode.bytes 10000 15 9369.386 ± 1.449 9372.008 ± 6.678 ns/op 1.00x (p = 0.133 ) > ArraysHashCode.chars 1 15 0.719 ± 0.024 0.734 ± 0.024 ns/op 0.98x (p = 0.076 ) > ArraysHashCode.chars 10 15 3.744 ± 0.005 3.746 ± 0.004 ns/op 1.00x (p = 0.308 ) > ArraysHashCode.chars 100 15 69.741 ± 0.112 69.714 ±
0.044 ns/op 1.00x (p = 0.365 ) > ArraysHashCode.chars 10000 15 9367.123 ± 5.320 9371.325 ± 6.407 ns/op 1.00x (p = 0.046 ) > ArraysHashCode.ints 1 15 0.711 ± 0.013 0.706 ± 0.006 ns/op 1.01x (p = 0.137 ) > ArraysHashCode.ints 10 15 3.750 ± 0.002 3.752 ± 0.004 ns/op 1.00x (p = 0.283 ) > ArraysHashCode.ints 100 15 69.753 ± 0.086 69.711 ± 0.016 ns/op 1.00x (p = 0.065 ) > ArraysHashCode.ints 10000 15 9376.225 ± 5.845 9376.218 ± 12.181 ns/op 1.00x (p = 0.999 ) > ArraysHashCode.multibytes 1 15 0.741 ± 0.001 0.740 ± 0.001 ns/op 1.00x (p = 0.038 ) > ArraysHashCode.multibytes 10 15 2.737 ± 0.001 2.826 ± 0.136 ns/op 0.97x (p = 0.017 ) > ArraysHashCode.multibytes 100 15 32.202 ± 0.059 32.153 ± 0.006 ns/op 1.00x (p = 0.004*) > ArraysHashCode.multibytes 10000 15 4922.740 ± 25.590 4921.468 ± 7.372 ns/op 1.00x (p = 0.846 ) > ArraysHashCode.multichars 1 15 0.740 ± 0.005 0.740 ± 0.000 ns/op 1.00x (p = 0.996 ) > ArraysHashCode.multichars 10 15 2.732 ± 0.002 2.737 ± 0.003 ns/op 1.00x (p = 0.000*) > ArraysHashCode.multichars 100 15 32.109 ± 0.017 32.182 ± 0.028 ns/op 1.00x (p = 0.000*) > ArraysHashCode.multichars 10000 15 4925.750 ± 46.366 4930.684 ± 26.001 ns/op 1.00x (p = 0.704 ) > ArraysHashCode.multiints 1 15 0.740 ± 0.000 0.739 ± 0.000 ns/op 1.00x (p = 0.000*) > ArraysHashCode.multiints 10 15 2.919 ± 0.002 2.953 ± 0.059 ns/op 0.99x (p = 0.033 ) > ArraysHashCode.multiints 100 15 32.140 ± 0.011 32.094 ± 0.004 ns/op 1.00x (p = 0.000*) > ... Just to clarify, the above comparison is between master and https://github.com/openjdk/jdk/pull/19414/commits/534ff367bc50ec4150e4b206ce7203c7ff9f5cad.
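The cast-removal argument in this thread rests on binary numeric promotion (JLS §5.6): once one operand of `+` is an `int`, a `byte` operand is widened to `int` with sign extension, so an explicit `(int)` changes nothing. The following is a minimal sketch with hypothetical helper names (not the actual `ArraysSupport` methods) that checks this, and contrasts it with `Byte.toUnsignedInt`, which the same PR uses where an unsigned view of a byte is wanted:

```java
import java.util.Arrays;

// Hypothetical helpers mirroring the shape of the "case 1" branch discussed above.
final class PromotionSketch {
    static int withCast(int initialValue, byte[] a, int fromIndex) {
        return 31 * initialValue + (int) a[fromIndex];
    }

    static int withoutCast(int initialValue, byte[] a, int fromIndex) {
        // 31 * initialValue is an int, so a[fromIndex] is promoted (sign-extended) to int anyway.
        return 31 * initialValue + a[fromIndex];
    }

    public static void main(String[] args) {
        byte[] a = { -1, 0, 127, -128 };
        for (int i = 0; i < a.length; i++) {
            if (withCast(1, a, i) != withoutCast(1, a, i)) {
                throw new AssertionError("differs at " + i);
            }
        }
        System.out.println(withoutCast(1, a, 0));               // 30: (byte) -1 contributes -1
        System.out.println(Arrays.hashCode(new byte[] { -1 })); // 30 as well
        // An unsigned view has to be requested explicitly:
        System.out.println(Byte.toUnsignedInt(a[0]));           // 255, same as a[0] & 0xff
    }
}
```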
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1619125150 From prappo at openjdk.org Wed May 29 15:57:02 2024 From: prappo at openjdk.org (Pavel Rappo) Date: Wed, 29 May 2024 15:57:02 GMT Subject: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2] In-Reply-To: References: <1tx_tl3PV2W5NCEXXawQY5V2ndnSOHPfjisypuhKdhA=.79840096-bac0-4da4-8102-c7ecea7cb5f0@github.com> Message-ID: On Wed, 29 May 2024 15:50:05 GMT, Pavel Rappo wrote: >> @cl4es, here are some results from my machine (macosx-aarch64): >> >> Name (size) Cnt Base Error Test Error Unit Change >> ArraysHashCode.bytes 1 15 0.715 ± 0.004 0.725 ± 0.029 ns/op 0.99x (p = 0.182 ) >> ArraysHashCode.bytes 10 15 3.753 ± 0.024 3.747 ± 0.011 ns/op 1.00x (p = 0.322 ) >> ArraysHashCode.bytes 100 15 69.731 ± 0.157 69.737 ± 0.092 ns/op 1.00x (p = 0.891 ) >> ArraysHashCode.bytes 10000 15 9369.386 ± 1.449 9372.008 ± 6.678 ns/op 1.00x (p = 0.133 ) >> ArraysHashCode.chars 1 15 0.719 ± 0.024 0.734 ± 0.024 ns/op 0.98x (p = 0.076 ) >> ArraysHashCode.chars 10 15 3.744 ± 0.005 3.746 ± 0.004 ns/op 1.00x (p = 0.308 ) >> ArraysHashCode.chars 100 15 69.741 ± 0.112 69.714 ± 0.044 ns/op 1.00x (p = 0.365 ) >> ArraysHashCode.chars 10000 15 9367.123 ± 5.320 9371.325 ± 6.407 ns/op 1.00x (p = 0.046 ) >> ArraysHashCode.ints 1 15 0.711 ± 0.013 0.706 ± 0.006 ns/op 1.01x (p = 0.137 ) >> ArraysHashCode.ints 10 15 3.750 ± 0.002 3.752 ± 0.004 ns/op 1.00x (p = 0.283 ) >> ArraysHashCode.ints 100 15 69.753 ± 0.086 69.711 ± 0.016 ns/op 1.00x (p = 0.065 ) >> ArraysHashCode.ints 10000 15 9376.225 ± 5.845 9376.218 ± 12.181 ns/op 1.00x (p = 0.999 ) >> ArraysHashCode.multibytes 1 15 0.741 ± 0.001 0.740 ± 0.001 ns/op 1.00x (p = 0.038 ) >> ArraysHashCode.multibytes 10 15 2.737 ± 0.001 2.826 ± 0.136 ns/op 0.97x (p = 0.017 ) >> ArraysHashCode.multibytes 100 15 32.202 ± 0.059 32.153 ± 0.006 ns/op 1.00x (p = 0.004*) >> ArraysHashCode.multibytes 10000 15 4922.740 ± 25.590 4921.468 ±
7.372 ns/op 1.00x (p = 0.846 ) >> ArraysHashCode.multichars 1 15 0.740 ± 0.005 0.740 ± 0.000 ns/op 1.00x (p = 0.996 ) >> ArraysHashCode.multichars 10 15 2.732 ± 0.002 2.737 ± 0.003 ns/op 1.00x (p = 0.000*) >> ArraysHashCode.multichars 100 15 32.109 ± 0.017 32.182 ± 0.028 ns/op 1.00x (p = 0.000*) >> ArraysHashCode.multichars 10000 15 4925.750 ± 46.366 4930.684 ± 26.001 ns/op 1.00x (p = 0.704 ) >> ArraysHashCode.multiints 1 15 0.740 ± 0.000 0.739 ± 0.000 ns/op 1.00x (p = 0.000*) >> ArraysHashCode.multiints 10 15 2.919 ± 0.002 2.953 ± 0.059 ns/op 0.99x (p = 0.033 ) >> ArraysHashCode.multiints 100 15 32.140 ± ... > > Just to clarify, the above comparison is between master and https://github.com/openjdk/jdk/pull/19414/commits/534ff367bc50ec4150e4b206ce7203c7ff9f5cad. > For clarity, if you think it helps: I did that and a little bit more in https://github.com/openjdk/jdk/pull/19414/commits/534ff367bc50ec4150e4b206ce7203c7ff9f5cad. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1619130669 From sgibbons at openjdk.org Wed May 29 16:55:14 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 29 May 2024 16:55:14 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v47] In-Reply-To: References: Message-ID: On Tue, 28 May 2024 23:52:27 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Move assert to where it's actually important. Thank you all for the comments. If there are no objections, I'll integrate these fixes tomorrow morning. I've run tier1-3 tests with the appropriate options on my machine with no errors, so my confidence is high. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2137861781 From burban at openjdk.org Wed May 29 16:58:02 2024 From: burban at openjdk.org (Bernhard Urban-Forster) Date: Wed, 29 May 2024 16:58:02 GMT Subject: RFR: 8331159: VM build without C2 fails after JDK-8180450 In-Reply-To: <86NGpx5VVOK6KuR1qbhLRS27zau-DEwXW31EakcquYY=.4d56d214-1a07-4ff5-a1af-e18a545ad725@github.com> References: <86NGpx5VVOK6KuR1qbhLRS27zau-DEwXW31EakcquYY=.4d56d214-1a07-4ff5-a1af-e18a545ad725@github.com> Message-ID: On Thu, 25 Apr 2024 20:54:23 GMT, Bernhard Urban-Forster wrote: > x86 bits are fine. @theRealAph could you have a look at this? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18962#issuecomment-2137869996 From dlong at openjdk.org Wed May 29 18:44:08 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 29 May 2024 18:44:08 GMT Subject: RFR: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" [v3] In-Reply-To: References: <16J-lJ2AceGTVcRWBcP15yKcwO-1IA1XsngyOuNjf7k=.0776f081-ae2c-4279-87cf-d909806c2bc4@github.com> Message-ID: On Wed, 29 May 2024 08:46:51 GMT, Fei Gao wrote: >> On LP64 systems, if the heap can be moved into low virtual address space (below 4GB) and the heap size is smaller than the interesting threshold of 4 GB, we can use unscaled decoding pattern for narrow klass decoding. It means that a generic field reference can be decoded by: >> >> cast<64> (32-bit compressed reference) + field_offset >> >> >> When the `field_offset` is an immediate, on aarch64 platform, the unscaled decoding pattern can match perfectly with a direct addressing mode, i.e., `base_plus_offset`, supported by `LDR/STR` instructions. But for certain data width, not all immediates can be encoded in the instruction field of `LDR/STR` [[1]](https://github.com/openjdk/jdk/blob/8db7bad992a0f31de9c7e00c2657c18670539102/src/hotspot/cpu/aarch64/assembler_aarch64.inline.hpp#L33). 
The ranges are different as data widths vary. >> >> For example, when we try to load a value of long type at offset of `1030`, the address expression is `(AddP (DecodeN base) 1030)`. Before the patch, the expression was matching with `operand indOffIN()`. But, for 64-bit `LDR/STR`, signed immediate byte offset must be in the range -256 to 255 or positive immediate byte offset must be a multiple of 8 in the range 0 to 32760 [[2]](https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/LDR--immediate---Load-Register--immediate--?lang=en). `1030` can't be encoded in the instruction field. So, after matching, when we do checking for instruction encoding, the assertion would fail. >> >> In this patch, we're going to filter out invalid immediates when deciding if current addressing mode can be matched as `base_plus_offset`. We introduce `indOffIN4/indOffLN4` and `indOffIN8/indOffLN8` for 32-bit data type and 64-bit data type separately in the patch. E.g., for `memory4`, we remove the generic `indOffIN/indOffLN`, which matches wrong unscaled immediate range, and replace them with `indOffIN4/indOffLN4` instead. >> >> Since 8-bit and 16-bit `LDR/STR` instructions also support the unscaled decoding pattern, we add the addressing mode in the lists of `memory1` and `memory2` by introducing `indOffIN1/indOffLN1` and `indOffIN2/indOffLN2`. >> >> We also remove unused operands `indOffI/indOffl/indOffIN/indOffLN` to avoid misuse. >> >> Tier 1-3 passed on aarch64. > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
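The description's central point, that the same immediate can be encodable for one access width and not for another, can be checked with a small model. The following is a hypothetical Java helper written from the two ranges quoted above (a signed 9-bit unscaled byte offset, or a non-negative offset that is a multiple of the access size with a 12-bit scaled index); it sketches the rule and is not a copy of HotSpot's `offset_ok_for_immed`.

```java
// Hypothetical helper modeled on the two immediate forms quoted in the description;
// it is not HotSpot's offset_ok_for_immed.
final class LdrOffsetSketch {
    /**
     * @param offset   byte offset added to the base register
     * @param sizeLog2 log2 of the access size: 0 = 8-bit, 1 = 16-bit, 2 = 32-bit, 3 = 64-bit
     */
    static boolean offsetEncodable(long offset, int sizeLog2) {
        // Unscaled form: signed 9-bit byte offset.
        if (offset >= -256 && offset <= 255) {
            return true;
        }
        // Scaled form: non-negative multiple of the access size whose scaled index fits in 12 bits.
        long size = 1L << sizeLog2;
        return offset >= 0 && offset % size == 0 && (offset >> sizeLog2) <= 4095;
    }

    public static void main(String[] args) {
        System.out.println(offsetEncodable(1030, 3));  // false: not a multiple of 8, outside -256..255
        System.out.println(offsetEncodable(1030, 1));  // true: 1030 = 2 * 515
        System.out.println(offsetEncodable(32760, 3)); // true: 8 * 4095
        System.out.println(offsetEncodable(32768, 3)); // false: scaled index 4096 does not fit
    }
}
```

This width dependence is why a single operand matching every `(AddP (DecodeN base) off)` cannot serve `memory1` through `memory8` alike.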
The pull request contains five additional commits since the last revision: > > - Add the assertion back and merge matchrules with a better predicate > - Merge branch 'master' into fg8319690 > - Remove unused immIOffset/immLOffset > - Merge branch 'master' into fg8319690 > - 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" > > On LP64 systems, if the heap can be moved into low virtual > address space (below 4GB) and the heap size is smaller than the > interesting threshold of 4 GB, we can use unscaled decoding > pattern for narrow klass decoding. It means that a generic field > reference can be decoded by: > ``` > cast<64> (32-bit compressed reference) + field_offset > ``` > > When the `field_offset` is an immediate, on aarch64 platform, the > unscaled decoding pattern can match perfectly with a direct > addressing mode, i.e., `base_plus_offset`, supported by LDR/STR > instructions. But for certain data width, not all immediates can > be encoded in the instruction field of LDR/STR[1]. The ranges are > different as data widths vary. > > For example, when we try to load a value of long type at offset of > `1030`, the address expression is `(AddP (DecodeN base) 1030)`. > Before the patch, the expression was matching with > `operand indOffIN()`. But, for 64-bit LDR/STR, signed immediate > byte offset must be in the range -256 to 255 or positive immediate > byte offset must be a multiple of 8 in the range 0 to 32760[2]. > `1030` can't be encoded in the instruction field. So, after > matching, when we do checking for instruction encoding, the > assertion would fail. > > In this patch, we're going to filter out invalid immediates > when deciding if current addressing mode can be matched as > `base_plus_offset`. We introduce `indOffIN4/indOffLN4` and > `indOffIN8/indOffLN8` for 32-bit data type and 64-bit data > type separately in the patch. 
E.g., for `memory4`, we remove > the generic `indOffIN/indOffLN`, which matches wrong unscaled > immediate range, and replace them with `indOffIN4/indOffLN4` > instead. > > Since 8-bit and 16-bit LDR/STR instructions also support the > unscaled decoding pattern, we add the addressing mode in the > lists of `memory1` and `memory2` by introducing > `indOffIN1/indOffLN1` and `indOffIN2/... src/hotspot/cpu/aarch64/aarch64.ad line 5193: > 5191: constraint(ALLOC_IN_RC(ptr_reg)); > 5192: match(AddP reg off); > 5193: match(AddP (DecodeN regn) off); I'm surprised this works. If we match on "DecodeN regn", is it really safe to use $reg instead? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16991#discussion_r1619329588 From kbarrett at openjdk.org Wed May 29 20:06:07 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 29 May 2024 20:06:07 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v11] In-Reply-To: References: Message-ID: On Tue, 28 May 2024 15:57:19 GMT, Andrew Haley wrote: >> At the present time, `assert_different_registers()` uses an O(N**2) algorithm in assert_different_registers(). We can utilize RegSet to do it in O(N) time. This would be a useful optimization for all builds with assertions enabled. >> >> In addition, it would be useful to be able to static_assert different registers. >> >> Also, I've taken the opportunity to expand the maximum size of a RegSet to 64 on 64-bit platforms. >> >> I also fixed a bug: sometimes `noreg` is passed to `assert_different_registers()`, but it may only be passed once or a spurious assertion is triggered. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/asm/register.hpp > > Co-authored-by: Stefan Karlsson Changes requested by kbarrett (Reviewer). 
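The linear-time idea in the description amounts to accumulating registers into a bit set and failing on the first bit that is already set. Below is a minimal Java model, assuming registers are identified by integer encodings below 64 (matching the 64-entry `RegSet` mentioned above) and using a hypothetical `NOREG` sentinel that is skipped, so that passing `noreg` more than once is not reported as a duplicate. The PR's actual C++ templates may handle `noreg` differently.

```java
// Hypothetical model: registers are identified by their encoding (0..63), and NOREG stands in
// for HotSpot's noreg. This is not the PR's C++ code.
final class DistinctRegistersSketch {
    static final int NOREG = -1;

    // One pass over the arguments and one bit per register: O(N) instead of comparing every pair.
    static boolean differentRegisters(int... encodings) {
        long seen = 0L;
        for (int enc : encodings) {
            if (enc == NOREG) {
                continue; // noreg may appear any number of times without tripping the check
            }
            if (enc < 0 || enc >= Long.SIZE) {
                throw new IllegalArgumentException("encoding out of range: " + enc);
            }
            long bit = 1L << enc;
            if ((seen & bit) != 0) {
                return false; // the same register was passed twice
            }
            seen |= bit;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(differentRegisters(0, 1, 2, NOREG, NOREG)); // true
        System.out.println(differentRegisters(3, 5, 3));               // false
    }
}
```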
src/hotspot/cpu/x86/register_x86.hpp line 395: > 393: inline Register AbstractRegSet::first() { > 394: size_t first = _bitset & -_bitset; > 395: return first != 0 ? as_Register(exact_log2(first)) : noreg; This could instead be if (_bitset == 0) { return noreg; } return as_Register(count_trailing_zeros(_bitset)); which would be consistent with how `last` is being calculated. Note that exact_log2 bottoms out in count_trailing_zeros. Similarly for the XMMRegister case below. src/hotspot/share/asm/register.hpp line 256: > 254: // Debugging support > 255: > 256: template Rx is unused. ------------- PR Review: https://git.openjdk.org/jdk/pull/16617#pullrequestreview-2086193112 PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1619355839 PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1619393053 From kbarrett at openjdk.org Wed May 29 20:06:08 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 29 May 2024 20:06:08 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v6] In-Reply-To: <3xKYuTm22oA-SeoXK20LuPypVkTVuQNM7C9kY_tKlgs=.04a0c1cf-691d-428c-9c12-78bc02cab6d0@github.com> References: <3xKYuTm22oA-SeoXK20LuPypVkTVuQNM7C9kY_tKlgs=.04a0c1cf-691d-428c-9c12-78bc02cab6d0@github.com> Message-ID: <3HRQd_dZXyl4uOar1HeZVvaehwpY4Iu1_JqHgulwlk4=.9e037d7b-f176-4da5-86d4-a80e3cd89f4d@github.com> On Fri, 10 May 2024 14:24:56 GMT, Andrew Haley wrote: >> src/hotspot/share/asm/register.hpp line 257: >> >>> 255: >>> 256: template >>> 257: inline constexpr bool different_registers(AbstractRegSet allocated_regs, R first_register) { >> >> different_registers is only used by debug-only code in assert_different_registers. Shouldn't all the overloads >> for different_registers be within an `#ifdef ASSERT` block? > > I could do so, but that would lose the ability to do `static_assert(different_registers(...`. I don't think that `static_assert` depends on `ASSERT`.
I'm happy to make this patch debug-only, though, if you prefer. Good point about static_assert. Don't make it debug-only. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1619372620 From kbarrett at openjdk.org Wed May 29 20:06:10 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 29 May 2024 20:06:10 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v6] In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 07:03:37 GMT, Kim Barrett wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Review feedback. > > src/hotspot/share/asm/register.hpp line 257: > >> 255: >> 256: template >> 257: inline constexpr bool different_registers(AbstractRegSet allocated_regs, R first_register) { > > "inline" is redundant with "constexpr". "inline" is still redundant with "constexpr". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1619398153 From kvn at openjdk.org Wed May 29 21:39:14 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 29 May 2024 21:39:14 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v47] In-Reply-To: References: Message-ID: On Tue, 28 May 2024 23:52:27 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Move assert to where it's actually important. Let me test the latest version before integration. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2138303300 From kvn at openjdk.org Wed May 29 21:44:17 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 29 May 2024 21:44:17 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v47] In-Reply-To: References: Message-ID: <-jpTM1HhjURGU9BNxceoaF1OlfoVla_Jlnj9BYVCOTQ=.088cff2a-eb4d-43a1-8072-4b688af1d244@github.com> On Tue, 28 May 2024 23:52:27 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. 
This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Move assert to where it's actually important. 
test/jdk/TEST.ROOT line 103: > 101: vm.jvmti \ > 102: vm.cpu.features \ > 103: vm.compiler2.enabled \ `vm.compiler2.enabled` is already listed at line 91. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1619506711 From kvn at openjdk.org Wed May 29 22:07:00 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 29 May 2024 22:07:00 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer In-Reply-To: References: Message-ID: On Wed, 29 May 2024 09:34:32 GMT, Matthias Baesken wrote: > > val needs an unsigned type to avoid undefined behavior because of signed integer overflow. I'd use uintptr_t. > > Makes sense to use something unsigned. Any good place(s) to put those templates? For now I would simply put them into relocInfo.hpp (we can move them if we need to reuse them somewhere else). I would suggest `utilities/globalDefinitions.hpp`, somewhere near `pointer_delta*()`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19424#issuecomment-2138335817 From sgibbons at openjdk.org Wed May 29 22:20:31 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 29 May 2024 22:20:31 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Remove duplicate vm.compiler2.enabled ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/db0ab75a..ed06edd6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=47 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=46-47 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Wed May 29 22:20:31 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 29 May 2024 22:20:31 GMT Subject: RFR: 8320448: Accelerate IndexOf 
using AVX2 [v47] In-Reply-To: <-jpTM1HhjURGU9BNxceoaF1OlfoVla_Jlnj9BYVCOTQ=.088cff2a-eb4d-43a1-8072-4b688af1d244@github.com> References: <-jpTM1HhjURGU9BNxceoaF1OlfoVla_Jlnj9BYVCOTQ=.088cff2a-eb4d-43a1-8072-4b688af1d244@github.com> Message-ID: On Wed, 29 May 2024 21:41:42 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Move assert to where it's actually important. > > test/jdk/TEST.ROOT line 103: > >> 101: vm.jvmti \ >> 102: vm.cpu.features \ >> 103: vm.compiler2.enabled \ > > `vm.compiler2.enabled ` already listed at line 91 Thanks! Removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1619532884 From kvn at openjdk.org Thu May 30 02:21:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 30 May 2024 02:21:13 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 22:20:31 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Remove duplicate vm.compiler2.enabled My testing passed. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16753#pullrequestreview-2086978326 From dfenacci at openjdk.org Thu May 30 05:14:07 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 30 May 2024 05:14:07 GMT Subject: Integrated: 8333099: Missing check for is_LoadVector in StoreNode::Identity In-Reply-To: References: Message-ID: On Wed, 29 May 2024 08:54:35 GMT, Damon Fenacci wrote: > [JDK-8325520](https://bugs.openjdk.org/browse/JDK-8325520) introduced a check for type equality in `StoreNode::Identity` in the specific case of a load vector followed by a store vector. 
> Unfortunately the memory node operand might not actually be of type `LoadVector`. So, before retrieving its type, a check for `is_LoadVector` is necessary. This pull request has now been integrated. Changeset: 2ea365c9 Author: Damon Fenacci Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/2ea365c94533a59865ab4c20ad8e1008072278da Stats: 64 lines in 2 files changed: 63 ins; 0 del; 1 mod 8333099: Missing check for is_LoadVector in StoreNode::Identity Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/19449 From thartmann at openjdk.org Thu May 30 05:47:01 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 30 May 2024 05:47:01 GMT Subject: RFR: 8332487: Regression in Crypto-AESGCMBench.encrypt (and others) after JDK-8328181 In-Reply-To: References: Message-ID: On Wed, 29 May 2024 07:49:21 GMT, Jatin Bhateja wrote: > Reinstate the ClearArray opcode check in match_rule_supported_vector. Its removal caused performance regressions in some Renaissance workloads, because it prevented small instance initialization using quadword stores, which performs better on non-AVX512 targets. > > Our intent was to avoid code bloat from long sequences of quadword stores with a large InitArrayShortSize value, to prevent side effects on inlining decisions. Performance of an existing [Benchmark](https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/vm/compiler/ClearMemory.java) does not vary much.
> > > Baseline with -XX:InitArrayShortSize=100000000 > > Benchmark Mode Cnt Score Error Units > ClearMemory.testClearMemory16K thrpt 2 2695259.360 ops/s > ClearMemory.testClearMemory1K thrpt 2 48622330.474 ops/s > ClearMemory.testClearMemory1M thrpt 2 79546.779 ops/s > ClearMemory.testClearMemory24B thrpt 2 252740278.617 ops/s > ClearMemory.testClearMemory2K thrpt 2 24781443.547 ops/s > ClearMemory.testClearMemory32B thrpt 2 251588987.342 ops/s > ClearMemory.testClearMemory32K thrpt 2 1487427.378 ops/s > ClearMemory.testClearMemory40B thrpt 2 213856093.091 ops/s > ClearMemory.testClearMemory48B thrpt 2 193701317.101 ops/s > ClearMemory.testClearMemory4K thrpt 2 11961450.919 ops/s > ClearMemory.testClearMemory56B thrpt 2 169003238.018 ops/s > ClearMemory.testClearMemory8K thrpt 2 5871416.239 ops/s > ClearMemory.testClearMemory8M thrpt 2 10663.044 ops/s > > > With patch and -XX:InitArrayShortSize=100000000 > > Benchmark Mode Cnt Score Error Units > ClearMemory.testClearMemory16K thrpt 2 3147203.987 ops/s > ClearMemory.testClearMemory1K thrpt 2 48225184.981 ops/s > ClearMemory.testClearMemory1M thrpt 2 80016.400 ops/s > ClearMemory.testClearMemory24B thrpt 2 253904943.981 ops/s > ClearMemory.testClearMemory2K thrpt 2 24664594.490 ops/s > ClearMemory.testClearMemory32B thrpt 2 255507231.954 ops/s > ClearMemory.testClearMemory32K thrpt 2 1636220.531 ops/s > ClearMemory.testClearMemory40B thrpt 2 220718255.832 ops/s > ClearMemory.testClearMemory48B thrpt 2 196294911.715 ops/s > ClearMemory.test... Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19447#pullrequestreview-2087279575 From thartmann at openjdk.org Thu May 30 06:00:01 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 30 May 2024 06:00:01 GMT Subject: RFR: 8332499: Gtest codestrings.validate_vm fail on linux x64 when hsdis is present [v5] In-Reply-To: References: Message-ID: <9LTbEdpkxdGpGGhaEafoyaC6bajqzeQ51PhdThpoQuk=.1ae9f691-4f88-44f9-958b-3e4d6cf7f8de@github.com> On Tue, 28 May 2024 15:47:25 GMT, SendaoYan wrote: >> Hi all, >> There's some arch-specific code to trim trailing entries, as described in [JDK-8332499](https://bugs.openjdk.org/browse/JDK-8332499). Only the gtest test case is changed, so the risk is low. >> >> On linux x86_64, before this PR, after applying `std::regex_replace(tmp4, std::regex("\\s+:\\s+hlt[ \\t]+(?!\\n\\s+;;)"), "")`, the outputs differ because the first output has trailing spaces, as shown below: >> >> - : nop >> + : nop >> >> So we need to delete the spaces after `: nop` using `std::regex_replace(tmp5, std::regex("(\\s+:\\s+nop)[ \\t]*"), "$1")` >> >> >> Additional test: >> - [x] codestrings.validate_vm on linux x64 >> - [x] codestrings.validate_vm on linux aarch64 >> - [x] codestrings.validate_vm on linux riscv64 > > SendaoYan has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'jbs8332499' of github.com:sendaoYan/jdk-ysd into jbs8332499 > - delete the empty spaces after : nop Thanks for the details. So this is similar to [JDK-8274039](https://bugs.openjdk.org/browse/JDK-8274039). Do you know why this only shows up now?
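The two `std::regex_replace` steps quoted in this message can be tried out in isolation. The following is a minimal sketch in Java, not the gtest itself: it assumes the addresses have already been stripped by the test's earlier normalization steps (so lines look like `  : nop`), the class and method names are hypothetical, and `java.util.regex` stands in for C++ `std::regex`, which should behave the same for these two patterns on plain ASCII disassembly text.

```java
import java.util.regex.Pattern;

// Hypothetical Java rendering of the two regex steps quoted from the gtest fix.
public class NopTrimDemo {
    // First quoted step: strips "<ws>: hlt" plus trailing blanks
    // (with a negative lookahead on a following "\n<ws>;;" line).
    private static final Pattern TRAILING_HLT =
            Pattern.compile("\\s+:\\s+hlt[ \\t]+(?!\\n\\s+;;)");
    // Second quoted step (the fix): keep "<ws>: nop" but drop any spaces/tabs after it.
    private static final Pattern NOP_TRAILING_WS =
            Pattern.compile("(\\s+:\\s+nop)[ \\t]*");

    public static String normalize(String disasm) {
        String tmp = TRAILING_HLT.matcher(disasm).replaceAll("");
        return NOP_TRAILING_WS.matcher(tmp).replaceAll("$1");
    }

    public static void main(String[] args) {
        // Two made-up dumps that differ only in trailing spaces after "nop".
        String withTrailing = "  : nop   \n";
        String without = "  : nop\n";
        System.out.println(normalize(withTrailing).equals(normalize(without))); // true
    }
}
```

Applying the second step after the first mirrors the order in the quoted description (`tmp4` then `tmp5`), so blanks left behind by earlier steps cannot reintroduce a difference.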
------------- PR Comment: https://git.openjdk.org/jdk/pull/19309#issuecomment-2138737402 From chagedorn at openjdk.org Thu May 30 06:06:04 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 30 May 2024 06:06:04 GMT Subject: RFR: 8332905: C2 SuperWord: bad AD file, with RotateRightV and first operand not a pack In-Reply-To: References: Message-ID: <6aPEF71_MRIem_i4zRtOhlXxuOdRhIe5ts1vniXR9Zs=.8359afe7-44fa-4669-b389-f96b87bb9ee3@github.com> On Wed, 29 May 2024 07:20:33 GMT, Emanuel Peter wrote: > I just discovered this bug by manual code inspection, and found a reproducer. > > It seems to be a regression of [JDK-8248830](https://bugs.openjdk.org/browse/JDK-8248830), that is when RotateRightV was added to SuperWord. > > The problem is that we directly get the input node, rather than the `vector_opd`, which fails if that input is not a vector already, but for example a `PopulateIndex` pattern that is only vectorized when calling `vector_opd`. > > Before this patch: it looks like this: > > } else if (VectorNode::is_scalar_rotate(n)) { > Node* in1 = first->in(1); > Node* in2 = first->in(2); > > > But at least `in1` should be using `vector_opd`, like most other ops: > > ` Node* in1 = vector_opd(p, 1);` > > When the input is a `PopulateIndex` pattern, then `first->in(1) `gives us the iv-phi, which is a scalar. `vector_opd` would produce a `PopulateIndex` vector. > > In the ad-file, we get an error, because we do not expect a scalar as the first operand of the RotateRightV, but a vector. Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19445#pullrequestreview-2087302857 From thartmann at openjdk.org Thu May 30 06:26:04 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 30 May 2024 06:26:04 GMT Subject: RFR: 8332905: C2 SuperWord: bad AD file, with RotateRightV and first operand not a pack In-Reply-To: References: Message-ID: On Wed, 29 May 2024 07:20:33 GMT, Emanuel Peter wrote: > I just discovered this bug by manual code inspection, and found a reproducer. > > It seems to be a regression of [JDK-8248830](https://bugs.openjdk.org/browse/JDK-8248830), that is when RotateRightV was added to SuperWord. > > The problem is that we directly get the input node, rather than the `vector_opd`, which fails if that input is not a vector already, but for example a `PopulateIndex` pattern that is only vectorized when calling `vector_opd`. > > Before this patch: it looks like this: > > } else if (VectorNode::is_scalar_rotate(n)) { > Node* in1 = first->in(1); > Node* in2 = first->in(2); > > > But at least `in1` should be using `vector_opd`, like most other ops: > > ` Node* in1 = vector_opd(p, 1);` > > When the input is a `PopulateIndex` pattern, then `first->in(1) `gives us the iv-phi, which is a scalar. `vector_opd` would produce a `PopulateIndex` vector. > > In the ad-file, we get an error, because we do not expect a scalar as the first operand of the RotateRightV, but a vector. Looks good to me too. Fix version was still set to JDK 24 (the bot now warns about this as well: "The fixVersion in this issue is [24] but the fixVersion in .jcheck/conf is 23, a new backport will be created when this pr is integrated."). I set it back to JDK 23. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19445#pullrequestreview-2087332545 From syan at openjdk.org Thu May 30 06:26:05 2024 From: syan at openjdk.org (SendaoYan) Date: Thu, 30 May 2024 06:26:05 GMT Subject: RFR: 8332499: Gtest codestrings.validate_vm fail on linux x64 when hsdis is present [v5] In-Reply-To: <9LTbEdpkxdGpGGhaEafoyaC6bajqzeQ51PhdThpoQuk=.1ae9f691-4f88-44f9-958b-3e4d6cf7f8de@github.com> References: <9LTbEdpkxdGpGGhaEafoyaC6bajqzeQ51PhdThpoQuk=.1ae9f691-4f88-44f9-958b-3e4d6cf7f8de@github.com> Message-ID: On Thu, 30 May 2024 05:56:56 GMT, Tobias Hartmann wrote: > Thanks for the details. So this is similar to [JDK-8274039](https://bugs.openjdk.org/browse/JDK-8274039). Do you know why this only shows up now? I think this failure has existed for a while, not just recently. Reproducing it requires two conditions: 1. a fastdebug or slowdebug linux x64 build; 2. running the gtest with hsdis-amd64.so present. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19309#issuecomment-2138769207 From thartmann at openjdk.org Thu May 30 06:28:20 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 30 May 2024 06:28:20 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 22:20:31 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
> > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Remove duplicate vm.compiler2.enabled Control question: Are we confident with this potentially going into JDK 23 or should we rather postpone to JDK 24? The fork is next week. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2138771509 From epeter at openjdk.org Thu May 30 06:28:21 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 06:28:21 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 22:20:31 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions.
This change accelerates String.IndexOf on average 1.3x for AVX2. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Remove duplicate vm.compiler2.enabled test/jdk/java/lang/String/IndexOf.java line 35: > 33: * @requires vm.cpu.features ~= ".*avx2.*" > 34: * @requires vm.compiler2.enabled > 35: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -Xcomp -XX:-TieredCompilation -XX:UseAVX=2 -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts IndexOf Same here: why is the test AVX2 specific? Could other platforms not also be "tickled" in interesting ways with this test?
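A platform-independent check of the kind this question hints at does not need AVX2 or C2 in order to be meaningful. The following is a hypothetical sketch, not the `IndexOf.java` or `ECoreIndexOf.java` test from the PR (the class name, alphabets, and iteration counts are assumptions): it compares `String.indexOf(String, int)` against a naive reference search on random Latin-1 and UTF-16 inputs. Whether a given intrinsic is actually exercised still depends on the JIT flags it runs with, such as the `-Xcomp`/`-Xbatch` options in the quoted `@run` lines.

```java
import java.util.Random;

// Hypothetical differential check: String.indexOf(String, int) vs. a naive reference.
public class IndexOfCrossCheck {
    // Naive reference search with the same fromIndex clamping as String.indexOf.
    public static int naiveIndexOf(String s, String sub, int from) {
        int n = s.length(), m = sub.length();
        from = Math.max(0, Math.min(from, n));
        for (int i = from; i <= n - m; i++) {
            int j = 0;
            while (j < m && s.charAt(i + j) == sub.charAt(j)) j++;
            if (j == m) return i;
        }
        return -1;
    }

    private static String randomString(Random rnd, char[] alphabet, int len) {
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) sb.append(alphabet[rnd.nextInt(alphabet.length)]);
        return sb.toString();
    }

    // Returns how many random inputs make String.indexOf disagree with the reference.
    public static int mismatches(long seed, int iterations) {
        Random rnd = new Random(seed);
        char[] latin1 = {'a', 'b'};                 // small alphabet: frequent matches
        char[] mixed = {'a', 'b', (char) 0x100};    // 0x100 is outside Latin-1, forcing UTF-16
        int bad = 0;
        for (int it = 0; it < iterations; it++) {
            char[] alphabet = rnd.nextBoolean() ? latin1 : mixed;
            String s = randomString(rnd, alphabet, rnd.nextInt(100));
            String sub = randomString(rnd, alphabet, rnd.nextInt(6));
            int from = rnd.nextInt(s.length() + 3) - 1; // also covers -1 and past-the-end
            if (s.indexOf(sub, from) != naiveIndexOf(s, sub, from)) bad++;
        }
        return bad;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + mismatches(42L, 100_000));
    }
}
```

The tiny alphabet is a deliberate choice: it makes partial matches and overlapping candidates common, which is where vectorized search loops tend to go wrong.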
test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 29: > 27: * @requires vm.cpu.features ~= ".*avx2.*" > 28: * @requires vm.compiler2.enabled > 29: * @run main/othervm -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts -XX:UseAVX=2 -Xbatch -XX:-TieredCompilation -XX:CompileCommand=dontinline,ECoreIndexOf.indexOfKernel ECoreIndexOf Does this test really need to be `avx2` specific? Does it even need to be C2 specific? Or can this run on all platforms? test/jdk/java/lang/StringBuffer/IndexOf.java line 188: > 186: } > 187: > 188: } It looks like you just indented basically the whole file by 1 space. Why? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620019084 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620016717 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620013302 From epeter at openjdk.org Thu May 30 06:28:21 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 06:28:21 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: References: Message-ID: <_0H1QRaXnFyO9eGa7IvO1l4ZzNK_27D59ebYAphp8eg=.0fe38944-0b61-4a1a-b63d-04315b02117f@github.com> On Thu, 30 May 2024 06:21:36 GMT, Emanuel Peter wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove duplicate vm.compiler2.enabled > > test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 29: > >> 27: * @requires vm.cpu.features ~= ".*avx2.*" >> 28: * @requires vm.compiler2.enabled >> 29: * @run main/othervm -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts -XX:UseAVX=2 -Xbatch -XX:-TieredCompilation -XX:CompileCommand=dontinline,ECoreIndexOf.indexOfKernel ECoreIndexOf > > Does this test really need to be `avx2` specific? Does it even need to be C2 specific? > Or can this run on all platforms? 
Would be a shame to spend so much time on writing a test and then not apply it everywhere ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620017891 From epeter at openjdk.org Thu May 30 06:33:35 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 06:33:35 GMT Subject: RFR: 8325155: C2 SuperWord: remove alignment boundaries [v6] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 07:16:44 GMT, Emanuel Peter wrote: >> I have tried for a very long time to get rid of all the `alignment(n)` code that is all over the SuperWord code. With lots of previous work, I am now finally ready to remove it. >> >> I was able to remove lots of VM code, about 300 lines. And the removed code is I think much more complicated than the new code. >> >> This is what I did in this PR: >> - Removal of `_node_info`: used to have many fields, which I refactored out to the `VLoopAnalyzer` modules. `alignment` is the last component, which I now remove. >> - Changed the implementation of `SuperWord::find_adjacent_refs`, now `SuperWord::find_adjacent_memop_pairs`, completely: >> - It used to be an algorithm that would scan over all `memops` repeatedly, try to find some `mem_ref` and see which other memops were comparable, and then pack pairs for all of those, by comparing all-vs-all memops. This algorithm is at least quadratic, if not much worse. >> - I now add all `memops` into a single array, sort them by groups (those that are comparable with each other and could be packed into vectors), and inside the groups by ascending offset. This allows me to split off the groups much more efficiently, and also the sorting by offset allows me finding adjacent pairs much more efficiently. In the most cases this reduces the cost to `O(n log n)` for sort, and a linear scan for finding adjacent memops. >> - I removed the "alignment boundaries" created in `SuperWord::memory_alignment` by `int off_rem = offset % vw;`. 
>> - This used to have the effect that all offsets were computed modulo the vector width. Hence, pairs could not be packed across this boundary (e.g. we have nodes with offsets `31, 32`, which are adjacent in theory, but if we have a `vw = 32`, then the modulo-offsets are `31, 0`, and they are not detected as adjacent). >> - These "alignment boundaries" used to be required for correctness about a year ago, before I fixed and relaxed much of the alignment code. >> - The `alignment` used to have another important task: Ensuring compatibility of the input-size of a use node, with the output-size of the def-node. >> - This was done by giving all nodes an `alignment`, even the non-memop nodes. This `alignment` was then scaled up and down at type casts (e.g. int `0, 4, 8, 12` -> long `0, 8, 16, 24`). If the output-size of the def-node did not match the input-size of the use-node, then the `alignment` would not match up, and we would not pack. >> - This is why we used to have checks like `alignment(s1) + da... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/superword.cpp > > Co-authored-by: Christian Hagedorn FYI: I ran performance benchmarking, and there was no significant difference. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18822#issuecomment-2138777893 From thartmann at openjdk.org Thu May 30 06:43:02 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 30 May 2024 06:43:02 GMT Subject: RFR: 8332499: Gtest codestrings.validate_vm fail on linux x64 when hsdis is present [v5] In-Reply-To: References: Message-ID: <2FtrN8XAIXaZXdWlvL1pM_g3HR8mouo1ARr0qlTjr40=.660cdc0d-6760-4f5a-9835-4bec1da027fb@github.com> On Tue, 28 May 2024 15:47:25 GMT, SendaoYan wrote: >> Hi all, >> There's some arch-specific code to trim trailing entries as descripted in [JDK-8332499](https://bugs.openjdk.org/browse/JDK-8332499). 
Only change the gtest testcase, the risk is low. >> >> On linux x86_64, before this PR, after deal with `std::regex_replace(tmp4, std::regex("\\s+:\\s+hlt[ \\t]+(?!\\n\\s+;;)"), "")`, the output differents because the first output has trailing empty spaces, show as below: >> >> - : nop >> + : nop >> >> So we need to delete the empty spaces after `: nop` use `std::regex_replace(tmp5, std::regex("(\\s+:\\s+nop)[ \\t]*"), "$1")` >> >> >> Additional test: >> - [x] codestrings.validate_vm on linux x64 >> - [x] codestrings.validate_vm on linux aarch64 >> - [x] codestrings.validate_vm on linux riscv64 > > SendaoYan has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'jbs8332499' of github.com:sendaoYan/jdk-ysd into jbs8332499 > - delete the empty spaces after : nop Okay. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19309#pullrequestreview-2087363425 From bkilambi at openjdk.org Thu May 30 07:24:09 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 30 May 2024 07:24:09 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8] In-Reply-To: References: <8-_t7nWbR9gZ2_QkfFNuf5M0Q4PMkKJKgwS3ZbHcCxI=.32dc4f11-dec5-468d-afc8-3b4dae285dcb@github.com> <2y-Ag6MxVDJfYl6kM0FYjQA-kzSCekUgAMWAZmkECyQ=.2a2a0a8e-fc67-42a4-bd67-b4ae3b60bcea@github.com> Message-ID: On Mon, 13 May 2024 11:01:30 GMT, Emanuel Peter wrote: >> @eme64 Thanks for the clarification. I understand the usage of `counts` in the IR tests. Just that I got a bit confused by some of your earlier statements. We do actually have a test to make sure AddReductionVF/VD and MulReductionVF/VD are not generated on aarch64 NEON machines - `test/hotspot/jtreg/compiler/c2/irTests/TestDisableAutoVectOpcodes.java`. I can modify this test to include UseSVE > 0 case as well and will also add a separate JTREG test for the VectorAPI tests. 
Hope that's ok.. > > @Bhavana-Kilambi > I know we have the tests in `test/hotspot/jtreg/compiler/c2/irTests/TestDisableAutoVectOpcodes.java`, and some other reduction tests. But these do not do the specific think I would like to see. > > I would like this: > - Add `no_strict_order` vs `requires_strict_order` or similar to `dump_spec`. > - IR match not just that there is the correct `ReductionNode`, but also that it has the `no_strict_order` or `requires_strict_order` in its dump. You can do that by using a custom regex string, rather than `IRNode.STORE_VECTOR` or similar. > - Then, create different tests, some where we expect ordered, some unordered vectors. Use Vector API and SuperWord examples. > > Does that make sense? Hi @eme64 , thanks for your feedback. Apologies for the delay in responding. I just got back from a six day leave. I will address your comments and upload a new patch soon. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18034#issuecomment-2138847978 From syan at openjdk.org Thu May 30 07:31:01 2024 From: syan at openjdk.org (SendaoYan) Date: Thu, 30 May 2024 07:31:01 GMT Subject: RFR: 8332499: Gtest codestrings.validate_vm fail on linux x64 when hsdis is present [v5] In-Reply-To: References: Message-ID: On Tue, 28 May 2024 15:47:25 GMT, SendaoYan wrote: >> Hi all, >> There's some arch-specific code to trim trailing entries as descripted in [JDK-8332499](https://bugs.openjdk.org/browse/JDK-8332499). Only change the gtest testcase, the risk is low. 
>> >> On linux x86_64, before this PR, after deal with `std::regex_replace(tmp4, std::regex("\\s+:\\s+hlt[ \\t]+(?!\\n\\s+;;)"), "")`, the output differents because the first output has trailing empty spaces, show as below: >> >> - : nop >> + : nop >> >> So we need to delete the empty spaces after `: nop` use `std::regex_replace(tmp5, std::regex("(\\s+:\\s+nop)[ \\t]*"), "$1")` >> >> >> Additional test: >> - [x] codestrings.validate_vm on linux x64 >> - [x] codestrings.validate_vm on linux aarch64 >> - [x] codestrings.validate_vm on linux riscv64 > > SendaoYan has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'jbs8332499' of github.com:sendaoYan/jdk-ysd into jbs8332499 > - delete the empty spaces after : nop > Okay. Looks good to me. > > /reviewers 2 Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19309#issuecomment-2138859712 From bkilambi at openjdk.org Thu May 30 08:19:13 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 30 May 2024 08:19:13 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v10] In-Reply-To: <6M8hC17XmxLvDhhtGKgKxTAwfT8NV8_ameppOeyI9jQ=.f942d480-efbb-49db-9d7c-5ec93fb8f1c4@github.com> References: <26UiEE_uEKUU0lg_T91K-b4Or3mtGluJYybbJOpETOU=.a74004d6-590f-49e7-8880-4ab6627926dd@github.com> <6M8hC17XmxLvDhhtGKgKxTAwfT8NV8_ameppOeyI9jQ=.f942d480-efbb-49db-9d7c-5ec93fb8f1c4@github.com> Message-ID: On Fri, 24 May 2024 08:45:03 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/vectorapi/TestVectorAddMulReduction.java line 42: >> >>> 40: * @bug 8320725 >>> 41: * @library /test/lib / >>> 42: * @requires os.arch == "aarch64" >> >> I think there is no reason to only run the test on aarch64. We can run the test anywhere, but the applyIf specifies on what platforms the IR rules are executed. > > So you can use the `asimd` or `avx...` features for that. Yes, thanks. 
I will add the `applyIf...` rules instead. I added this line because I thought it would make it easier to extend the test to other platforms such as x86: adding x86 to this line (`@requires os.arch == "aarch64" | os.arch == "x86_64"`) would mean that the @IR rules need not be changed and no new ones need to be added. But I think your suggestion is better, as the test can then still run on x86 (just without applying the IR rules). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1620223066 From redestad at openjdk.org Thu May 30 08:38:02 2024 From: redestad at openjdk.org (Claes Redestad) Date: Thu, 30 May 2024 08:38:02 GMT Subject: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2] In-Reply-To: References: <1tx_tl3PV2W5NCEXXawQY5V2ndnSOHPfjisypuhKdhA=.79840096-bac0-4da4-8102-c7ecea7cb5f0@github.com> Message-ID: On Wed, 29 May 2024 09:18:51 GMT, Pavel Rappo wrote: >> The non-constant test was added because that very bailout caused a crash. The other test is actually less interesting since it'll likely be covered indirectly by regular use. But as we are hiding these away this gets ever more obscure and perhaps the test could be dropped entirely. > > @cl4es, do you want me to delete that test file altogether? I thought you verified that the non-constant type test still provoke a crash (on x86) if you back out the code changes in https://github.com/openjdk/jdk/commit/969f6a37e4649079c7acea1952f5537fd9ba2f0a ? If so that test is still somewhat useful to guard against future coding mistakes by verifying that the bail out doesn't mess things up. The constant type tests have less utility, perhaps. I'd keep it as is unless there's a strong desire to reduce test runtime (these should be pretty quick).
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1620259115 From aph at openjdk.org Thu May 30 08:41:02 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 30 May 2024 08:41:02 GMT Subject: RFR: 8331159: VM build without C2 fails after JDK-8180450 In-Reply-To: <86NGpx5VVOK6KuR1qbhLRS27zau-DEwXW31EakcquYY=.4d56d214-1a07-4ff5-a1af-e18a545ad725@github.com> References: <86NGpx5VVOK6KuR1qbhLRS27zau-DEwXW31EakcquYY=.4d56d214-1a07-4ff5-a1af-e18a545ad725@github.com> Message-ID: <2uFnYqkfGW5_J4Xx_7HBsnp6OlwC0mSq5_yif3jw-pE=.dbd2bede-590f-4895-97f2-4c25ed180633@github.com> On Thu, 25 Apr 2024 20:54:23 GMT, Bernhard Urban-Forster wrote: > x86 bits are fine. Sure, it's fine. I have the same change in [8331658](https://github.com/openjdk/jdk/pull/19426/files#diff-9112056f732229b18fec48fb0b20a3fe824de49d0abd41fbdb4202cfe70ad114), but you can push this if you like and I'll take that change out. ------------- Marked as reviewed by aph (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18962#pullrequestreview-2087640286 From prappo at openjdk.org Thu May 30 08:44:05 2024 From: prappo at openjdk.org (Pavel Rappo) Date: Thu, 30 May 2024 08:44:05 GMT Subject: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2] In-Reply-To: References: <1tx_tl3PV2W5NCEXXawQY5V2ndnSOHPfjisypuhKdhA=.79840096-bac0-4da4-8102-c7ecea7cb5f0@github.com> Message-ID: On Thu, 30 May 2024 08:34:59 GMT, Claes Redestad wrote: >> @cl4es, do you want me to delete that test file altogether? > > I thought you verified that the non-constant type test still provoke a crash (on x86) if you back out the code changes in https://github.com/openjdk/jdk/commit/969f6a37e4649079c7acea1952f5537fd9ba2f0a ? If so that test is still somewhat useful to guard against future coding mistakes by verifying that the bail out doesn't mess things up. The constant type tests have less utility, perhaps. 
I'd keep it as is unless there's a strong desire to reduce test runtime (these should be pretty quick). I did verify it. Sorry, momentary lapse of the ability to reason. I'll integrate this PR very shortly then. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1620268479 From prappo at openjdk.org Thu May 30 09:36:06 2024 From: prappo at openjdk.org (Pavel Rappo) Date: Thu, 30 May 2024 09:36:06 GMT Subject: Integrated: 8332826: Make hashCode methods in ArraysSupport friendlier In-Reply-To: References: Message-ID: On Mon, 27 May 2024 16:28:31 GMT, Pavel Rappo wrote: > Please review this PR, which supersedes a now withdrawn https://github.com/openjdk/jdk/pull/14831. > > This PR replaces `ArraysSupport.vectorizedHashCode` with a set of more user-friendly methods. Here's a summary: > > - Made the operand constants (i.e. `T_BOOLEAN` and friends) and the `vectorizedHashCode` method private > > - Made the `vectorizedHashCode` method private, but didn't rename it. Renaming would dramatically increase this PR review cost, because that method's name is used by a lot of VM code. On a bright side, since the method is now private, it's no longer callable by clients of `ArraysSupport`, thus a problem of an inaccurate name is less severe. > > - Made the `ArraysSupport.utf16HashCode` method private > > - Moved tiny cases (i.e. 0, 1, 2) to `ArraysSupport` This pull request has now been integrated. 
Changeset: 3cff588a Author: Pavel Rappo URL: https://git.openjdk.org/jdk/commit/3cff588a3104aa5224e7236eb2c2bb5852de9202 Stats: 266 lines in 13 files changed: 186 ins; 32 del; 48 mod 8332826: Make hashCode methods in ArraysSupport friendlier Reviewed-by: redestad, liach ------------- PR: https://git.openjdk.org/jdk/pull/19414 From gcao at openjdk.org Thu May 30 10:41:23 2024 From: gcao at openjdk.org (Gui Cao) Date: Thu, 30 May 2024 10:41:23 GMT Subject: RFR: 8333248: VectorGatherMaskFoldingTest.java failed when maximum vector bits is 64 Message-ID: Hi,

VectorGatherMaskFoldingTest.java fails when the maximum vector size is 64 bits, because LongVector.SPECIES_MAX.length() and DoubleVector.SPECIES_MAX.length() are then both 1. In addition, `-XX:+IncrementalInlineForceCleanup` is a C2-only option, so it has to be removed from VectorGatherMaskFoldingTest.main for the test to run on aarch64 in client mode.

Error message:

    Base Test: @Test testDoubleVectorStoreLoadMaskedVector: compiler.lib.ir_framework.shared.TestRunException: There was an error while invoking @Test method public static void compiler.vectorapi.VectorGatherMaskFoldingTest.testDoubleVectorStoreLoadMaskedVector(). Target: null. Arguments:
        at compiler.lib.ir_framework.test.BaseTest.invokeTestMethod(BaseTest.java:84)
        at compiler.lib.ir_framework.test.BaseTest.invokeTest(BaseTest.java:71)
        at compiler.lib.ir_framework.test.AbstractTest.run(AbstractTest.java:98)
        at compiler.lib.ir_framework.test.TestVM.runTests(TestVM.java:861)
        at compiler.lib.ir_framework.test.TestVM.start(TestVM.java:252)
        at compiler.lib.ir_framework.test.TestVM.main(TestVM.java:165)
    Caused by: java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:118)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        at compiler.lib.ir_framework.test.BaseTest.invokeTestMethod(BaseTest.java:80)
        ... 5 more
    Caused by: java.lang.RuntimeException: assertNotEquals: expected [1.0] to not equal [1.0]
        at jdk.test.lib.Asserts.fail(Asserts.java:691)
        at jdk.test.lib.Asserts.assertNotEquals(Asserts.java:451)
        at jdk.test.lib.Asserts.assertNotEquals(Asserts.java:435)
        at compiler.vectorapi.VectorGatherMaskFoldingTest.testDoubleVectorStoreLoadMaskedVector(VectorGatherMaskFoldingTest.java:1089)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        ... 7 more

For example, the following test method fails:

    private static final VectorSpecies<Long> L_SPECIES = LongVector.SPECIES_MAX;
    private static final VectorSpecies<Double> D_SPECIES = DoubleVector.SPECIES_MAX;
    ...
    @Test
    @IR(counts = { IRNode.STORE_VECTOR_MASKED, ">= 1", IRNode.LOAD_VECTOR_MASKED, ">= 1" }, applyIfCPUFeatureOr = {"avx512", "true", "sve", "true"})
    public static void testDoubleVectorStoreLoadMaskedVector() {
        double[] res = new double[D_SPECIES.length()];
        doubleVector.intoArray(res, 0, doubleVectorMask);
        DoubleVector res2 = DoubleVector.fromArray(D_SPECIES, res, 0, doubleVectorMask);
        Asserts.assertNotEquals(res2, doubleVector);
    }

In this `testDoubleVectorStoreLoadMaskedVector` test case, doubleVector is [1.0], doubleVectorMask is [true], and res2 is [1.0]. With a single lane that is set in the mask, the masked store followed by the masked load copies the whole vector, so `Asserts.assertNotEquals(res2, doubleVector);` fails.

By the way, LongVector.SPECIES_MAX and DoubleVector.SPECIES_MAX are initialized by calling VectorShape.getMaxVectorBitSize. In aarch64 client JVM mode, VectorShape.getMaxVectorBitSize returns the default of 64 bits, and the same default of 64 is returned on any CPU without vector support, such as RISC-V without RVV 1.0.

    /**
     * Returns the maximum vector bit size for a given element type.
     *
     * @param etype the element type.
     * @return the maximum vector bit.
     */
    /*package-private*/
    static int getMaxVectorBitSize(Class<?> etype) {
        // VectorSupport.getMaxLaneCount may return -1 if C2 is not enabled,
        // or a value smaller than the S_64_BIT.vectorBitSize / elementSizeInBits if MaxVectorSize < 16
        // If so default to S_64_BIT
        int maxLaneCount = VectorSupport.getMaxLaneCount(etype);
        int elementSizeInBits = LaneType.of(etype).elementSize;
        return Math.max(maxLaneCount * elementSizeInBits, S_64_BIT.vectorBitSize);
    }

The fix: when the maximum vector size is 64 bits, which means there is no Vector API intrinsic implementation, we set L_SPECIES to LongVector.SPECIES_128, which uses the Vector API's default Java-level implementation.
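The arithmetic behind this failure can be checked without the incubator module. Below is a minimal sketch using plain `int`s: `maxVectorBitSize` restates the quoted `Math.max(maxLaneCount * elementSizeInBits, S_64_BIT.vectorBitSize)` formula, while `laneCount` and `testShapeBits` are hypothetical helpers; the actual fix, as described, switches `L_SPECIES` to `LongVector.SPECIES_128` rather than computing bit sizes by hand.

```java
// Plain-int sketch of the species-size arithmetic; no jdk.incubator.vector needed.
public final class MaxVectorBitsSketch {
    static final int S_64_BIT = 64;

    // Same formula as the quoted getMaxVectorBitSize: a lane count of -1 (no C2)
    // or a too-small value is clamped up to 64 bits.
    public static int maxVectorBitSize(int maxLaneCount, int elementSizeInBits) {
        return Math.max(maxLaneCount * elementSizeInBits, S_64_BIT);
    }

    public static int laneCount(int vectorBits, int elementSizeInBits) {
        return vectorBits / elementSizeInBits;
    }

    // Hypothetical test-side fallback: use a 128-bit shape instead of a 64-bit one.
    public static int testShapeBits(int maxBits) {
        return maxBits == S_64_BIT ? 128 : maxBits;
    }

    public static void main(String[] args) {
        int bits = maxVectorBitSize(-1, Long.SIZE); // e.g. client VM without C2
        System.out.println(bits + " bits -> " + laneCount(bits, Long.SIZE) + " long lane(s)");
        int fixed = testShapeBits(bits);
        System.out.println(fixed + " bits -> " + laneCount(fixed, Long.SIZE) + " long lane(s)");
    }
}
```

Running `main` prints `64 bits -> 1 long lane(s)` and then `128 bits -> 2 long lane(s)`. A one-lane mask can only be all-true or all-false, whereas two lanes allow a partially set mask.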
### Testing
- [x] Run VectorGatherMaskFoldingTest.java on a Banana Pi BPI-F3 board (has RVV 1.0)
- [x] Run VectorGatherMaskFoldingTest.java on AArch64 in server mode with NEON
- [x] Run VectorGatherMaskFoldingTest.java on AArch64 in client mode without the `-XX:+IncrementalInlineForceCleanup` option

-------------

Commit messages:
 - 8333248: VectorGatherMaskFoldingTest.java failed when maximum vector bits is 64

Changes: https://git.openjdk.org/jdk/pull/19473/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19473&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8333248
  Stats: 5 lines in 1 file changed: 3 ins; 1 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/19473.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19473/head:pull/19473

PR: https://git.openjdk.org/jdk/pull/19473

From bkilambi at openjdk.org Thu May 30 11:55:07 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 30 May 2024 11:55:07 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v10] In-Reply-To: References: <26UiEE_uEKUU0lg_T91K-b4Or3mtGluJYybbJOpETOU=.a74004d6-590f-49e7-8880-4ab6627926dd@github.com> Message-ID: 

On Fri, 24 May 2024 08:45:56 GMT, Emanuel Peter wrote:

>> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Modify JTREG IR rules and some style/format changes
>
> test/hotspot/jtreg/compiler/vectorapi/TestVectorAddMulReduction.java line 181:
>
>> 179:
>> 180:     public static void main(String[] args) {
>> 181:         TestFramework.runWithFlags("-XX:-TieredCompilation",
>
> Why `-XX:-TieredCompilation`?

I added that to ensure C2 compilation, but I just realized that this flag is already whitelisted and may not be needed here.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1620563597

From bkilambi at openjdk.org Thu May 30 12:04:10 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 30 May 2024 12:04:10 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v10] In-Reply-To: References: <26UiEE_uEKUU0lg_T91K-b4Or3mtGluJYybbJOpETOU=.a74004d6-590f-49e7-8880-4ab6627926dd@github.com> Message-ID: 

On Fri, 24 May 2024 08:42:51 GMT, Emanuel Peter wrote:

>> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Modify JTREG IR rules and some style/format changes
>
> test/hotspot/jtreg/compiler/c2/irTests/TestVectorFPReduction.java line 67:
>
>> 65:     @IR(applyIfCPUFeatureAnd = {"asimd", "true", "sve", "false"}, failOn = {IRNode.ADD_REDUCTION_VD})
>> 66:     @IR(applyIfCPUFeature = {"sve", "true"}, counts = {"requires_strict_order", ">=1", IRNode.ADD_REDUCTION_VD, ">=1"},
>> 67:         failOn = {"no_strict_order"}, phase = CompilePhase.PRINT_IDEAL)
>
> Also: I realize that you only check for `asimd / sve` features. Can you also apply it for avx features?

I am not sure which specific AVX/SSE features or versions support these operations. Would it be OK to add `applyIfCPUFeatureOr = {"sve", "true", "avx", "true"}`?

Alternatively, whoever adds strict-ordering conditions to the x86 backend (after this patch is merged) could also update these two test cases.

Also, in the Vector API case I have included only the vector shapes for which the reduction nodes are generated on AArch64. That may or may not hold on x86, so more tests might need to be added there.
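The `requires_strict_order` flag exists because floating-point addition is not associative. A reduction that combines lanes in a different order from a sequential loop can produce a different result. The minimal plain-Java sketch below shows this; it is not C2 code. The halving "fold the upper half onto the lower half" order is an assumption standing in for a typical unordered vector reduction, not the exact order any backend uses.

```java
// Illustrates why an unordered (tree) FP add-reduction can differ from
// the strictly ordered, sequential result. Not the C2 implementation.
class ReductionOrderSketch {
    // Strictly ordered reduction: (((0 + a0) + a1) + a2) + a3 ...
    static double sequentialSum(double[] a) {
        double sum = 0.0;
        for (double v : a) {
            sum += v;
        }
        return sum;
    }

    // Assumed vector-style tree reduction: repeatedly add the upper half of
    // the lanes onto the lower half. Assumes a non-empty, power-of-two length.
    static double halvingTreeSum(double[] a) {
        double[] lanes = a.clone();
        for (int n = lanes.length / 2; n >= 1; n /= 2) {
            for (int i = 0; i < n; i++) {
                lanes[i] += lanes[i + n];
            }
        }
        return lanes[0];
    }

    public static void main(String[] args) {
        double[] a = {1e16, 1.0, -1e16, 1.0};
        // Sequentially, 1e16 + 1.0 rounds back to 1e16, so one 1.0 is lost.
        System.out.println(sequentialSum(a));  // 1.0
        // The tree adds 1e16 + -1e16 first, so both 1.0 values survive.
        System.out.println(halvingTreeSum(a)); // 2.0
    }
}
```

For integer reductions the combining order is irrelevant, which is why only floating-point add-reductions need the strict-order distinction.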
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1620576664

From fyang at openjdk.org Thu May 30 12:16:01 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 30 May 2024 12:16:01 GMT Subject: RFR: 8333154: RISC-V: Add support for primitive array C1 clone intrinsic In-Reply-To: References: Message-ID: 

On Wed, 29 May 2024 08:23:39 GMT, Gui Cao wrote:

> Implementation of primitive array C1 clone intrinsic (https://bugs.openjdk.org/browse/JDK-8333154) for linux-riscv64.
>
> ### Correctness testing:
> - [x] Run make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" (fastdebug)
> - [x] Run tier1-3 tests on SOPHON SG2042 (release)
>
> ### Performance testing:
> Without Patch:
>
> make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
>
> Benchmark                 (size)  Mode  Cnt     Score     Error  Units
> ArrayClone.byteArraycopy       0  avgt   15    90.089 ±   7.122  ns/op
> ArrayClone.byteArraycopy      10  avgt   15   146.000 ±  11.761  ns/op
> ArrayClone.byteArraycopy     100  avgt   15   289.382 ±  23.903  ns/op
> ArrayClone.byteArraycopy    1000  avgt   15   767.864 ±  56.721  ns/op
> ArrayClone.byteClone           0  avgt   15   735.692 ±  26.641  ns/op
> ArrayClone.byteClone          10  avgt   15   810.810 ±  34.563  ns/op
> ArrayClone.byteClone         100  avgt   15  1055.917 ±  93.574  ns/op
> ArrayClone.byteClone        1000  avgt   15  1564.465 ± 140.941  ns/op
> ArrayClone.intArraycopy        0  avgt   15    93.732 ±   8.468  ns/op
> ArrayClone.intArraycopy       10  avgt   15   214.168 ±  34.526  ns/op
> ArrayClone.intArraycopy      100  avgt   15   613.363 ±  45.415  ns/op
> ArrayClone.intArraycopy     1000  avgt   15  1759.611 ±  59.010  ns/op
> ArrayClone.intClone            0  avgt   15   680.100 ±  24.375  ns/op
> ArrayClone.intClone           10  avgt   15   835.979 ±  75.154  ns/op
> ArrayClone.intClone          100  avgt   15  1337.354 ±  86.182  ns/op
> ArrayClone.intClone         1000  avgt   15  2696.280 ± 207.418  ns/op
> Finished running test 'micro:java.lang.ArrayClone'
>
>
> With Patch:
>
> make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
>
> Benchmark                 (size)  Mode  Cnt     Score     Error  Units
> ArrayClone.byteArraycopy       0  avgt   15    89.410 ±   5.112  ns/op
> ArrayClone.byteArraycopy      10  avgt   15   141.125 ±   8.711  ns/op
> ArrayClone.byteArraycopy     100  avgt   15   277.098 ±  12.925  ns/op
> ArrayClone.byteArraycopy    1000  avgt   15   770.188 ±  83.034  ns/op
> ArrayClone.byteClone           0  avgt   15    94.367 ±   7.088  ns/op
> ArrayClone.byteClone          10  avgt   15   151.804 ±  16.497  ns/op
> ArrayClone.byteClone         100  avgt   15   296.284 ±  17.893  ns/op
> ArrayClone.byteClone        1000  avgt   15   790.517 ±  28.765  ns/op
> ArrayClone.intArraycopy        0  avgt   15    93.688 ±   7.050  ns...

LGTM. Thanks.

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19448#pullrequestreview-2088108667

From stefank at openjdk.org Thu May 30 12:19:02 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 30 May 2024 12:19:02 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer In-Reply-To: References: Message-ID: 

On Wed, 29 May 2024 22:04:46 GMT, Vladimir Kozlov wrote:

> > > val needs an unsigned type to avoid undefined behavior because of signed integer overflow. I'd use uintptr_t.
> > >
> > > Makes sense to use something unsigned. Any good place(s) where to put those templates? For now I would just simply put them into relocInfo.hpp (we can use them if we need to reuse them somewhere else).
>
> I would suggest `utilities/globalDefinitions.hpp` somewhere near `pointer_delta*()`

I'm not fully convinced that this is a good idea. While reading this patch, it is not clear to me that it is correct to hide the warning that ubsan has found. Maybe it is, but I don't see any explanation here showing why it is OK to subtract or add against null here.
This is one reason why I'm reluctant to see these functions getting put into globalDefinitions.hpp. I think that there's a risk that people will start to use these functions without making a full analysis to see if there really is a bug that needs to be solved, if there are some code quality improvements that could be done to get rid of the null pointer, or if it is in fact something that we just want to silence the warning for. If you really want to go ahead and add these functions, I would like to see them get more descriptive names that explain why they are used instead of plain ++ and --. For example: `add/sub_to_ptr_maybe_null`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19424#issuecomment-2139427664 From epeter at openjdk.org Thu May 30 12:22:07 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 12:22:07 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v10] In-Reply-To: References: <26UiEE_uEKUU0lg_T91K-b4Or3mtGluJYybbJOpETOU=.a74004d6-590f-49e7-8880-4ab6627926dd@github.com> Message-ID: On Thu, 30 May 2024 12:01:37 GMT, Bhavana Kilambi wrote: >> test/hotspot/jtreg/compiler/c2/irTests/TestVectorFPReduction.java line 67: >> >>> 65: @IR(applyIfCPUFeatureAnd = {"asimd", "true", "sve", "false"}, failOn = {IRNode.ADD_REDUCTION_VD}) >>> 66: @IR(applyIfCPUFeature = {"sve", "true"}, counts = {"requires_strict_order", ">=1", IRNode.ADD_REDUCTION_VD, ">=1"}, >>> 67: failOn = {"no_strict_order"}, phase = CompilePhase.PRINT_IDEAL) >> >> Also: I realize that you only check for `asimd / sve` features. Can you also apply it for avx features? > > I am not sure which specific avx/sse features/versions support these operations. Is it ok to add - > `applyIfCPUFeatureOr = {"sve", "true", "avx", "true"}` ? > or maybe anyone with x86 knowledge who would work on adding strict ordering conditions in their backend (after this patch is merged) can also modify these two testcases accordingly? 
>> Also in the vectorapi case, I have included only specific shapes of Vectors where the Reduction nodes would be generated on aarch64. It may or may not be the case for x86 and they might have to add more tests accordingly.

I don't fully know what is OK. I usually check this experimentally, and just put the lowest feature that makes my test pass.

Do you have access to any x86 machine you could test this on?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1620604302

From mdoerr at openjdk.org Thu May 30 12:41:02 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 30 May 2024 12:41:02 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v2] In-Reply-To: References: Message-ID: 

On Wed, 29 May 2024 10:04:14 GMT, Matthias Baesken wrote:

>> When running on macOS with ubsan enabled, we see some issues in relocInfo (hpp and cpp); those already occur in the build quite early.
>>
>> /jdk/src/hotspot/share/code/relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer
>>
>> Similar happens when we add to the _current pointer
>> _current++;
>> this gives:
>> relocInfo.hpp:606:13: runtime error: applying non-zero offset to non-null pointer 0xfffffffffffffffe produced null pointer
>>
>> Seems the pointer subtraction/addition worked so far, so it might be an option to disable ubsan for those 2 functions.
>
> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
>
> use template functions

We're not hiding a warning. Pointer arithmetic on `nullptr`, or pointer arithmetic whose result is `nullptr`, is undefined behavior. So the current implementation is not guaranteed to do what we expect; it only works because compilers seem to be merciful. However, casting `nullptr` to `uintptr_t` is guaranteed to be 0. So, switching to unsigned integer arithmetic avoids this problem.
The `RelocIterator` code is designed to work with such values. Nevertheless, I like your proposal to call them `add/sub_to_ptr_maybe_null`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19424#issuecomment-2139465731 From bkilambi at openjdk.org Thu May 30 13:08:08 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 30 May 2024 13:08:08 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v10] In-Reply-To: References: <26UiEE_uEKUU0lg_T91K-b4Or3mtGluJYybbJOpETOU=.a74004d6-590f-49e7-8880-4ab6627926dd@github.com> Message-ID: On Thu, 30 May 2024 12:19:20 GMT, Emanuel Peter wrote: >> I am not sure which specific avx/sse features/versions support these operations. Is it ok to add - >> `applyIfCPUFeatureOr = {"sve", "true", "avx", "true"}` ? >> or maybe anyone with x86 knowledge who would work on adding strict ordering conditions in their backend (after this patch is merged) can also modify these two testcases accordingly? Also in the vectorapi case, I have included only specific shapes of Vectors where the Reduction nodes would be generated on aarch64. It may or may not be the case for x86 and they might have to add more tests accordingly. > > I don't fully know what is ok. I usually check this experimentally, and just put the lowest feature that makes my test pass. > > Do you have access to any x86 machine you could test this on? Oh ok. Yes I do have an x86 machine to test on. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1620680288

From mdoerr at openjdk.org Thu May 30 13:17:06 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 30 May 2024 13:17:06 GMT Subject: RFR: 8331935: Add support for primitive array C1 clone intrinsic in PPC [v4] In-Reply-To: <7HG6uTSZR9fs7PrsTNR1N0rzUCIwIgX3-W0VGPcrRyY=.0db16311-1313-4476-84c4-285ebd2a3fbc@github.com> References: <7HG6uTSZR9fs7PrsTNR1N0rzUCIwIgX3-W0VGPcrRyY=.0db16311-1313-4476-84c4-285ebd2a3fbc@github.com> Message-ID: 

On Tue, 21 May 2024 12:11:13 GMT, Varada M wrote:

>> https://bugs.openjdk.org/browse/JDK-8302850 port for PPC64
>>
>> JMH Benchmark Results
>>
>>
>> Before:
>>
>> Benchmark                 (size)  Mode  Cnt    Score    Error  Units
>> ArrayClone.byteArraycopy       0  avgt   15  114.107 ±  1.337  ns/op
>> ArrayClone.byteArraycopy      10  avgt   15  130.492 ±  0.991  ns/op
>> ArrayClone.byteArraycopy     100  avgt   15  139.103 ±  1.913  ns/op
>> ArrayClone.byteArraycopy    1000  avgt   15  321.688 ±  6.033  ns/op
>> ArrayClone.byteClone           0  avgt   15  227.602 ±  3.393  ns/op
>> ArrayClone.byteClone          10  avgt   15  237.624 ±  2.996  ns/op
>> ArrayClone.byteClone         100  avgt   15  239.219 ±  2.835  ns/op
>> ArrayClone.byteClone        1000  avgt   15  355.571 ±  2.946  ns/op
>> ArrayClone.intArraycopy        0  avgt   15  113.275 ±  1.099  ns/op
>> ArrayClone.intArraycopy       10  avgt   15  129.763 ±  1.458  ns/op
>> ArrayClone.intArraycopy      100  avgt   15  213.327 ±  2.524  ns/op
>> ArrayClone.intArraycopy     1000  avgt   15  449.650 ±  7.338  ns/op
>> ArrayClone.intClone            0  avgt   15  225.682 ±  3.048  ns/op
>> ArrayClone.intClone           10  avgt   15  234.532 ±  2.817  ns/op
>> ArrayClone.intClone          100  avgt   15  295.934 ±  4.925  ns/op
>> ArrayClone.intClone         1000  avgt   15  573.368 ±  5.739  ns/op
>> Finished running test 'micro:java.lang.ArrayClone'
>> Test report is stored in build/aix-ppc64-server-release/test-results/micro_java_lang_ArrayClone
>>
>> ==============================
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PASS  FAIL ERROR
>>    micro:java.lang.ArrayClone                            1     1     0     0
>> ==============================
>> TEST SUCCESS
>>
>> Finished building target 'test' in configuration 'aix-ppc64-server-release'
>>
>>
>> After:
>>
>> Benchmark                 (size)  Mode  Cnt    Score    Error  Units
>> ArrayClone.byteArraycopy       0  avgt   15  113.894 ±  0.993  ns/op
>> ArrayClone.byteArraycopy      10  avgt   15  131.455 ±  0.956  ns/op
>> ArrayClone.byteArraycopy     100  avgt   15  139.145 ±  3.002  ns/op
>> ArrayClone.byteArraycopy    1000  avgt   15  315.957 ± 14.591  ns/op
>> ArrayClone.byteClone           0  avgt   15   43.753 ±  3.669  ns/op
>> ArrayClone.byteClone          10  avgt   15   52.329 ±  1.041  ns/op
>> ArrayClone.byteClone         100  avgt   15  127.711 ±  3.938  ns/op
>> ArrayClone.byteClone        1000  avgt   15  225.937 ±  1.987  ns/op
>> Arr...
>
> Varada M has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:
>
>  - Merge branch 'master' into arryClone
>  - Add support for primitive array C1 clone intrinsic
>  - Add support for primitive array C1 clone intrinsic
>  - Add support for primitive array C1 clone intrinsic
>  - Add support for primitive array C1 clone intrinsic

I've put it again into our nightly tests and haven't seen any errors which may have been caused by this PR. There are currently some unrelated errors. So, I think it's good to go.
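The ArrayClone microbenchmark results in these threads compare two ways of copying a primitive array, and the sketch below shows them side by side. The method bodies are assumptions inferred from the benchmark names `byteClone` and `byteArraycopy`, not the benchmark source. `clone()` on a primitive array is the call the C1 intrinsic targets.

```java
import java.util.Arrays;

// Hypothetical counterparts of ArrayClone.byteClone and ArrayClone.byteArraycopy.
class ArrayCloneSketch {
    // Object.clone() on a primitive array: the operation the C1 intrinsic speeds up.
    static byte[] viaClone(byte[] src) {
        return src.clone();
    }

    // Explicit allocation followed by System.arraycopy.
    static byte[] viaArraycopy(byte[] src) {
        byte[] dst = new byte[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    public static void main(String[] args) {
        byte[] src = new byte[100];
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) i;
        }
        byte[] a = viaClone(src);
        byte[] b = viaArraycopy(src);
        // Both produce an independent copy with identical contents.
        System.out.println(Arrays.equals(a, b) && a != src && b != src);
    }
}
```

Both methods produce the same result. In the posted numbers for both RISC-V and PPC, `byteClone` at C1 moves much closer to `byteArraycopy` once the intrinsic is in place. That is consistent with C1 no longer going through a generic slow path for `clone()`.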
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19250#issuecomment-2139532656

From sgibbons at openjdk.org Thu May 30 13:20:01 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 30 May 2024 13:20:01 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: <_0H1QRaXnFyO9eGa7IvO1l4ZzNK_27D59ebYAphp8eg=.0fe38944-0b61-4a1a-b63d-04315b02117f@github.com> References: <_0H1QRaXnFyO9eGa7IvO1l4ZzNK_27D59ebYAphp8eg=.0fe38944-0b61-4a1a-b63d-04315b02117f@github.com> Message-ID: 

On Thu, 30 May 2024 06:22:17 GMT, Emanuel Peter wrote:

>> test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 29:
>>
>>> 27:  * @requires vm.cpu.features ~= ".*avx2.*"
>>> 28:  * @requires vm.compiler2.enabled
>>> 29:  * @run main/othervm -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts -XX:UseAVX=2 -Xbatch -XX:-TieredCompilation -XX:CompileCommand=dontinline,ECoreIndexOf.indexOfKernel ECoreIndexOf
>>
>> Does this test really need to be `avx2` specific? Does it even need to be C2 specific?
>> Or can this run on all platforms?
>
> Would be a shame to spend so much time on writing a test and then not apply it everywhere ;)

I'll add a separate @test block to this file. The test was, however, tuned specifically for the new algorithm to exercise known edge cases.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620676513

From sgibbons at openjdk.org Thu May 30 13:19:57 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 30 May 2024 13:19:57 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v49] In-Reply-To: References: Message-ID: <9PIuILHZnLHrZf1sz0Dsq6iup6qgyXw50mD0nGVS04c=.63bd0afd-d818-46fa-a082-a3d2066829cd@github.com>

> Rewrite the IndexOf code without the use of the pcmpestri instruction, using only AVX2 instructions. This change accelerates String.indexOf by 1.3x on average for AVX2.
> The benchmark numbers:
>
>
> Benchmark                               Score      Latest     Latest/Score
> StringIndexOf.advancedWithMediumSub       343.573    317.934  0.925375393x
> StringIndexOf.advancedWithShortSub1      1039.081   1053.96   1.014319384x
> StringIndexOf.advancedWithShortSub2        55.828    110.541  1.980027943x
> StringIndexOf.constantPattern               9.361     11.906  1.271872663x
> StringIndexOf.searchCharLongSuccess         4.216      4.218  1.000474383x
> StringIndexOf.searchCharMediumSuccess       3.133      3.216  1.02649218x
> StringIndexOf.searchCharShortSuccess        3.76       3.761  1.000265957x
> StringIndexOf.success                       9.186      9.713  1.057369911x
> StringIndexOf.successBig                   14.341     46.343  3.231504079x
> StringIndexOfChar.latin1_AVX2_String     6220.918  12154.52   1.953814533x
> StringIndexOfChar.latin1_AVX2_char       5503.556   5540.044  1.006629895x
> StringIndexOfChar.latin1_SSE4_String     6978.854   6818.689  0.977049957x
> StringIndexOfChar.latin1_SSE4_char       5657.499   5474.624  0.967675646x
> StringIndexOfChar.latin1_Short_String    7132.541   6863.359  0.962260014x
> StringIndexOfChar.latin1_Short_char     16013.389  16162.437  1.009307711x
> StringIndexOfChar.latin1_mixed_String    7386.123  14771.622  1.999915517x
> StringIndexOfChar.latin1_mixed_char      9901.671   9782.245  0.987938803x

Scott Gibbons has updated the pull request incrementally with two additional commits since the last revision:

 - Stupid EOL at end
 - Add @test block; fix test indentation

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16753/files
  - new: https://git.openjdk.org/jdk/pull/16753/files/ed06edd6..3e150fe3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=48
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=47-48

  Stats: 166 lines in 2 files changed: 7 ins; 0 del; 159 mod
  Patch: https://git.openjdk.org/jdk/pull/16753.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753

PR: https://git.openjdk.org/jdk/pull/16753

From sgibbons at openjdk.org Thu May 30 13:19:59 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 30 May 2024 13:19:59 GMT Subject: RFR:
8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: References: Message-ID: On Thu, 30 May 2024 06:23:05 GMT, Emanuel Peter wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove duplicate vm.compiler2.enabled > > test/jdk/java/lang/String/IndexOf.java line 35: > >> 33: * @requires vm.cpu.features ~= ".*avx2.*" >> 34: * @requires vm.compiler2.enabled >> 35: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -Xcomp -XX:-TieredCompilation -XX:UseAVX=2 -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts IndexOf > > Same here: why is the test AVX2 specific? Could other platforms not also be "tickled" in interesting ways with this test? There are two test blocks, so all platforms will be able to take advantage of the test via the first block. I'm told that's how this works. > test/jdk/java/lang/StringBuffer/IndexOf.java line 188: > >> 186: } >> 187: >> 188: } > > It looks like you just indented basically the whole file by 1 space. Why? I hadn't noticed this. It's most likely an artifact of my editor as it wasn't intentional. I'll change this back. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620669257 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620679629 From sgibbons at openjdk.org Thu May 30 13:36:18 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 30 May 2024 13:36:18 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: References: Message-ID: On Thu, 30 May 2024 06:25:32 GMT, Tobias Hartmann wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove duplicate vm.compiler2.enabled > > Control question: Are we confident with this potentially going into JDK 23 or should we rather postpone to JDK 24? The fork is next week. Thank you all for the comments. 
@TobiHartmann I'm comfortable with this going into JDK 23. The code has been functionally stable for me for the past 2 months. The recent churn centers primarily on restructuring the code for readability and maintainability, and on ensuring protection against reading past the end of strings. Both Vlad (Volodymyr) and @sviswa7 have scoured the code with me, and together we have convinced ourselves that we've covered all the bases. Of course we may have missed something, but my confidence is high.

The overall performance gain reported by the StringIndexOf JMH benchmark averages ~7x on an E-core, compared with the baseline on the same core. It's skewed somewhat towards massive gains for long (~2K) strings (avg 14.4x), with modest gains for small-ish strings (avg ~1.8x). I've measured up to a 60x performance improvement for some 2K UTF-16 indexOf operations.

Again, thank you all. It's been a fun exercise and I've learned a lot.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2139569361

From epeter at openjdk.org Thu May 30 13:59:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 13:59:15 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: References: Message-ID: <_4hKqcW7tE4shxVqG8Et3BjeehNjl0NWvS7PCKZaLe0=.73dc8315-22ee-47c0-8f5b-be74edc2f7a3@github.com>

On Thu, 30 May 2024 06:25:32 GMT, Tobias Hartmann wrote:

> Control question: Are we confident with this potentially going into JDK 23 or should we rather postpone to JDK 24? The fork is next week.

I would hold off. @asgibbons, it may pass our tests and your extensive testing. But you never know what the fuzzer can find over a few weeks once it runs with your changes. I have had that experience many times.
Let's just give it a few days, and then we have one JDK version less to worry about for backports on possible follow-up bugs ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2139615822 From epeter at openjdk.org Thu May 30 13:59:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 13:59:15 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: References: Message-ID: On Thu, 30 May 2024 13:33:40 GMT, Scott Gibbons wrote: >> Control question: Are we confident with this potentially going into JDK 23 or should we rather postpone to JDK 24? The fork is next week. > > Thank you all for the comments. @TobiHartmann I'm comfortable with this going into JDK 23. The code has been functionally stable for me for the past 2 months. The recent churn centers primarily around restructuring the code for readability and maintainability and ensuring protection against reading past the end of strings. Both Vlad (Volodymyr) and @sviswa7 have scoured the code with me and together we have convinced ourselves that we've covered all the bases. Of course we may have missed something but my confidence is high. > > The overall performance gain as reported by the StringIndexOf JMH averages ~7x running on an e-core as compared with baseline on the same core. It's skewed somewhat towards massive gains for long (~2K) strings (avg 14.4x) and modest gains for small-ish strings (avg ~1.8x). I've measured up to 60x performance improvement for some 2K UTF-16 indexOf operations. > > Again, thank you all. It's been a fun exercise and I've learned a lot. @asgibbons generally it would be nice if you waited for me to accept your changes before integrating. 
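Scott's edge-case testing and Emanuel's point about the fuzzer both come down to checking the new intrinsic against a trusted reference. One way to do that is a differential check like the sketch below. It is hypothetical: it is neither the ECoreIndexOf test nor the fuzzer, and the names `naiveIndexOf` and `countMismatches` are illustrative.

- It compares `String.indexOf` with a naive reference for every needle position, including needles that end exactly at the last character of the haystack.
- It covers both Latin-1 and UTF-16 strings.
- It does not force JIT compilation. In a real test, the intrinsic would only be exercised with flags such as `-Xbatch` or `-Xcomp`, as in the test headers quoted in this thread, or with enough iterations to trigger compilation.

```java
import java.util.Random;

// Hypothetical differential check of String.indexOf against a naive reference.
class IndexOfCrossCheck {
    // Naive reference; assumes a non-empty needle and 0 <= from.
    static int naiveIndexOf(String haystack, String needle, int from) {
        for (int i = from; i + needle.length() <= haystack.length(); i++) {
            if (haystack.regionMatches(i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    // Builds random haystacks from a small alphabet (so partial matches are
    // frequent) and takes every substring as a needle, including ones that end
    // flush with the haystack. Returns the number of disagreements.
    static int countMismatches(String alphabet, int maxLen, long seed) {
        Random rnd = new Random(seed);
        int mismatches = 0;
        for (int len = 1; len <= maxLen; len++) {
            char[] chars = new char[len];
            for (int i = 0; i < len; i++) {
                chars[i] = alphabet.charAt(rnd.nextInt(alphabet.length()));
            }
            String hay = new String(chars);
            for (int nlen = 1; nlen <= len; nlen++) {
                for (int pos = 0; pos + nlen <= len; pos++) {
                    String needle = hay.substring(pos, pos + nlen);
                    // Search from the start and from the exact match position.
                    if (hay.indexOf(needle) != naiveIndexOf(hay, needle, 0)) mismatches++;
                    if (hay.indexOf(needle, pos) != naiveIndexOf(hay, needle, pos)) mismatches++;
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println(countMismatches("ab", 64, 42));      // Latin-1 strings
        System.out.println(countMismatches("a\u0100", 64, 42)); // mostly UTF-16 strings
    }
}
```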
------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2139604424 From epeter at openjdk.org Thu May 30 13:59:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 13:59:18 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v49] In-Reply-To: <9PIuILHZnLHrZf1sz0Dsq6iup6qgyXw50mD0nGVS04c=.63bd0afd-d818-46fa-a082-a3d2066829cd@github.com> References: <9PIuILHZnLHrZf1sz0Dsq6iup6qgyXw50mD0nGVS04c=.63bd0afd-d818-46fa-a082-a3d2066829cd@github.com> Message-ID: On Thu, 30 May 2024 13:19:57 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with two additional 
commits since the last revision: > > - Stupid EOL at end > - Add @test block; fix test indentation test/jdk/java/lang/String/IndexOf.java line 25: > 23: > 24: /* > 25: * @test You should add the `@bug 8320448` for all runs. test/jdk/java/lang/String/IndexOf.java line 27: > 25: * @test > 26: * @summary test String indexOf() intrinsic > 27: * @run main/othervm IndexOf Suggestion: * @run main IndexOf You do not need a new VM if you have no arguments ;) test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 25: > 23: > 24: /* @test > 25: * @bug 4162796 4162796 You need to fix the bug numbers. test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 27: > 25: * @bug 4162796 4162796 > 26: * @summary Test indexOf and lastIndexOf > 27: * @run main/othervm -Xbatch -XX:-TieredCompilation -XX:CompileCommand=dontinline,ECoreIndexOf.indexOfKernel ECoreIndexOf I would also add a line without `-XX:-TieredCompilation`, then C1 can be tested with this too test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 32: > 30: > 31: /* @test > 32: * @bug 4162796 4162796 Here too ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620760730 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620756896 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620753321 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620754948 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620753577 From epeter at openjdk.org Thu May 30 13:59:19 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 13:59:19 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: References: Message-ID: On Thu, 30 May 2024 12:58:27 GMT, Scott Gibbons wrote: >> test/jdk/java/lang/String/IndexOf.java line 35: >> >>> 33: * @requires vm.cpu.features ~= ".*avx2.*" >>> 34: * @requires vm.compiler2.enabled >>> 35: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -Xcomp 
-XX:-TieredCompilation -XX:UseAVX=2 -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts IndexOf >> >> Same here: why is the test AVX2 specific? Could other platforms not also be "tickled" in interesting ways with this test? > > There are two test blocks, so all platforms will be able to take advantage of the test via the first block. I'm told that's how this works. Yes, that is right. Good. >> test/jdk/java/lang/StringBuffer/IndexOf.java line 188: >> >>> 186: } >>> 187: >>> 188: } >> >> It looks like you just indented basically the whole file by 1 space. Why? > > I hadn't noticed this. It's most likely an artifact of my editor as it wasn't intentional. I'll change this back. Ok, maybe check your code on GitHub next time ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620768228 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620746147 From epeter at openjdk.org Thu May 30 13:59:19 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 13:59:19 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: References: <_0H1QRaXnFyO9eGa7IvO1l4ZzNK_27D59ebYAphp8eg=.0fe38944-0b61-4a1a-b63d-04315b02117f@github.com> Message-ID: On Thu, 30 May 2024 13:03:06 GMT, Scott Gibbons wrote: >> Would be a shame to spend so much time on writing a test and then not apply it everywhere ;) > > I'll add a separate @test block to this file. It was, however, written specifically tuned for the new algorithm to exercise known edge cases. 
A new `@test` sounds like a good idea ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620747402 From shade at openjdk.org Thu May 30 14:21:06 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 30 May 2024 14:21:06 GMT Subject: RFR: 8331935: Add support for primitive array C1 clone intrinsic in PPC [v4] In-Reply-To: <7HG6uTSZR9fs7PrsTNR1N0rzUCIwIgX3-W0VGPcrRyY=.0db16311-1313-4476-84c4-285ebd2a3fbc@github.com> References: <7HG6uTSZR9fs7PrsTNR1N0rzUCIwIgX3-W0VGPcrRyY=.0db16311-1313-4476-84c4-285ebd2a3fbc@github.com> Message-ID: <0nsCqUZcDLTlO2tTjATD7mAtOjuBwq_UT-22mJxgtOc=.d38c0683-3416-487a-b72d-db0f2fee39f6@github.com> On Tue, 21 May 2024 12:11:13 GMT, Varada M wrote: >> https://bugs.openjdk.org/browse/JDK-8302850 port for PPC64 >> >> JMH Benchmark Results >> >> >> Before : >> >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 114.107 ? 1.337 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 130.492 ? 0.991 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 139.103 ? 1.913 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 321.688 ? 6.033 ns/op >> ArrayClone.byteClone 0 avgt 15 227.602 ? 3.393 ns/op >> ArrayClone.byteClone 10 avgt 15 237.624 ? 2.996 ns/op >> ArrayClone.byteClone 100 avgt 15 239.219 ? 2.835 ns/op >> >> ArrayClone.byteClone 1000 avgt 15 355.571 ? 2.946 ns/op >> ArrayClone.intArraycopy 0 avgt 15 113.275 ? 1.099 ns/op >> ArrayClone.intArraycopy 10 avgt 15 129.763 ? 1.458 ns/op >> ArrayClone.intArraycopy 100 avgt 15 213.327 ? 2.524 ns/op >> ArrayClone.intArraycopy 1000 avgt 15 449.650 ? 7.338 ns/op >> ArrayClone.intClone 0 avgt 15 225.682 ? 3.048 ns/op >> ArrayClone.intClone 10 avgt 15 234.532 ? 2.817 ns/op >> ArrayClone.intClone 100 avgt 15 295.934 ? 4.925 ns/op >> ArrayClone.intClone 1000 avgt 15 573.368 ? 
5.739 ns/op >> Finished running test 'micro:java.lang.ArrayClone' >> Test report is stored in build/aix-ppc64-server-release/test-results/micro_java_lang_ArrayClone >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> micro:java.lang.ArrayClone 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> Finished building target 'test' in configuration 'aix-ppc64-server-release' >> >> >> >> >> After: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 113.894 ± 0.993 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 131.455 ± 0.956 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 139.145 ± 3.002 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 315.957 ± 14.591 ns/op >> ArrayClone.byteClone 0 avgt 15 43.753 ± 3.669 ns/op >> ArrayClone.byteClone 10 avgt 15 52.329 ± 1.041 ns/op >> ArrayClone.byteClone 100 avgt 15 127.711 ± 3.938 ns/op >> >> ArrayClone.byteClone 1000 avgt 15 225.937 ± 1.987 ns/op >> Arr... > > Varada M has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into arryClone > - Add support for primitive array C1 clone intrinsic > - Add support for primitive array C1 clone intrinsic > - Add support for primitive array C1 clone intrinsic > - Add support for primitive array C1 clone intrinsic There is currently a regression in the original code, [JDK-8332670](https://bugs.openjdk.org/browse/JDK-8332670), which may explain some instability on PPC.
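The quoted JMH tables report average time per operation (lower is better), so the speedup for a row is the Before score divided by the After score. Computed for a few rows that appear in both quoted tables (the After table is truncated in the quote):

$$
\text{speedup} = \frac{t_{\text{before}}}{t_{\text{after}}}:\qquad
\texttt{byteClone}(0) = \frac{227.602}{43.753} \approx 5.20,\qquad
\texttt{byteClone}(1000) = \frac{355.571}{225.937} \approx 1.57,\qquad
\texttt{byteArraycopy}(1000) = \frac{321.688}{315.957} \approx 1.02
$$

The `byteArraycopy` difference is smaller than the reported errors (±6.033 and ±14.591 ns/op), which fits the expectation that only the `clone` path is affected by the intrinsic.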
------------- PR Comment: https://git.openjdk.org/jdk/pull/19250#issuecomment-2139660775 From aboldtch at openjdk.org Thu May 30 14:34:03 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 30 May 2024 14:34:03 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v2] In-Reply-To: References: Message-ID: <4LTCBpKcJDyCZGIKgDYczbkrxHJYY85qyBuA21e8B9E=.e43e018a-bfa2-4554-bf4f-80f6c2afc878@github.com> On Wed, 29 May 2024 10:04:14 GMT, Matthias Baesken wrote: >> When running on macOS with ubsan enabled, we see some issues in relocInfo (hpp and cpp); those already occur in the build quite early. >> >> /jdk/src/hotspot/share/code/relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer >> >> Similar happens when we add to the _current pointer >> _current++; >> this gives: >> relocInfo.hpp:606:13: runtime error: applying non-zero offset to non-null pointer 0xfffffffffffffffe produced null pointer >> >> Seems the pointer subtraction/addition worked so far, so it might be an option to disable ubsan for those 2 functions. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > use template functions This seems to be the same as [JDK-8300821](https://bugs.openjdk.org/browse/JDK-8300821). (Changeset 01312a002ba27bfbfebb9fde484ca34ebde0704c) The miss here seems to be that `has_locs` does not mean "This CodeSection has relocations", but rather "This CodeSection has allocated a relocations buffer".
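A simplified model of the distinction drawn here may help. It assumes, for illustration only, that "has a buffer" is tracked by a non-null end pointer and that the record count is `end - start`. `RelocRecord`, `SectionModel`, and `visit_records` are hypothetical names, not HotSpot code. An allocated-but-empty buffer passes a `has_locs()` check, while a guard on the count skips it:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 16-bit relocation record (not HotSpot's relocInfo).
struct RelocRecord { uint16_t value; };

// Simplified model of a code section's relocation buffer. Assumption for
// illustration: has_locs() reports that a buffer was allocated, while
// locs_count() reports how many records were actually written to it.
struct SectionModel {
  RelocRecord* locs_start = nullptr;
  RelocRecord* locs_end = nullptr;

  bool has_locs() const { return locs_end != nullptr; }
  // Subtracting two null pointers is defined in C++ and yields 0.
  int locs_count() const { return static_cast<int>(locs_end - locs_start); }
};

// Visits every record, guarding at the callsite on the record count so that
// no iteration is ever set up over an empty or never-allocated buffer.
int visit_records(const SectionModel& cs) {
  if (cs.locs_count() == 0) return 0;  // rather than: if (!cs.has_locs())
  int visited = 0;
  for (const RelocRecord* p = cs.locs_start; p != cs.locs_end; ++p) {
    ++visited;
  }
  return visited;
}
```

With this model, a section whose buffer exists but holds no records reports `has_locs() == true` and `locs_count() == 0`, which is the case a `has_locs()` guard lets through.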
I believe the correct check would be `cs->locs_count() == 0`:

--- a/src/hotspot/share/asm/codeBuffer.cpp
+++ b/src/hotspot/share/asm/codeBuffer.cpp
@@ -525,7 +525,7 @@ void CodeBuffer::finalize_oop_references(const methodHandle& mh) {
   for (int n = (int) SECT_FIRST; n < (int) SECT_LIMIT; n++) {
     // pull code out of each section
     CodeSection* cs = code_section(n);
-    if (cs->is_empty() || !cs->has_locs()) continue; // skip trivial section
+    if (cs->is_empty() || cs->locs_count() == 0) continue; // skip trivial section
     RelocIterator iter(cs);
     while (iter.next()) {
       if (iter.type() == relocInfo::metadata_type) {
@@ -793,7 +793,7 @@ void CodeBuffer::relocate_code_to(CodeBuffer* dest) const {
   for (int n = (int) SECT_FIRST; n < (int)SECT_LIMIT; n++) {
     // pull code out of each section
     const CodeSection* cs = code_section(n);
-    if (cs->is_empty() || !cs->has_locs()) continue; // skip trivial section
+    if (cs->is_empty() || cs->locs_count() == 0) continue; // skip trivial section
     CodeSection* dest_cs = dest->code_section(n);
     { // Repair the pc relative information in the code after the move
       RelocIterator iter(dest_cs);
@@ -1057,7 +1057,7 @@ void CodeSection::print(const char* name) {
                 name, p2i(start()), p2i(end()), p2i(limit()), size(), capacity());
   tty->print_cr(" %7s.locs = " PTR_FORMAT " : " PTR_FORMAT " : " PTR_FORMAT " (%d of %d) point=%d",
                 name, p2i(locs_start()), p2i(locs_end()), p2i(locs_limit()), locs_size, locs_capacity(), locs_point_off());
-  if (PrintRelocations) {
+  if (PrintRelocations && locs_size != 0) {
     RelocIterator iter(this);
     iter.print();
   }

There is also the following, which perhaps should assert that there is a relocation at the call pc, as it makes some assumptions about being able to patch its type. (I am not familiar with this code.)
https://github.com/openjdk/jdk/blob/4acafb809c66589fbbfee9c9a4ba7820f848f0e4/src/hotspot/cpu/x86/c1_CodeStubs_x86.cpp#L434 As for the `RelocIterator::RelocIterator(nmethod* nm, address begin, address limit)` iterator, it is less clear to me whether it should be guarded against nullptr from the outside. So something would still have to be done about the ubsan report. Much of this seems to be about optimising the `RelocIterator`. But it does not seem worth either the CPU cycles to execute the constructor or the cognitive cycles to reason about why the constructor is valid when current == -sizeof(relocInfo), when we can easily check at the callsite that there are no relocations in our nmethod or code section. I am unsure whether solving this by type casting just hides underlying design issues. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19424#issuecomment-2139709521 From sgibbons at openjdk.org Thu May 30 15:00:19 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 30 May 2024 15:00:19 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: <_4hKqcW7tE4shxVqG8Et3BjeehNjl0NWvS7PCKZaLe0=.73dc8315-22ee-47c0-8f5b-be74edc2f7a3@github.com> References: <_4hKqcW7tE4shxVqG8Et3BjeehNjl0NWvS7PCKZaLe0=.73dc8315-22ee-47c0-8f5b-be74edc2f7a3@github.com> Message-ID: On Thu, 30 May 2024 13:56:30 GMT, Emanuel Peter wrote: >> Control question: Are we confident with this potentially going into JDK 23 or should we rather postpone to JDK 24? The fork is next week. > >> Control question: Are we confident with this potentially going into JDK 23 or should we rather postpone to JDK 24? The fork is next week. > > I would hold off. @asgibbons it may pass our tests, and your extensive testing. But you never know what the fuzzer can find over a few weeks once it runs with your changes. I have made that experience many times.
Let's just give it a few days, and then we have one JDK version less to worry about for backports on possible follow-up bugs ;) @eme64 I'm glad to have received your feedback. I see I have erroneously assumed that by making the exact code change you requested still requires your acceptance - I won't make that mistake again. I had also erroneously assumed that your review was complete and you had no further changes for me to make. I'd also not like to make that mistake again, but I'm unsure how to conclude that a review is complete - it seems like 7 hours of elapsed time isn't sufficient to indicate completion, so can you please help me figure this out? Perhaps it's just my distaste for "trickle-in" comments, which I should get over, or is there another way you can suggest? As for the fuzzer I would be very interested in learning more about this. We have a significant number of compute resources, so it may be valuable for us to set up a copy of the fuzzer on-site to improve the quality of our submissions. Can you help in pointing me to someone that can advise me on how to do this? As for holding off the integration, I'll leave the decision to a sponsor for this PR. I don't believe increasing the reviewer count just to "force" reevaluation should be an acceptable practice, although I'm not an insider in this community. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2139814010 From mdoerr at openjdk.org Thu May 30 15:12:04 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 30 May 2024 15:12:04 GMT Subject: RFR: 8331935: Add support for primitive array C1 clone intrinsic in PPC [v4] In-Reply-To: <7HG6uTSZR9fs7PrsTNR1N0rzUCIwIgX3-W0VGPcrRyY=.0db16311-1313-4476-84c4-285ebd2a3fbc@github.com> References: <7HG6uTSZR9fs7PrsTNR1N0rzUCIwIgX3-W0VGPcrRyY=.0db16311-1313-4476-84c4-285ebd2a3fbc@github.com> Message-ID: On Tue, 21 May 2024 12:11:13 GMT, Varada M wrote: >> https://bugs.openjdk.org/browse/JDK-8302850 port for PPC64 >> >> JMH Benchmark Results >> >> >> Before: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 114.107 ± 1.337 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 130.492 ± 0.991 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 139.103 ± 1.913 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 321.688 ± 6.033 ns/op >> ArrayClone.byteClone 0 avgt 15 227.602 ± 3.393 ns/op >> ArrayClone.byteClone 10 avgt 15 237.624 ± 2.996 ns/op >> ArrayClone.byteClone 100 avgt 15 239.219 ± 2.835 ns/op >> >> ArrayClone.byteClone 1000 avgt 15 355.571 ± 2.946 ns/op >> ArrayClone.intArraycopy 0 avgt 15 113.275 ± 1.099 ns/op >> ArrayClone.intArraycopy 10 avgt 15 129.763 ± 1.458 ns/op >> ArrayClone.intArraycopy 100 avgt 15 213.327 ± 2.524 ns/op >> ArrayClone.intArraycopy 1000 avgt 15 449.650 ± 7.338 ns/op >> ArrayClone.intClone 0 avgt 15 225.682 ± 3.048 ns/op >> ArrayClone.intClone 10 avgt 15 234.532 ± 2.817 ns/op >> ArrayClone.intClone 100 avgt 15 295.934 ± 4.925 ns/op >> ArrayClone.intClone 1000 avgt 15 573.368 ±
5.739 ns/op >> Finished running test 'micro:java.lang.ArrayClone' >> Test report is stored in build/aix-ppc64-server-release/test-results/micro_java_lang_ArrayClone >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> micro:java.lang.ArrayClone 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> Finished building target 'test' in configuration 'aix-ppc64-server-release' >> >> >> >> >> After: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 113.894 ± 0.993 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 131.455 ± 0.956 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 139.145 ± 3.002 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 315.957 ± 14.591 ns/op >> ArrayClone.byteClone 0 avgt 15 43.753 ± 3.669 ns/op >> ArrayClone.byteClone 10 avgt 15 52.329 ± 1.041 ns/op >> ArrayClone.byteClone 100 avgt 15 127.711 ± 3.938 ns/op >> >> ArrayClone.byteClone 1000 avgt 15 225.937 ± 1.987 ns/op >> Arr... > > Varada M has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into arryClone > - Add support for primitive array C1 clone intrinsic > - Add support for primitive array C1 clone intrinsic > - Add support for primitive array C1 clone intrinsic > - Add support for primitive array C1 clone intrinsic Thanks for the hint! We should wait for that one to be fixed.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19250#issuecomment-2139863019 From duke at openjdk.org Thu May 30 15:19:17 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Thu, 30 May 2024 15:19:17 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: <_4hKqcW7tE4shxVqG8Et3BjeehNjl0NWvS7PCKZaLe0=.73dc8315-22ee-47c0-8f5b-be74edc2f7a3@github.com> References: <_4hKqcW7tE4shxVqG8Et3BjeehNjl0NWvS7PCKZaLe0=.73dc8315-22ee-47c0-8f5b-be74edc2f7a3@github.com> Message-ID: <3r6BovGjkFUudXIeF6FF3ODENJ5F_wdHG1z4eyjpI-Y=.61eb125c-932d-4713-93fe-9f9ccb6584e4@github.com> On Thu, 30 May 2024 13:56:30 GMT, Emanuel Peter wrote: >> Control question: Are we confident with this potentially going into JDK 23 or should we rather postpone to JDK 24? The fork is next week. > >> Control question: Are we confident with this potentially going into JDK 23 or should we rather postpone to JDK 24? The fork is next week. > > I would hold off. @asgibbons it may pass our tests, and your extensive testing. But you never know what the fuzzer can find over a few weeks once it runs with your changes. I have made that experience many times. Let's just give it a few days, and then we have one JDK version less to worry about for backports on possible follow-up bugs ;) @eme64 I guess, to add some confidence: we also did 'test it independently' to catch blind spots, i.e. `String/IndexOf.java` is from me. I tried to be as paranoid as possible with non-random strings. It passed everything I could throw at it.
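One cheap way to complement a fuzzer with "non-random" inputs is exhaustive differential testing over a tiny alphabet. The sketch below assumes a trusted reference implementation exists. It checks every haystack, needle, and start index up to a small length against a naive scan. A binary alphabet is a reasonable choice because it produces many partial matches. In this sketch `std::string::find` merely stands in for the implementation under test. The function names are hypothetical, and this is not the jtreg test from the PR:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Reference: the obvious O(n*m) scan. Returns -1 if needle does not occur
// at or after `from`.
long naive_index_of(const std::string& hay, const std::string& needle, std::size_t from) {
  for (std::size_t i = from; i + needle.size() <= hay.size(); ++i) {
    if (hay.compare(i, needle.size(), needle) == 0) return static_cast<long>(i);
  }
  return -1;
}

// Stand-in for the optimized implementation under test.
long find_index_of(const std::string& hay, const std::string& needle, std::size_t from) {
  std::size_t r = hay.find(needle, from);
  return r == std::string::npos ? -1 : static_cast<long>(r);
}

// All strings over `alphabet` with length 0..max_len, built level by level.
std::vector<std::string> all_strings(const std::string& alphabet, std::size_t max_len) {
  std::vector<std::string> out{""};
  std::size_t level_begin = 0;
  for (std::size_t len = 1; len <= max_len; ++len) {
    std::size_t level_end = out.size();
    for (std::size_t i = level_begin; i < level_end; ++i) {
      for (char c : alphabet) out.push_back(out[i] + c);
    }
    level_begin = level_end;
  }
  return out;
}

// Counts (hay, needle, from) cases where `candidate` disagrees with the reference.
template <typename IndexOf>
int count_mismatches(IndexOf candidate, std::size_t max_len) {
  const std::vector<std::string> strs = all_strings("ab", max_len);
  int mismatches = 0;
  for (const std::string& hay : strs) {
    for (const std::string& needle : strs) {
      for (std::size_t from = 0; from <= hay.size(); ++from) {
        if (candidate(hay, needle, from) != naive_index_of(hay, needle, from)) ++mismatches;
      }
    }
  }
  return mismatches;
}
```

A deliberately broken candidate, such as one that ignores `from`, is caught even at very small lengths. This is the main appeal of exhaustive enumeration over random sampling for short inputs.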
------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2139882544 From epeter at openjdk.org Thu May 30 15:19:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 15:19:18 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: References: <_4hKqcW7tE4shxVqG8Et3BjeehNjl0NWvS7PCKZaLe0=.73dc8315-22ee-47c0-8f5b-be74edc2f7a3@github.com> Message-ID: <2MrjPeUReR3CJbw_L3K92H8O7xrKSIdZVzfpf7LVkIM=.dab21bd9-b149-4917-92dd-3e6abcca482b@github.com> On Thu, 30 May 2024 14:57:35 GMT, Scott Gibbons wrote: >>> Control question: Are we confident with this potentially going into JDK 23 or should we rather postpone to JDK 24? The fork is next week. >> >> I would hold off. @asgibbons it may pass our tests, and your extensive testing. But you never know what the fuzzer can find over a few weeks once it runs with your changes. I have made that experience many times. Let's just give it a few days, and then we have one JDK version less to worry about for backports on possible follow-up bugs ;) > > @eme64 I'm glad to have received your feedback. I see I have erroneously assumed that by making the exact code change you requested still requires your acceptance - I won't make that mistake again. I had also erroneously assumed that your review was complete and you had no further changes for me to make. I'd also not like to make that mistake again, but I'm unsure how to conclude that a review is complete - it seems like 7 hours of elapsed time isn't sufficient to indicate completion, so can you please help me figure this out? Perhaps it's just my distaste for "trickle-in" comments, which I should get over, or is there another way you can suggest? > > As for the fuzzer I would be very interested in learning more about this. We have a significant number of compute resources, so it may be valuable for us to set up a copy of the fuzzer on-site to improve the quality of our submissions. 
Can you help in pointing me to someone that can advise me on how to do this? > > As for holding off the integration, I'll leave the decision to a sponsor for this PR. I don't believe increasing the reviewer count just to "force" reevaluation should be an acceptable practice, although I'm not an insider in this community. @asgibbons I was done with my review, or at least so I thought. Still: if I give comments, it would be nice to quickly finish the conversation, unless I don't respond for many days, not even to emails. Often I only see the glaring issues. Then you fix them, and then I see something else around it. Then I may give more comments. That is what happened. If I think that I have small suggestions and then I'm done, then I might even approve even though there are suggestions still to be added. I just put up the limit really quickly so that nobody else would sponsor it by accident before we have finished the conversation, and I will definitely give you my approval once the little issues are resolved ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2139893561 From epeter at openjdk.org Thu May 30 15:19:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 15:19:18 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v49] In-Reply-To: <9PIuILHZnLHrZf1sz0Dsq6iup6qgyXw50mD0nGVS04c=.63bd0afd-d818-46fa-a082-a3d2066829cd@github.com> References: <9PIuILHZnLHrZf1sz0Dsq6iup6qgyXw50mD0nGVS04c=.63bd0afd-d818-46fa-a082-a3d2066829cd@github.com> Message-ID: <4ZM8wZFYPZjIbjb_O6n6DNAlpYOa2EHfmhSZHVUAXNA=.b923e319-f143-4a4c-9916-face36f337db@github.com> On Thu, 30 May 2024 13:19:57 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with two additional commits since the last revision: > > - Stupid EOL at end > - Add @test block; fix test indentation About the fuzzer: we have it in our closed tests. 
But I think it comes from this: https://github.com/shipilev/JavaFuzzer ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2139901477 From roland at openjdk.org Thu May 30 15:23:07 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 30 May 2024 15:23:07 GMT Subject: RFR: 8275202: C2: optimize out more redundant conditions In-Reply-To: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> References: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> Message-ID: On Wed, 21 Jun 2023 12:47:26 GMT, Roland Westrelin wrote: > This change adds a new loop opts pass to optimize redundant conditions > such as the second one in: > > > if (i < 10) { > if (i < 42) { > > > In the branch of the first if, the type of i can be narrowed down to > [min_jint, 9] which can then be used to constant fold the second > condition. > > The compiler already keeps track of type[n] for every node in the > current compilation unit. That's not sufficient to optimize the > snippet above though because the type of i can only be narrowed in > some sections of the control flow (that is a subset of all > controls). The solution is to build a new table that tracks the type > of n at every control c: > > > type'[n, root] = type[n] // initialized from igvn's type table > type'[n, c] = type'[n, idom(c)] > > > This pass iterates over the CFG looking for conditions such as: > > > if (i < 10) { > > > that allow narrowing the type of i and updates the type' table > accordingly. > > At a region r: > > > type'[n, r] = meet(type'[n, r->in(1)], type'[n, r->in(2)]...) > > > For a Phi phi at a region r: > > > type'[phi, r] = meet(type'[phi->in(1), r->in(1)], type'[phi->in(2), r->in(2)]...) > > > Once a type is narrowed, uses are enqueued and their types are > computed by calling the Value() methods. If a use's type is narrowed, > it's recorded at c in the type' table.
Value() methods retrieve types > from the type table, not the type' table. To address that issue while > leaving Value() methods unchanged, before calling Value() at c, the > type table is updated so: > > > type[n] = type'[n, c] > > > An exception is Phi::Value, which needs to retrieve the type of > nodes at various controls: there, a new type(Node* n, Node* c) > method is used. > > For most n and c, type'[n, c] is likely the same as type[n], the type > recorded in the global igvn table (that is, there should only be a few > nodes, at a few controls, for which we can narrow the type down). As > a consequence, the type'[n, c] table is implemented with: > > - At c, narrowed down types are stored in a GrowableArray. Each entry > records the previous type at idom(c) and the narrowed down type at > c. > > - The GrowableArray of type updates is recorded in a hash table > indexed by c. If there's no update at c, there's no entry in the > hash table. > > This pass operates in 2 steps: > > - it first iterates over the graph looking for conditions that narrow > the types of some nodes and propagates type updates to uses until a > fixed point. > > - it transforms the graph so newly found constant nodes are folded. > > > The new pass is run on every round of loop opts. There are a couple rea...
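A minimal sketch of the sparse type'[n, c] table described above may clarify how lookups fall back along the dominator tree. It assumes that integer types are plain closed intervals and that control nodes are small integer ids with a precomputed idom array. All names (`Interval`, `ControlTypes`, `narrow_lt`, `fold_lt`) are hypothetical. This is not C2's implementation, which works on `Type` objects, keeps updates in GrowableArrays, and re-runs `Value()` on uses. It reproduces the `if (i < 10) { if (i < 42) ...` example:

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <unordered_map>
#include <utility>
#include <vector>

// Integer type as a closed interval [lo, hi], loosely modeled on int ranges.
struct Interval {
  int lo, hi;
  bool operator==(const Interval& o) const { return lo == o.lo && hi == o.hi; }
};

// Meet at a region: the smallest interval containing both inputs.
Interval meet(Interval a, Interval b) {
  return {std::min(a.lo, b.lo), std::max(a.hi, b.hi)};
}

// Type of x on one branch of `if (x < k)`. Assumes k > INT_MIN so k - 1
// cannot overflow; an empty result (lo > hi) would mean the branch is dead.
Interval narrow_lt(Interval t, int k, bool taken) {
  return taken ? Interval{t.lo, std::min(t.hi, k - 1)}
               : Interval{std::max(t.lo, k), t.hi};
}

enum class Fold { AlwaysTrue, AlwaysFalse, Unknown };

// Can `x < k` be constant folded, given x's type?
Fold fold_lt(Interval t, int k) {
  if (t.hi < k) return Fold::AlwaysTrue;
  if (t.lo >= k) return Fold::AlwaysFalse;
  return Fold::Unknown;
}

// Sparse type'[n, c] table: only controls where some node was narrowed get an
// entry. Lookups walk up the dominator tree and fall back to the global type:
// type'[n, root] = type[n], type'[n, c] = type'[n, idom(c)] unless narrowed at c.
class ControlTypes {
 public:
  ControlTypes(std::vector<int> idom, std::vector<Interval> global)
      : idom_(std::move(idom)), global_(std::move(global)) {}

  void narrow(int node, int ctrl, Interval t) { updates_[ctrl][node] = t; }

  Interval type(int node, int ctrl) const {
    for (int c = ctrl; c >= 0; c = idom_[c]) {
      auto per_ctrl = updates_.find(c);
      if (per_ctrl == updates_.end()) continue;  // no update recorded at c
      auto t = per_ctrl->second.find(node);
      if (t != per_ctrl->second.end()) return t->second;
    }
    return global_[node];
  }

 private:
  std::vector<int> idom_;  // idom_[root] == -1
  std::vector<Interval> global_;
  std::unordered_map<int, std::unordered_map<int, Interval>> updates_;
};
```

In the example, i narrows to [INT_MIN, 9] in the true branch of `i < 10`, so `i < 42` folds to true there. At the merging region the two branch types meet back to the full int range, so nothing needs to be recorded for the region.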
Another comment to keep alive ------------- PR Comment: https://git.openjdk.org/jdk/pull/14586#issuecomment-2139917166 From sgibbons at openjdk.org Thu May 30 15:27:18 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 30 May 2024 15:27:18 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v49] In-Reply-To: References: <9PIuILHZnLHrZf1sz0Dsq6iup6qgyXw50mD0nGVS04c=.63bd0afd-d818-46fa-a082-a3d2066829cd@github.com> Message-ID: On Thu, 30 May 2024 13:50:01 GMT, Emanuel Peter wrote: >> Scott Gibbons has updated the pull request incrementally with two additional commits since the last revision: >> >> - Stupid EOL at end >> - Add @test block; fix test indentation > > test/jdk/java/lang/String/IndexOf.java line 25: > >> 23: >> 24: /* >> 25: * @test > > You should add the `@bug 8320448` for all runs. Done. > test/jdk/java/lang/String/IndexOf.java line 27: > >> 25: * @test >> 26: * @summary test String indexOf() intrinsic >> 27: * @run main/othervm IndexOf > > Suggestion: > > * @run main IndexOf > > You do not need a new VM if you have no arguments ;) Done. > test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 25: > >> 23: >> 24: /* @test >> 25: * @bug 4162796 4162796 > > You need to fix the bug numbers. Done. > test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 27: > >> 25: * @bug 4162796 4162796 >> 26: * @summary Test indexOf and lastIndexOf >> 27: * @run main/othervm -Xbatch -XX:-TieredCompilation -XX:CompileCommand=dontinline,ECoreIndexOf.indexOfKernel ECoreIndexOf > > I would also add a line without `-XX:-TieredCompilation`, then C1 can be tested with this too Done. > test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 32: > >> 30: >> 31: /* @test >> 32: * @bug 4162796 4162796 > > Here too Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620951690 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620949315 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620945040 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620947641 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620945484 From sgibbons at openjdk.org Thu May 30 15:30:45 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 30 May 2024 15:30:45 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v50] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has 
updated the pull request incrementally with one additional commit since the last revision: Review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/3e150fe3..57e115d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=49 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=48-49 Stats: 6 lines in 2 files changed: 3 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From epeter at openjdk.org Thu May 30 15:37:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 15:37:18 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v49] In-Reply-To: References: <9PIuILHZnLHrZf1sz0Dsq6iup6qgyXw50mD0nGVS04c=.63bd0afd-d818-46fa-a082-a3d2066829cd@github.com> Message-ID: On Thu, 30 May 2024 15:21:10 GMT, Scott Gibbons wrote: > Done. I still see the numbers `4162796 4162796`. I'm not sure if this bug number is relevant. But certainly it should only be mentioned once ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620960158 From epeter at openjdk.org Thu May 30 15:37:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 15:37:18 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v49] In-Reply-To: References: <9PIuILHZnLHrZf1sz0Dsq6iup6qgyXw50mD0nGVS04c=.63bd0afd-d818-46fa-a082-a3d2066829cd@github.com> Message-ID: On Thu, 30 May 2024 15:30:26 GMT, Emanuel Peter wrote: >> Done. > >> Done. > > I still see the numbers `4162796 4162796`. I'm not sure if this bug number is relevant. But certainly it should only be mentioned once ;) I never add old bug numbers to new tests...
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620963284 From epeter at openjdk.org Thu May 30 15:37:20 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 15:37:20 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v50] In-Reply-To: References: Message-ID: On Thu, 30 May 2024 15:30:45 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Review comments test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 25: > 23: > 24: /* @test > 25: * @bug 4162796 4162796 8320448 Suggestion: * @bug 
8320448 test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 33: > 31: > 32: /* @test > 33: * @bug 4162796 4162796 8320448 Suggestion: * @bug 8320448 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620964138 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620964720 From epeter at openjdk.org Thu May 30 15:37:20 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 15:37:20 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v50] In-Reply-To: References: Message-ID: On Thu, 30 May 2024 15:33:16 GMT, Emanuel Peter wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments > > test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 25: > >> 23: >> 24: /* @test >> 25: * @bug 4162796 4162796 8320448 > > Suggestion: > > * @bug 8320448 As I said above: I never add old bug numbers to new tests. But here it is even duplicated ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620966568 From stefank at openjdk.org Thu May 30 15:46:16 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 30 May 2024 15:46:16 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v2] In-Reply-To: References: Message-ID: On Thu, 30 May 2024 12:38:44 GMT, Martin Doerr wrote: > We're not hiding a warning. Using pointer addition with `nullptr` or result `nullptr` is undefined behavior. So, the current implementation is not guaranteed to do what we expect. It only works because compilers seem to be merciful. I think you are reading too much into the words I used. Ubsan is warning/complaining/error that the code is problematic. > However, casting `nullptr` to `uintptr_t` is guaranteed to be 0. So, switching to unsigned integer arithmetics avoids this problem. I fully understand the suggested patch. 
> The `RelocIterator` code is designed to work with such values. This is the crux of my complaint. Is this a good design given that the usage of null isn't apparent on first read? Or was this only something that grew into existence after a while? And would it make more sense for maintainability to not be doing this? I think it is important to think about those questions when trying to handle all these ubsan failures. FWIW, I (and Axel) took a step back and asked why do we think it is necessary to support null pointers here and I think you can see the musing around in Axel's response above. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19424#issuecomment-2139999240 From sgibbons at openjdk.org Thu May 30 15:48:50 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 30 May 2024 15:48:50 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v51] In-Reply-To: References: Message-ID: <73yhW7umbpUKGvfaJ5hkzLjIQ6_8hakVYD59s0-60OY=.321f0126-06a2-4efc-a271-80a518c53baa@github.com> > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers:
>
> Benchmark                                  Score      Latest   Latest/Score
> StringIndexOf.advancedWithMediumSub      343.573     317.934   0.925375393x
> StringIndexOf.advancedWithShortSub1     1039.081     1053.96   1.014319384x
> StringIndexOf.advancedWithShortSub2       55.828     110.541   1.980027943x
> StringIndexOf.constantPattern              9.361      11.906   1.271872663x
> StringIndexOf.searchCharLongSuccess        4.216       4.218   1.000474383x
> StringIndexOf.searchCharMediumSuccess      3.133       3.216   1.02649218x
> StringIndexOf.searchCharShortSuccess        3.76       3.761   1.000265957x
> StringIndexOf.success                      9.186       9.713   1.057369911x
> StringIndexOf.successBig                  14.341      46.343   3.231504079x
> StringIndexOfChar.latin1_AVX2_String    6220.918    12154.52   1.953814533x
> StringIndexOfChar.latin1_AVX2_char      5503.556    5540.044   1.006629895x
> StringIndexOfChar.latin1_SSE4_String    6978.854    6818.689   0.977049957x
> StringIndexOfChar.latin1_SSE4_char      5657.499    5474.624   0.967675646x
> StringIndexOfChar.latin1_Short_String   7132.541    6863.359   0.962260014x
> StringIndexOfChar.latin1_Short_char    16013.389   16162.437   1.009307711x
> StringIndexOfChar.latin1_mixed_String   7386.123   14771.622   1.999915517x
> StringIndexOfChar.latin1_mixed_char     9901.671    9782.245   0.987938803x

Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Fix bug number in tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/57e115d7..6eae46e5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=50 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=49-50 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Thu May 30 15:48:50 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 30 May 2024 15:48:50 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v50]
In-Reply-To: References: Message-ID: <22JtxwmXnPAAUHF8c3g6lmvUtymzGr6Ekib_nUAKbW4=.3315da8b-09bc-4534-9f27-0fe1485456c7@github.com> On Thu, 30 May 2024 15:34:17 GMT, Emanuel Peter wrote: >> test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 25: >> >>> 23: >>> 24: /* @test >>> 25: * @bug 4162796 4162796 8320448 >> >> Suggestion: >> >> * @bug 8320448 > > As I said above: I never add old bug numbers to new tests. But here it is even duplicated ;) The file I used as baseline for this `test/jdk/java/lang/StringBuffer/IndexOf.java` has the bug number listed twice (copy/paste). I'll remove it from here, but leave it in the original unless requested to change it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620985844 From sgibbons at openjdk.org Thu May 30 15:48:50 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 30 May 2024 15:48:50 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v50] In-Reply-To: References: Message-ID: <3nJczHjyjWVNAlPneM19NW6Dc0MRql6sDE2hX4tyZpc=.3539eed5-c871-422c-806b-1f2d5bcbae2f@github.com> On Thu, 30 May 2024 15:33:27 GMT, Emanuel Peter wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments > > test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 33: > >> 31: >> 32: /* @test >> 33: * @bug 4162796 4162796 8320448 > > Suggestion: > > * @bug 8320448 Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620988308 From epeter at openjdk.org Thu May 30 16:10:17 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 16:10:17 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v51] In-Reply-To: <73yhW7umbpUKGvfaJ5hkzLjIQ6_8hakVYD59s0-60OY=.321f0126-06a2-4efc-a271-80a518c53baa@github.com> References: <73yhW7umbpUKGvfaJ5hkzLjIQ6_8hakVYD59s0-60OY=.321f0126-06a2-4efc-a271-80a518c53baa@github.com> Message-ID: On Thu, 30 May 2024 15:48:50 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers:
>>
>> Benchmark                                  Score      Latest   Latest/Score
>> StringIndexOf.advancedWithMediumSub      343.573     317.934   0.925375393x
>> StringIndexOf.advancedWithShortSub1     1039.081     1053.96   1.014319384x
>> StringIndexOf.advancedWithShortSub2       55.828     110.541   1.980027943x
>> StringIndexOf.constantPattern              9.361      11.906   1.271872663x
>> StringIndexOf.searchCharLongSuccess        4.216       4.218   1.000474383x
>> StringIndexOf.searchCharMediumSuccess      3.133       3.216   1.02649218x
>> StringIndexOf.searchCharShortSuccess        3.76       3.761   1.000265957x
>> StringIndexOf.success                      9.186       9.713   1.057369911x
>> StringIndexOf.successBig                  14.341      46.343   3.231504079x
>> StringIndexOfChar.latin1_AVX2_String    6220.918    12154.52   1.953814533x
>> StringIndexOfChar.latin1_AVX2_char      5503.556    5540.044   1.006629895x
>> StringIndexOfChar.latin1_SSE4_String    6978.854    6818.689   0.977049957x
>> StringIndexOfChar.latin1_SSE4_char      5657.499    5474.624   0.967675646x
>> StringIndexOfChar.latin1_Short_String   7132.541    6863.359   0.962260014x
>> StringIndexOfChar.latin1_Short_char    16013.389   16162.437   1.009307711x
>> StringIndexOfChar.latin1_mixed_String   7386.123   14771.622   1.999915517x
>> StringIndexOfChar.latin1_mixed_char     9901.671    9782.245   0.987938803x
>
> Scott Gibbons has updated the pull request incrementally with one
additional commit since the last revision: > > Fix bug number in tests Ok, now it is good for me. But I would definitely wait with integration until after the fork next week. src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 2: > 1: /* > 2: * Copyright (c) 2023, 2024 Intel Corporation. All rights reserved. Is the 2023 year intentional? I don't know your policy, so you can just ignore this ;) src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 334: > 332: // NUMBER_OF_CASES (currently 10) needle sizes for both big and small. There are special > 333: // routines for handling needle sizes > NUMBER_OF_CASES (L_{big,small}CaseDefault). These > 334: // cases use C@'s arrays_equals() to compare the needle to the haystack. The small cases Suggestion: // cases use C2's arrays_equals() to compare the needle to the haystack. The small cases Randomly spotted this. src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 773: > 771: // jae done > 772: // > 773: // Final index of start of needle @((16 - (ndlLen %16)) & 0xf) << 1 What is the meaning of the `@`? Maybe `at`. I'd use the same consistently ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16753#pullrequestreview-2088739965 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1621015782 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1621017548 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1621019611 From sgibbons at openjdk.org Thu May 30 16:16:45 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 30 May 2024 16:16:45 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v52] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
The benchmark numbers:
>
> Benchmark                                  Score      Latest   Latest/Score
> StringIndexOf.advancedWithMediumSub      343.573     317.934   0.925375393x
> StringIndexOf.advancedWithShortSub1     1039.081     1053.96   1.014319384x
> StringIndexOf.advancedWithShortSub2       55.828     110.541   1.980027943x
> StringIndexOf.constantPattern              9.361      11.906   1.271872663x
> StringIndexOf.searchCharLongSuccess        4.216       4.218   1.000474383x
> StringIndexOf.searchCharMediumSuccess      3.133       3.216   1.02649218x
> StringIndexOf.searchCharShortSuccess        3.76       3.761   1.000265957x
> StringIndexOf.success                      9.186       9.713   1.057369911x
> StringIndexOf.successBig                  14.341      46.343   3.231504079x
> StringIndexOfChar.latin1_AVX2_String    6220.918    12154.52   1.953814533x
> StringIndexOfChar.latin1_AVX2_char      5503.556    5540.044   1.006629895x
> StringIndexOfChar.latin1_SSE4_String    6978.854    6818.689   0.977049957x
> StringIndexOfChar.latin1_SSE4_char      5657.499    5474.624   0.967675646x
> StringIndexOfChar.latin1_Short_String   7132.541    6863.359   0.962260014x
> StringIndexOfChar.latin1_Short_char    16013.389   16162.437   1.009307711x
> StringIndexOfChar.latin1_mixed_String   7386.123   14771.622   1.999915517x
> StringIndexOfChar.latin1_mixed_char     9901.671    9782.245   0.987938803x

Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Fix copyright & a couple of comment typos ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/6eae46e5..f432320f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=51 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=50-51 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From kvn at openjdk.org Thu May 30 16:16:45 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 30 May 2024 16:16:45 GMT Subject: RFR: 8320448: Accelerate IndexOf using
AVX2 [v49] In-Reply-To: <4ZM8wZFYPZjIbjb_O6n6DNAlpYOa2EHfmhSZHVUAXNA=.b923e319-f143-4a4c-9916-face36f337db@github.com> References: <9PIuILHZnLHrZf1sz0Dsq6iup6qgyXw50mD0nGVS04c=.63bd0afd-d818-46fa-a082-a3d2066829cd@github.com> <4ZM8wZFYPZjIbjb_O6n6DNAlpYOa2EHfmhSZHVUAXNA=.b923e319-f143-4a4c-9916-face36f337db@github.com> Message-ID: On Thu, 30 May 2024 15:16:34 GMT, Emanuel Peter wrote: >> Scott Gibbons has updated the pull request incrementally with two additional commits since the last revision: >> >> - Stupid EOL at end >> - Add @test block; fix test indentation > > About the fuzzer: we have it in our closed tests. But I think it comes from this: https://github.com/shipilev/JavaFuzzer I agree with @eme64 to postpone the integration after JDK 23 is forked in one week. It is not about how you confident with code. It is size of code. I did only limited (tier1-4) testing in our infra which did not cover all our testing configuration. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2140103757 From sgibbons at openjdk.org Thu May 30 16:16:45 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 30 May 2024 16:16:45 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v51] In-Reply-To: References: <73yhW7umbpUKGvfaJ5hkzLjIQ6_8hakVYD59s0-60OY=.321f0126-06a2-4efc-a271-80a518c53baa@github.com> Message-ID: <1veKa8k9a_OgFxuy0XD_MPxOHgGpy8LXTgG6gEPfXiU=.3ed8e416-4267-40c5-8daf-8a9517f51557@github.com> On Thu, 30 May 2024 16:03:29 GMT, Emanuel Peter wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bug number in tests > > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 2: > >> 1: /* >> 2: * Copyright (c) 2023, 2024 Intel Corporation. All rights reserved. > > Is the 2023 year intentional? 
I don't know your policy, so you can just ignore this ;) I started this in November :-) > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 334: > >> 332: // NUMBER_OF_CASES (currently 10) needle sizes for both big and small. There are special >> 333: // routines for handling needle sizes > NUMBER_OF_CASES (L_{big,small}CaseDefault). These >> 334: // cases use C@'s arrays_equals() to compare the needle to the haystack. The small cases > > Suggestion: > > // cases use C2's arrays_equals() to compare the needle to the haystack. The small cases > > Randomly spotted this. Fixed. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 773: > >> 771: // jae done >> 772: // >> 773: // Final index of start of needle @((16 - (ndlLen %16)) & 0xf) << 1 > > What is the meaning of the `@`? Maybe `at`. I'd use the same consistently Changed to "at". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1621034441 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1621034583 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1621034821 From stefank at openjdk.org Thu May 30 16:17:03 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 30 May 2024 16:17:03 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v2] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 10:04:14 GMT, Matthias Baesken wrote: >> When running on macOS with ubsan enabled, we see some issues in relocInfo (hpp and cpp); those already occur in the build quite early. 
>> >> /jdk/src/hotspot/share/code/relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer >> >> Similar happens when we add to the _current pointer >> _current++; >> this gives : >> relocInfo.hpp:606:13: runtime error: applying non-zero offset to non-null pointer 0xfffffffffffffffe produced null pointer >> >> Seems the pointer subtraction/addition worked so far, so it might be an option to disable ubsan for those 2 functions. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > use template functions Oh, and as it doesn't seem to have been clear from my earlier comments: I don't strongly oppose that you fix it this way you do in the RelocIterator, since I have very little interaction with that code. The comment was more that I would prefer if we take a case-by-case approach when we look at other parts of HotSpot with similar problems and really think what the correct solution would be, and that we don't too quickly start to grab for the `add/sub_to_ptr` solution. Putting these functions in globalDefinitions makes it all too easy to just grab for these functions when we try to solve similar problems, IMHO. That's my 2c. I'm not blocking this patch, as long as we get somewhat decent names. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19424#issuecomment-2140118108 From sgibbons at openjdk.org Thu May 30 16:23:34 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 30 May 2024 16:23:34 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v49] In-Reply-To: References: <9PIuILHZnLHrZf1sz0Dsq6iup6qgyXw50mD0nGVS04c=.63bd0afd-d818-46fa-a082-a3d2066829cd@github.com> <4ZM8wZFYPZjIbjb_O6n6DNAlpYOa2EHfmhSZHVUAXNA=.b923e319-f143-4a4c-9916-face36f337db@github.com> Message-ID: On Thu, 30 May 2024 16:10:53 GMT, Vladimir Kozlov wrote: >> About the fuzzer: we have it in our closed tests. 
But I think it comes from this: https://github.com/shipilev/JavaFuzzer > > I agree with @eme64 to postpone the integration after JDK 23 is forked in one week. It is not about how you confident with code. It is size of code. I did only limited (tier1-4) testing in our infra which did not cover all our testing configuration. @vnkozlov OK. I'll defer to you all. I've contacted the author of the fuzzer to see what I can do to set up a local instance. Would this be sufficient to increase confidence for future submissions? We can run it perpetually on fixes (provided I can set it up). Had I done that, we could have had 6 months of fuzzing on top of our tests. Would that have alleviated this concern? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2140124882 From epeter at openjdk.org Thu May 30 16:23:34 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 16:23:34 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v49] In-Reply-To: References: <9PIuILHZnLHrZf1sz0Dsq6iup6qgyXw50mD0nGVS04c=.63bd0afd-d818-46fa-a082-a3d2066829cd@github.com> <4ZM8wZFYPZjIbjb_O6n6DNAlpYOa2EHfmhSZHVUAXNA=.b923e319-f143-4a4c-9916-face36f337db@github.com> Message-ID: <9Gep5o1EEF96gprsHB1vDiw8KSQON-c6uh_9gBJyq9c=.43962158-2f23-4929-9e72-d4827a4fa5e8@github.com> On Thu, 30 May 2024 16:16:59 GMT, Scott Gibbons wrote: >> I agree with @eme64 to postpone the integration after JDK 23 is forked in one week. It is not about how you confident with code. It is size of code. I did only limited (tier1-4) testing in our infra which did not cover all our testing configuration. > > @vnkozlov OK. I'll defer to you all. I've contacted the author of the fuzzer to see what I can do to set up a local instance. Would this be sufficient to increase confidence for future submissions? We can run it perpetually on fixes (provided I can set it up). Had I done that, we could have had 6 months of fuzzing on top of our tests. Would that have alleviated this concern? 
@asgibbons I generally just stop pushing ANY RFEs a week or two before the fork. Even if you did run the fuzzer with it - there are often last-minute changes. And your code here is rather large, so even if you are confident, there must be at least one bug hiding. Running the fuzzer is nice as pre-integration, but it mostly only catches things post-integration. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2140136262 From aph at openjdk.org Thu May 30 16:41:15 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 30 May 2024 16:41:15 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v18] In-Reply-To: References: Message-ID: <-EY3zP64NRZotMOx7aquatQhDG7eMfRJoUx1AF94_Iw=.d2d45b7e-b1b9-480b-8136-740f87c6b610@github.com> On Thu, 23 May 2024 23:18:49 GMT, Dean Long wrote: > What's a good benchmark to run to show the benefit of this change, or to show the effect of different cache sizes and/or Java implementation changes? > > I tried running micro:ScopedValue benchmarks with -Djava.lang.ScopedValue.cacheSize=2 and didn't see a difference. But the new compiler/scoped_value/TestScopedValue.java test fails in compiler.c2.irTests.TestScopedValue.testFastPath16 with the cache size set to 2. With `-jvmArgsAppend -Djava.lang.ScopedValue.cacheSize=4` I get the following, before this patch and after it respectively:

    Benchmark                           Mode  Cnt   Score   Error  Units
    ScopedValues.sixValues_ScopedValue  avgt   10  11.881 ± 0.017  us/op
    ScopedValues.sixValues_ScopedValue  avgt   10   0.006 ± 0.001  us/op

> Given the right benchmark, there are some experiments I'd like to try, related to the ScopedValue Java implementation: > > 1. use only a primary slot probe, no secondary > > 2. use a deterministic secondary probe (based on the hash), not random Looks pretty deterministic to me. Every value has two hash codes, primary and secondary, and they are different. > 3. fix put() so it will reuse an existing slot. Currently it blindly sets both `victim` and `other` slots.
It seems like it should check the `other` slot first and reuse it if already set. Put another conditional load in the control flow? I'm not sure that would do much, but OK. I guess I don't know how this would work. > 4. separate cache bitmap from slow path bitmaps, which could be 64-bits with only 1 bit per SV, not 2. I guess that might help. > 5. Use a per-SV MethodHandle getter using MethodHandles.guardWithTest() to avoid profile pollution Interesting. I did a version of the code that used bytecode generation to produce a new accessor method for each scoped value a year or two ago, for that same reason. It did work, but was rather heavyweight. Re benchmarks: the benchmarks are all there, but the current design is based on principles, as well as benchmark results. 0. As much as possible, and this is hard to do with just random hashing, I (and I believe, we) want the performance to be linear and predictable, rather than mostly luck. That's what this patch really brings to the party! 1. We have a hard guarantee, not just a probabilistic one, that if you are repeatedly using two different scoped values, we never have to do a fallback linear search. This is hard to capture with a benchmark, but I guess setting `java.lang.ScopedValue.cacheSize=2` would do it. 2. The bind method may do a little bit of extra work to help the `get()`, but not too much: I can see some cases where binding is done fairly frequently, and it should not be too heavyweight. But I don't know what _too heavyweight_ really means, so... 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2140201010 From jbhateja at openjdk.org Thu May 30 17:17:32 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 30 May 2024 17:17:32 GMT Subject: RFR: 8332487: Regression in Crypto-AESGCMBench.encrypt (and others) after JDK-8328181 [v2] In-Reply-To: References: Message-ID: > Re-instantiating the ClearArray opcode check in match_rule_supported_vector, this caused performance regressions in some worklets in Renaissance BM since it prevented small sized instance initialization using quadword stores which showed better performance on non-AVX512 targets. > > Our intent was to save code bloating due to long sequences of quadword store with large InitArrayShortSize value to prevent any side effects on in-lining decisions. Performance of an existing [Benchmark](https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/vm/compiler/ClearMemory.java) does not show much performance variation.
>
> Baseline with -XX:InitArrayShortSize=100000000
>
> Benchmark                       Mode   Cnt          Score  Error  Units
> ClearMemory.testClearMemory16K  thrpt    2    2695259.360         ops/s
> ClearMemory.testClearMemory1K   thrpt    2   48622330.474         ops/s
> ClearMemory.testClearMemory1M   thrpt    2      79546.779         ops/s
> ClearMemory.testClearMemory24B  thrpt    2  252740278.617         ops/s
> ClearMemory.testClearMemory2K   thrpt    2   24781443.547         ops/s
> ClearMemory.testClearMemory32B  thrpt    2  251588987.342         ops/s
> ClearMemory.testClearMemory32K  thrpt    2    1487427.378         ops/s
> ClearMemory.testClearMemory40B  thrpt    2  213856093.091         ops/s
> ClearMemory.testClearMemory48B  thrpt    2  193701317.101         ops/s
> ClearMemory.testClearMemory4K   thrpt    2   11961450.919         ops/s
> ClearMemory.testClearMemory56B  thrpt    2  169003238.018         ops/s
> ClearMemory.testClearMemory8K   thrpt    2    5871416.239         ops/s
> ClearMemory.testClearMemory8M   thrpt    2      10663.044         ops/s
>
> With patch and -XX:InitArrayShortSize=100000000
>
> Benchmark                       Mode   Cnt          Score  Error  Units
> ClearMemory.testClearMemory16K  thrpt    2    3147203.987         ops/s
> ClearMemory.testClearMemory1K   thrpt    2   48225184.981         ops/s
> ClearMemory.testClearMemory1M   thrpt    2      80016.400         ops/s
> ClearMemory.testClearMemory24B  thrpt    2  253904943.981         ops/s
> ClearMemory.testClearMemory2K   thrpt    2   24664594.490         ops/s
> ClearMemory.testClearMemory32B  thrpt    2  255507231.954         ops/s
> ClearMemory.testClearMemory32K  thrpt    2    1636220.531         ops/s
> ClearMemory.testClearMemory40B  thrpt    2  220718255.832         ops/s
> ClearMemory.testClearMemory48B  thrpt    2  196294911.715         ops/s
> ClearMemory.test...

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Corrected misspelled keyword ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19447/files - new: https://git.openjdk.org/jdk/pull/19447/files/5df1290f..79cb57e2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19447&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19447&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19447.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19447/head:pull/19447 PR: https://git.openjdk.org/jdk/pull/19447 From jbhateja at openjdk.org Thu May 30 17:17:32 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 30 May 2024 17:17:32 GMT Subject: Integrated: 8332487: Regression in Crypto-AESGCMBench.encrypt (and others) after JDK-8328181 In-Reply-To: References: Message-ID: On Wed, 29 May 2024 07:49:21 GMT, Jatin Bhateja wrote: > Re-instantiating the ClearArray opcode check in match_rule_supported_vector, this caused performance regressions in some worklets in Renaissance BM since it prevented small sized instance initialization using quadword stores which showed better performance on non-AVX512 targets. > > Our intent was to save code bloating due to long sequences of quadword store with large InitArrayShortSize value to prevent any side effects on in-lining decisions.
Performance of an existing [Benchmark](https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/vm/compiler/ClearMemory.java) does not show much performance variation.
>
> Baseline with -XX:InitArrayShortSize=100000000
>
> Benchmark                       Mode   Cnt          Score  Error  Units
> ClearMemory.testClearMemory16K  thrpt    2    2695259.360         ops/s
> ClearMemory.testClearMemory1K   thrpt    2   48622330.474         ops/s
> ClearMemory.testClearMemory1M   thrpt    2      79546.779         ops/s
> ClearMemory.testClearMemory24B  thrpt    2  252740278.617         ops/s
> ClearMemory.testClearMemory2K   thrpt    2   24781443.547         ops/s
> ClearMemory.testClearMemory32B  thrpt    2  251588987.342         ops/s
> ClearMemory.testClearMemory32K  thrpt    2    1487427.378         ops/s
> ClearMemory.testClearMemory40B  thrpt    2  213856093.091         ops/s
> ClearMemory.testClearMemory48B  thrpt    2  193701317.101         ops/s
> ClearMemory.testClearMemory4K   thrpt    2   11961450.919         ops/s
> ClearMemory.testClearMemory56B  thrpt    2  169003238.018         ops/s
> ClearMemory.testClearMemory8K   thrpt    2    5871416.239         ops/s
> ClearMemory.testClearMemory8M   thrpt    2      10663.044         ops/s
>
> With patch and -XX:InitArrayShortSize=100000000
>
> Benchmark                       Mode   Cnt          Score  Error  Units
> ClearMemory.testClearMemory16K  thrpt    2    3147203.987         ops/s
> ClearMemory.testClearMemory1K   thrpt    2   48225184.981         ops/s
> ClearMemory.testClearMemory1M   thrpt    2      80016.400         ops/s
> ClearMemory.testClearMemory24B  thrpt    2  253904943.981         ops/s
> ClearMemory.testClearMemory2K   thrpt    2   24664594.490         ops/s
> ClearMemory.testClearMemory32B  thrpt    2  255507231.954         ops/s
> ClearMemory.testClearMemory32K  thrpt    2    1636220.531         ops/s
> ClearMemory.testClearMemory40B  thrpt    2  220718255.832         ops/s
> ClearMemory.testClearMemory48B  thrpt    2  196294911.715         ops/s
> ClearMemory.test...

This pull request has now been integrated.
Changeset: 1d889e54 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/1d889e54fc6d6039e68191420bb377ea560e2eaa Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8332487: Regression in Crypto-AESGCMBench.encrypt (and others) after JDK-8328181 Reviewed-by: thartmann ------------- PR: https://git.openjdk.org/jdk/pull/19447 From aph at openjdk.org Thu May 30 17:56:12 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 30 May 2024 17:56:12 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v18] In-Reply-To: <-EY3zP64NRZotMOx7aquatQhDG7eMfRJoUx1AF94_Iw=.d2d45b7e-b1b9-480b-8136-740f87c6b610@github.com> References: <-EY3zP64NRZotMOx7aquatQhDG7eMfRJoUx1AF94_Iw=.d2d45b7e-b1b9-480b-8136-740f87c6b610@github.com> Message-ID: On Thu, 30 May 2024 16:38:34 GMT, Andrew Haley wrote: > > 1. use only a primary slot probe, no secondary > > > > 2. use a deterministic secondary probe (based on the hash), not random > > Looks pretty deterministic to me. Every value has two hash codes, primary and secondary, and they are different. Ah, I just realized what you meant. Without random replacement in the cache, there is a high probability that scoped values will collide, because the cache is small. Even with only two scoped values accessed alternately, each will repeatedly kick out the other, leading to a linear probe every time. The only way to avoid this without random replacement would be to make the cache considerably larger, and even then it would still occasionally happen.
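The thrashing effect Andrew describes, two alternately accessed values evicting each other when replacement is deterministic, can be reproduced with a toy model. The sketch below is a hypothetical C++ simulation, not the JDK's `ScopedValue` cache: the names `Key`, `TwoChoiceCache` and `count_misses`, the 16-entry table, and the slot numbers are all assumptions, chosen so that both keys share a primary slot but have different secondary slots.

```cpp
#include <array>
#include <cassert>
#include <random>

// Hypothetical toy model, not the JDK's ScopedValue cache: each key may live
// in one of two slots of a small table.
struct Key {
  int id;         // stands in for the identity of a scoped value
  int primary;    // first candidate slot
  int secondary;  // second candidate slot, assumed to differ from primary
};

class TwoChoiceCache {
 public:
  explicit TwoChoiceCache(bool random_victim) : random_victim_(random_victim) {
    slots_.fill(-1);  // -1 marks an empty slot
  }

  // Probes both slots. Returns true on a hit; on a miss, stores the key in a
  // victim slot and returns false.
  bool access(const Key& k) {
    assert(k.primary != k.secondary);
    if (slots_[k.primary] == k.id || slots_[k.secondary] == k.id) return true;
    // Deterministic policy: always overwrite the primary slot.
    // Random policy: overwrite primary or secondary with equal probability.
    int victim = k.primary;
    if (random_victim_ && (rng_() & 1u) != 0) victim = k.secondary;
    slots_[victim] = k.id;
    return false;
  }

 private:
  std::array<int, 16> slots_;
  bool random_victim_;
  std::minstd_rand rng_;  // default seed, so every run is reproducible
};

// Alternately accesses two keys whose primary slots collide; counts misses.
int count_misses(bool random_victim, int rounds) {
  TwoChoiceCache cache(random_victim);
  const Key a{1, 3, 7};
  const Key b{2, 3, 11};  // same primary slot as a, different secondary
  int misses = 0;
  for (int i = 0; i < rounds; ++i) {
    if (!cache.access(a)) ++misses;
    if (!cache.access(b)) ++misses;
  }
  return misses;
}
```

Under these assumptions, `count_misses(false, 100)` misses on all 200 accesses, while with random replacement the misses stop as soon as one key lands in its secondary slot, after which both keys hit on every probe. Probing both slots before inserting is what keeps that settled state stable.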
------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2140464387 From kvn at openjdk.org Thu May 30 18:10:01 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 30 May 2024 18:10:01 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v2] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 10:04:14 GMT, Matthias Baesken wrote: >> When running on macOS with ubsan enabled, we see some issues in relocInfo (hpp and cpp); those already occur in the build quite early. >> >> /jdk/src/hotspot/share/code/relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer >> >> Similar happens when we add to the _current pointer >> _current++; >> this gives : >> relocInfo.hpp:606:13: runtime error: applying non-zero offset to non-null pointer 0xfffffffffffffffe produced null pointer >> >> Seems the pointer subtraction/addition worked so far, so it might be an option to disable ubsan for those 2 functions. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > use template functions `RelocIterator` is used in a lot of places and not all are guarded by `has_locs()`. The code assumes that `RelocIterator::next()` will return `false` if no relocations are present. We have to use pre-increment in `next()` with check after it because in following code there are `current()` accessing `_current`. I don't want to touch this code. I really don't want to add `nullptr` check into this hot code which may affect performance. That is why I agreed with latest changes. Based on this discussion I am fine to keep them locally in `relocInfo.hpp` with more descriptive names. We can also add comment explaining why we need them. 
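The approach discussed in this thread, moving the address arithmetic from pointers to `uintptr_t`, can be sketched in isolation. This is a minimal C++ sketch under stated assumptions, not HotSpot's `RelocIterator` or the helpers from the patch: `addr_of` and `SketchIterator` are hypothetical names, and it assumes a flat address space in which a null pointer converts to the integer 0, as on mainstream platforms. The iterator starts one element before the first entry and pre-increments in `next()`, mirroring the pattern Vladimir describes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: the address of p as an integer. Converting a null
// pointer yields 0 on mainstream platforms (assumed here).
template <typename T>
std::uintptr_t addr_of(const T* p) {
  return reinterpret_cast<std::uintptr_t>(p);
}

// Minimal iterator that keeps positions as integers, so an empty buffer with
// a null base never performs pointer arithmetic on nullptr.
template <typename T>
class SketchIterator {
 public:
  // base may be nullptr only when count == 0.
  SketchIterator(const T* base, std::size_t count)
      : begin_(addr_of(base)),
        cur_(begin_ - sizeof(T)),           // one before the first element; wraps if base is null
        end_(begin_ + count * sizeof(T)) {  // one past the last element
    assert(base != nullptr || count == 0);
  }

  // Pre-increment, then check: returns false once the end is reached.
  bool next() {
    cur_ += sizeof(T);  // unsigned wraparound is well defined
    return cur_ < end_;
  }

  const T& current() const {
    assert(cur_ >= begin_ && cur_ < end_);
    return *reinterpret_cast<const T*>(cur_);
  }

 private:
  std::uintptr_t begin_;
  std::uintptr_t cur_;
  std::uintptr_t end_;
};
```

Because positions are stored as integers, computing `begin - sizeof(T)` for an empty buffer with a null base is ordinary unsigned wraparound rather than an offset applied to `nullptr`, so UBSan's pointer-overflow check has nothing to report. Only `current()` converts back to a pointer, and only after the bounds check.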
------------- PR Comment: https://git.openjdk.org/jdk/pull/19424#issuecomment-2140513107 From kvn at openjdk.org Thu May 30 18:18:01 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 30 May 2024 18:18:01 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v2] In-Reply-To: <4LTCBpKcJDyCZGIKgDYczbkrxHJYY85qyBuA21e8B9E=.e43e018a-bfa2-4554-bf4f-80f6c2afc878@github.com> References: <4LTCBpKcJDyCZGIKgDYczbkrxHJYY85qyBuA21e8B9E=.e43e018a-bfa2-4554-bf4f-80f6c2afc878@github.com> Message-ID: On Thu, 30 May 2024 14:31:28 GMT, Axel Boldt-Christmas wrote: > The miss here seems to be that `has_loc` does not mean "This CodeSection has relocatations". But means "This CodeSection has allocated a relocations buffer". I believe the correct check would be `cs->locs_count() == 0` This suggestion seems correct because we may allocate relocation buffer in section which does not have relocations [codeBuffer.cpp#L169](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/asm/codeBuffer.cpp#L169). But this is different issue for different RFE. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19424#issuecomment-2140538618 From kvn at openjdk.org Thu May 30 18:33:04 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 30 May 2024 18:33:04 GMT Subject: RFR: 8331159: VM build without C2 fails after JDK-8180450 In-Reply-To: <86NGpx5VVOK6KuR1qbhLRS27zau-DEwXW31EakcquYY=.4d56d214-1a07-4ff5-a1af-e18a545ad725@github.com> References: <86NGpx5VVOK6KuR1qbhLRS27zau-DEwXW31EakcquYY=.4d56d214-1a07-4ff5-a1af-e18a545ad725@github.com> Message-ID: On Thu, 25 Apr 2024 20:54:23 GMT, Bernhard Urban-Forster wrote: > x86 bits are fine. Okay since Andrew agreed. ------------- Marked as reviewed by kvn (Reviewer). 
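The distinction Axel draws in the quoted comment, between a relocation buffer having been allocated and relocations actually having been recorded, can be shown with a small stand-in. `SectionSketch` is hypothetical and only loosely mirrors `CodeSection`; the field names and the `short` element type are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a code section's relocation storage. The names
// loosely mirror CodeSection but are assumptions for illustration only.
struct SectionSketch {
  const short* locs_start = nullptr;  // start of the allocated buffer, if any
  const short* locs_end = nullptr;    // one past the last relocation written

  // True as soon as a buffer exists, even if nothing was written to it.
  bool has_locs() const { return locs_start != nullptr; }

  // Number of relocations actually recorded.
  std::ptrdiff_t locs_count() const {
    assert(locs_start != nullptr || locs_end == nullptr);
    return has_locs() ? locs_end - locs_start : 0;
  }
};
```

An allocated-but-empty section reports `has_locs()` as true while `locs_count()` is 0, which is why `has_locs()` alone does not tell you whether there is anything to iterate.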
PR Review: https://git.openjdk.org/jdk/pull/18962#pullrequestreview-2089105304 From kxu at openjdk.org Thu May 30 19:35:37 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 30 May 2024 19:35:37 GMT Subject: RFR: 8327381: Refactor type-improving transformations in BoolNode::Ideal to BoolNode::Value [v10] In-Reply-To: References: Message-ID: > This PR resolves [JDK-8327381](https://bugs.openjdk.org/browse/JDK-8327381) > > Currently the transformations for expressions with patterns `((x & m) u<= m)` or `((m & x) u<= m)` to `true` is in `BoolNode::Ideal` function with a new constant node of value `1` created. However, this is technically a type-improving (reduction in range) transformation that's better suited in `BoolNode::Value` function. > > New unit test `test/hotspot/jtreg/compiler/c2/TestBoolNodeGvn.java` asserting on IR nodes and correctness of this transformation is added and passing. Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 18 additional commits since the last revision: - Merge branch 'master' into boolnode-refactor - move test location, add negative test case, simplify imports - Merge branch 'master' into boolnode-refactor - refactor BoolNode::Value() and extract code to ::Value_cmpu_and_mask - update comments - fix indentation again - apply test only on x64, aarch64 and riscv64 - also renames the class name in @run - update test @run annotation - improve formatting, correct annotation and rename test class - ... 
and 8 more: https://git.openjdk.org/jdk/compare/e6f10cf2...84784c74 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18198/files - new: https://git.openjdk.org/jdk/pull/18198/files/278c436a..84784c74 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18198&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18198&range=08-09 Stats: 91017 lines in 1666 files changed: 63182 ins; 18307 del; 9528 mod Patch: https://git.openjdk.org/jdk/pull/18198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18198/head:pull/18198 PR: https://git.openjdk.org/jdk/pull/18198 From dlong at openjdk.org Thu May 30 20:54:12 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 30 May 2024 20:54:12 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v18] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 14:54:17 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens >> no optimization is performed for the method being compiled). Hoisting >> a `get()` call out of a loop for a loop-invariant `scopedValue` should >> also be legal in most cases. >> >> `ScopedValue.get()` is implemented in Java code as a 2 step process. A >> cache is attached to the current thread object. If the `ScopedValue` >> object is in the cache then the result from `get()` is read from >> there. Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache. On a >> cache probe, both indexes are checked.
As a consequence, the process >> of probing the cache is a multi-step process (check if the cache is >> present, check first index, check second index if first index >> failed). If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, profile reports the slow path >> as never taken and only the read from the cache is compiled. >> >> To perform the optimizations, I added 3 new node types to C2: >> >> - the pair >> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for >> the cache probe >> >> - a cfg node ScopedValueGetResultNode to help locate the result of the >> `get()` call in the IR graph. >> >> In pseudo code, once the nodes are inserted, the code of a `get()` is: >> >> >> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) >> if (hits_in_the_cache) { >> res = ScopedValueGetLoadFromCache(hits_in_the_cache); >> } else { >> res = ..; //slow call possibly inlined. Subgraph can be arbitrarily complex >> } >> res = ScopedValueGetResult(res) >> >> >> In the snippet: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > whitespaces I guess I don't understand how random replacement is supposed to help. Do you have a pointer to where I can read up on the topic? I would think that for small numbers of scoped values, assigning hash slots sequentially would work well. The secondary slot would only be used if there is a collision.
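The two-index cache probe described in the quoted change, combined with random replacement on a miss, can be modelled with a small toy cache. This is a hypothetical sketch, not the JDK's `ScopedValue` cache: the slot count, the two hash functions, and the names `TwoSlotCache` and `get` are assumptions made for illustration only.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <random>

// Toy two-slot cache with random replacement (hypothetical; slot count,
// hash functions and names are illustrative assumptions, not the JDK layout).
class TwoSlotCache {
 public:
  static constexpr std::size_t kSlots = 16;  // power of two, assumed

  explicit TwoSlotCache(unsigned seed) : rng_(seed) {}

  // Fast path: check the primary slot, then the secondary slot.
  bool probe(std::uint32_t key, int* value) const {
    for (std::size_t idx : {primary(key), secondary(key)}) {
      const Entry& e = entries_[idx];
      if (e.valid && e.key == key) {
        *value = e.value;
        return true;
      }
    }
    return false;
  }

  // On a miss, overwrite one of the two candidate slots chosen at random,
  // so no access history (as LRU would need) has to be maintained.
  void insert(std::uint32_t key, int value) {
    std::size_t idx = (rng_() & 1u) ? primary(key) : secondary(key);
    entries_[idx] = Entry{key, value, true};
  }

 private:
  struct Entry {
    std::uint32_t key = 0;
    int value = 0;
    bool valid = false;
  };

  static std::size_t primary(std::uint32_t key) { return key & (kSlots - 1); }
  static std::size_t secondary(std::uint32_t key) {
    return (key >> 4) & (kSlots - 1);
  }

  std::array<Entry, kSlots> entries_{};
  std::mt19937 rng_;
};

// get(): probe the cache; on a miss, run the slow path and cache the result.
template <typename SlowPath>
int get(TwoSlotCache& cache, std::uint32_t key, SlowPath slow_path) {
  int value = 0;
  if (cache.probe(key, &value)) return value;
  value = slow_path(key);
  cache.insert(key, value);
  return value;
}
```

In this model, two keys that share one candidate slot but differ in their other candidate slot tend to end up in different slots after a few misses, without any per-access bookkeeping.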
------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2140845437 From aboldtch at openjdk.org Thu May 30 21:56:02 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 30 May 2024 21:56:02 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v2] In-Reply-To: References: Message-ID: On Thu, 30 May 2024 16:14:55 GMT, Stefan Karlsson wrote: >> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: >> >> use template functions > > Oh, and as it doesn't seem to have been clear from my earlier comments: I don't strongly oppose fixing it the way you do in the RelocIterator, since I have very little interaction with that code. > > The comment was more that I would prefer if we take a case-by-case approach when we look at other parts of HotSpot with similar problems and really think what the correct solution would be, and that we don't too quickly start to grab for the `add/sub_to_ptr` solution. Putting these functions in globalDefinitions makes it all too easy to just grab for these functions when we try to solve similar problems, IMHO. That's my 2c. I'm not blocking this patch, as long as we get somewhat decent names. My stance is the same as @stefank: I do not oppose this change to fix the immediate issue. Looking closer at how the `RelocIterator` is created from an `nmethod`, it would never end up with a `nullptr - 1`, because `relocation_begin()`, which is used to initialize `_current`, would never produce a nullptr. So there is no issue with the other constructor. So plugging the three holes above would remove the UB (along with introducing the invariant that you are not allowed to construct from a `CodeSection` with no relocations). > But this is a different issue for a different RFE. It may be a different RFE, but it is the same issue (unless I am misunderstanding what you are referring to).
The `!has_loc()` was specifically introduced to solve this exact ub bug. However it was the wrong property to check. Reading #12854 gives me this impression as well. (Given that the logic around `has_loc` does not seem to have changed since 8153779ad32d1e8ddd37ced826c76c7aafc61894 ) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19424#issuecomment-2140919423 From kvn at openjdk.org Thu May 30 22:21:03 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 30 May 2024 22:21:03 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v2] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 10:04:14 GMT, Matthias Baesken wrote: >> When running on macOS with ubsan enabled, we see some issues in relocInfo (hpp and cpp); those already occur in the build quite early. >> >> /jdk/src/hotspot/share/code/relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer >> >> Similar happens when we add to the _current pointer >> _current++; >> this gives : >> relocInfo.hpp:606:13: runtime error: applying non-zero offset to non-null pointer 0xfffffffffffffffe produced null pointer >> >> Seems the pointer subtraction/addition worked so far, so it might be an option to disable ubsan for those 2 functions. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > use template functions So you want to patch the path which introduces `nullptr`. 
And in addition to your suggested fix, we need to adjust the assert: RelocIterator::RelocIterator(CodeSection* cs, address begin, address limit) { initialize_misc(); - assert(((cs->locs_start() != nullptr) && (cs->locs_end() != nullptr)) || - ((cs->locs_start() == nullptr) && (cs->locs_end() == nullptr)), "valid start and end pointer"); + assert(((cs->locs_start() != nullptr) && (cs->locs_end() != nullptr)), "valid start and end pointer"); _current = cs->locs_start()-1; This seems reasonable. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19424#issuecomment-2140943802 From dlong at openjdk.org Thu May 30 22:47:03 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 30 May 2024 22:47:03 GMT Subject: RFR: 8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v5] In-Reply-To: References: Message-ID: <8jOAyeuwyQ1V-knX_8AHsdOci0cr5mfcyKseBEt8Kpg=.79b51a11-0aae-4676-aafe-df6113ac6fc7@github.com> On Tue, 28 May 2024 07:16:15 GMT, Damon Fenacci wrote: >> # Issue >> >> The test `compiler/startup/StartupOutput.java` fails intermittently due to a crash after correctly printing the error `Initial size of CodeCache is too small` (the test limits the code cache using `-XX:InitialCodeCacheSize=1024K -XX:ReservedCodeCacheSize=1200k`). >> The appearance of the issue is very dependent on thread scheduling. The original report was for C1 initialization, but C2 initialization is affected as well. >> >> # Causes >> >> There is one occurrence during C1 initialization and one during C2 initialization where a call to `RuntimeStub::new_runtime_stub` can fail fatally if there is not enough space left. >> For C1: `Compiler::init_c1_runtime` -> `Runtime1::initialize` -> `Runtime1::generate_blob_for` -> `Runtime1::generate_blob` -> `RuntimeStub::new_runtime_stub`.
>> For C2: `C2Compiler::initialize` -> `OptoRuntime::generate` -> `OptoRuntime::generate_stub` -> `Compile::Compile` -> `Compile::Code_Gen` -> `PhaseOutput::install` -> `PhaseOutput::install_stub` -> `RuntimeStub::new_runtime_stub`. >> >> # Solution >> >> https://github.com/openjdk/jdk/pull/15970 introduced an optional argument to `RuntimeStub::new_runtime_stub` to determine if it fails fatally or not. We can take advantage of it to avoid crashing and instead pass the information about the success or failure of the allocation up the (C1 and C2 initialization) call stack to where we can mark the compilations as failed. > > Damon Fenacci has updated the pull request incrementally with three additional commits since the last revision: > > - Update src/hotspot/share/gc/z/c1/zBarrierSetC1.cpp > > Co-authored-by: Tobias Hartmann > - Update src/hotspot/share/gc/z/c1/zBarrierSetC1.cpp > > Co-authored-by: Tobias Hartmann > - Update src/hotspot/share/gc/x/c1/xBarrierSetC1.cpp > > Co-authored-by: Tobias Hartmann This looks OK, but isn't it a lot of changes just to get this test to pass? Aren't all of these allocation failures ultimately fatal? Is there a simpler way to handle this problem? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19280#issuecomment-2140965888 From gcao at openjdk.org Fri May 31 02:16:26 2024 From: gcao at openjdk.org (Gui Cao) Date: Fri, 31 May 2024 02:16:26 GMT Subject: RFR: 8333276: RISC-V: client VM build failure after JDK-8241503 Message-ID: Hi, please review this patch that fixes the client VM build failure on RISC-V. For the client VM build error log, see: [JDK-8333276](https://bugs.openjdk.org/browse/JDK-8333276) The root cause is that `src/hotspot/share/code/compiledIC.hpp` includes `"opto/c2_MacroAssembler.hpp"`, which in turn includes `c2_MacroAssembler_riscv.hpp`. The fix moves the `spill_vmask` and `unspill_vmask` function definitions into `c2_MacroAssembler_riscv.cpp`.
`c2_MacroAssembler_riscv.cpp` is only compiled when the `COMPILER2` macro is defined. ### Testing - [x] linux-riscv client VM fastdebug native build ------------- Commit messages: - 8333276: RISC-V: client VM build failure after JDK-8241503 Changes: https://git.openjdk.org/jdk/pull/19481/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19481&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333276 Stats: 28 lines in 2 files changed: 15 ins; 11 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19481.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19481/head:pull/19481 PR: https://git.openjdk.org/jdk/pull/19481 From rcastanedalo at openjdk.org Fri May 31 04:47:14 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 31 May 2024 04:47:14 GMT Subject: RFR: 8332959: C2: ZGC fails with 'Incorrect load shift' when invoking Object.clone() reflectively on an array Message-ID: This changeset ensures that cloned arrays are initialized at allocation time when their type is unknown, as expected by ZGC in this scenario (see the [JBS issue](https://bugs.openjdk.org/projects/JDK/issues/JDK-8332959) for further details). Array clones with unknown type may arise from compiling the array-guarded branch of a reflective `Object.clone()` invocation, as illustrated by the included test. #### Testing - tier1-5, stress test (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). - tier6-7 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode; ZGC tests only).
------------- Commit messages: - Disable ReduceBulkZeroing for array clones where the source type is unknown - Add regression test Changes: https://git.openjdk.org/jdk/pull/19486/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19486&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332959 Stats: 41 lines in 2 files changed: 37 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19486.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19486/head:pull/19486 PR: https://git.openjdk.org/jdk/pull/19486 From fyang at openjdk.org Fri May 31 05:38:01 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 31 May 2024 05:38:01 GMT Subject: RFR: 8333276: RISC-V: client VM build failure after JDK-8241503 In-Reply-To: References: Message-ID: On Thu, 30 May 2024 14:05:42 GMT, Gui Cao wrote: > Hi, please review this patch that fixes the client VM build failure on RISC-V. > > For the client VM build error log, see: [JDK-8333276](https://bugs.openjdk.org/browse/JDK-8333276) > > The root cause is that `src/hotspot/share/code/compiledIC.hpp` includes `"opto/c2_MacroAssembler.hpp"`, which in turn includes `c2_MacroAssembler_riscv.hpp`. > > The fix moves the `spill_vmask` and `unspill_vmask` function definitions into `c2_MacroAssembler_riscv.cpp`, which is only compiled when the `COMPILER2` macro is defined. > ### Testing > - [x] linux-riscv client VM fastdebug native build Marked as reviewed by fyang (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/19481#pullrequestreview-2089928302 From dlong at openjdk.org Fri May 31 06:30:01 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 31 May 2024 06:30:01 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v2] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 10:04:14 GMT, Matthias Baesken wrote: >> When running on macOS with ubsan enabled, we see some issues in relocInfo (hpp and cpp); those already occur in the build quite early. >> >> /jdk/src/hotspot/share/code/relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer >> >> Similar happens when we add to the _current pointer >> _current++; >> this gives : >> relocInfo.hpp:606:13: runtime error: applying non-zero offset to non-null pointer 0xfffffffffffffffe produced null pointer >> >> Seems the pointer subtraction/addition worked so far, so it might be an option to disable ubsan for those 2 functions. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > use template functions I believe using has_locs() is fine in relocate_code_to(). We just need to call it against `dest_cs`, which the iterator is using. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19424#issuecomment-2141314408 From chagedorn at openjdk.org Fri May 31 07:00:05 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 31 May 2024 07:00:05 GMT Subject: Integrated: 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag In-Reply-To: References: Message-ID: On Tue, 28 May 2024 15:01:14 GMT, Christian Hagedorn wrote: > With this patch I propose to remove the diagnostic product flag `ExpandSubTypeCheckAtParseTime` for the following reasons: > - Expanding sub type checks eagerly during parse time has a maintenance cost. 
We've had to make special fixes due to skipping `SubTypeCheckNodes` in the past (recent example: [JDK-8328702](https://bugs.openjdk.org/browse/JDK-8328702), where the idea of removing this flag was first discussed). > - This stress option has not helped much in finding bugs. Going through JBS, maybe only 1 or 2 bugs can be attributed to this flag over the last 4 years - and even for those, it could very well have been that the flag was not required, because it was often accompanied by other stress flags such as `StressReflectiveCode`. > - We currently have a bug in Valhalla ([JDK-8331912](https://bugs.openjdk.org/browse/JDK-8331912)) which only happens with `ExpandSubTypeCheckAtParseTime`. The reason is that we lose flatness information due to the eager sub type expansion. Later, data becomes top and the corresponding (already expanded) sub type check fails to fold control as well, leading to a broken graph. The simplest solution is to remove `ExpandSubTypeCheckAtParseTime`. > > Thanks, > Christian This pull request has now been integrated. Changeset: 95c8a69b Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/95c8a69b0e7a99ec0cd41aa9b6ba033fd3216695 Stats: 18 lines in 4 files changed: 1 ins; 13 del; 4 mod 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag Reviewed-by: thartmann, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/19430 From burban at openjdk.org Fri May 31 08:03:03 2024 From: burban at openjdk.org (Bernhard Urban-Forster) Date: Fri, 31 May 2024 08:03:03 GMT Subject: RFR: 8331159: VM build without C2 fails after JDK-8180450 In-Reply-To: <86NGpx5VVOK6KuR1qbhLRS27zau-DEwXW31EakcquYY=.4d56d214-1a07-4ff5-a1af-e18a545ad725@github.com> References: <86NGpx5VVOK6KuR1qbhLRS27zau-DEwXW31EakcquYY=.4d56d214-1a07-4ff5-a1af-e18a545ad725@github.com> Message-ID: On Thu, 25 Apr 2024 20:54:23 GMT, Bernhard Urban-Forster wrote: > x86 bits are fine. Thanks! Can someone sponsor it please?
------------- PR Comment: https://git.openjdk.org/jdk/pull/18962#issuecomment-2141429752 From mbaesken at openjdk.org Fri May 31 08:04:27 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 31 May 2024 08:04:27 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v3] In-Reply-To: References: Message-ID: > When running on macOS with ubsan enabled, we see some issues in relocInfo (hpp and cpp); those already occur in the build quite early. > > /jdk/src/hotspot/share/code/relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer > > Similar happens when we add to the _current pointer > _current++; > this gives : > relocInfo.hpp:606:13: runtime error: applying non-zero offset to non-null pointer 0xfffffffffffffffe produced null pointer > > Seems the pointer subtraction/addition worked so far, so it might be an option to disable ubsan for those 2 functions. Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: rename templates ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19424/files - new: https://git.openjdk.org/jdk/pull/19424/files/bbb0c96f..26a5ba13 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19424&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19424&range=01-02 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19424.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19424/head:pull/19424 PR: https://git.openjdk.org/jdk/pull/19424 From mbaesken at openjdk.org Fri May 31 08:15:07 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 31 May 2024 08:15:07 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v3] In-Reply-To: References: Message-ID: 
<8WQBn0behVvE6ldI0QtwvzeXIvchbxoL52f1DvUaY0U=.d2391db7-05ff-4d1e-87f8-7f2be4190042@github.com> On Fri, 31 May 2024 08:04:27 GMT, Matthias Baesken wrote: >> When running on macOS with ubsan enabled, we see some issues in relocInfo (hpp and cpp); those already occur in the build quite early. >> >> /jdk/src/hotspot/share/code/relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer >> >> Similar happens when we add to the _current pointer >> _current++; >> this gives : >> relocInfo.hpp:606:13: runtime error: applying non-zero offset to non-null pointer 0xfffffffffffffffe produced null pointer >> >> Seems the pointer subtraction/addition worked so far, so it might be an option to disable ubsan for those 2 functions. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > rename templates I renamed the templates to sub / add_to_ptr_maybe_null . Maybe other changes could be done in a separate RFE . ------------- PR Comment: https://git.openjdk.org/jdk/pull/19424#issuecomment-2141448407 From jbhateja at openjdk.org Fri May 31 08:24:13 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 31 May 2024 08:24:13 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v22] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 17:18:04 GMT, Vladimir Kozlov wrote: >> Is it enough to have AVX512F present for APX? What about Knight CPUs which have limited AVX512 features? > > You should add code which checks CPUID features bit to set `UseAPX`. Or set it to `false` unconditionally in this PR regardless UseAVX value with comment "APX is not supported on this CPU". Otherwise someone will switch it on command line on avx512 machine. > > Or we should push [#18562](https://github.com/openjdk/jdk/pull/18562) first. Which I prefer. > What about Knight CPUs which have limited AVX512 features? 
Any VEX-encoded instruction directly accessing an EGPR operand, or a memory operand with an EGPR BASE / INDEX, must be promoted to Extended EVEX encoding. Please consider the following example: CPROMPT>xed64 -64 -d 62 da 7d 28 18 13 62DA7D281813 ICLASS: VBROADCASTSS CATEGORY: BROADCAST EXTENSION: AVX512EVEX IFORM: VBROADCASTSS_YMMf32_MASKmskw_MEMf32_AVX512 ISA_SET: AVX512F_256 ATTRIBUTES: DISP8_TUPLE1 MASKOP_EVEX MEMORY_FAULT_SUPPRESSION SHORT: vbroadcastss ymm2, dword ptr [r27] Since the broadcast accesses an EGPR and a non-512-bit vector, the target CPU must also support vector length orthogonality, which requires the AVX512VL feature. In the true sense, APX is both an ISA extension (new PUSH/POP2, PPX and NDD instructions, JMPABS, COND COMPARE, etc.) and a provision of additional general-purpose registers to the existing ISA; in the latter case, users may expect to benefit from avoiding the costly 3-cycle spills from GPRs to XMM registers enabled with UseFPUForSpilling. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1621948136 From varadam at openjdk.org Fri May 31 08:57:04 2024 From: varadam at openjdk.org (Varada M) Date: Fri, 31 May 2024 08:57:04 GMT Subject: RFR: 8331935: Add support for primitive array C1 clone intrinsic in PPC [v4] In-Reply-To: References: <7HG6uTSZR9fs7PrsTNR1N0rzUCIwIgX3-W0VGPcrRyY=.0db16311-1313-4476-84c4-285ebd2a3fbc@github.com> Message-ID: On Thu, 30 May 2024 15:09:20 GMT, Martin Doerr wrote: >> Varada M has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: >> >> - Merge branch 'master' into arryClone >> - Add support for primitive array C1 clone intrinsic >> - Add support for primitive array C1 clone intrinsic >> - Add support for primitive array C1 clone intrinsic >> - Add support for primitive array C1 clone intrinsic > > Thanks for the hint! We should wait for that one to be fixed. > Thank you @TheRealMDoerr @offamitkumar .
I am running the tests: hotspot_compiler, hotspot_gc, hotspot_serviceability and hotspot_runtime for tier1, tier2 and tier3 with fastdebug, slowdebug and release. I will update the results. Completed the testing for fastdebug. There are a few unrelated test failures. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19250#issuecomment-2141533829 From duke at openjdk.org Fri May 31 09:07:15 2024 From: duke at openjdk.org (MaxXing) Date: Fri, 31 May 2024 09:07:15 GMT Subject: RFR: 8333334: C2: Make result of `Node::dominates` more precise to enhance scalar replacement Message-ID: This patch changes the algorithm of `Node::dominates` to make the result more precise, and allows the iterators of `ConcurrentHashMap` to be scalar replaced. The previous algorithm returns a conservative result when it encounters dead control flow, and only tries the first two input paths of a multi-input Region node, which may prevent scalar replacement in some cases. For example, with G1 GC enabled, C2 generates GC barriers for `ConcurrentHashMap` iteration operations in early phases and then eliminates them in a later IGVN, but `LoadNode` is also idealized in that same IGVN. This causes `LoadNode::Ideal` to see some dead barrier control flow and refuse to split some instance field loads through Phi due to the conservative result of `Node::dominates`, so scalar replacement cannot be applied to the iterators in the later macro elimination phase. This patch allows `Node::dominates` to try other paths of the last multi-input Region node when the first path is dead, and makes `ConcurrentHashMap` iteration ~30% faster:

Benchmark                            (nkeys)  Mode  Cnt       Score       Error  Units
Maps.testConcurrentHashMapIterators    10000  avgt   15  414099.085 ± 33230.945  ns/op  # baseline
Maps.testConcurrentHashMapIterators    10000  avgt   15  315490.281 ±  3037.056  ns/op  # patch

Testing: tier1-4.
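The idea that a dead input of a merge should not make the dominance answer conservative can be illustrated with a toy dominance check. This is a hypothetical model, not C2's `Node::dominates`: `CfgNode`, its `dead` flag and the `dominates` function here are assumptions for illustration, and unlike the more limited walk described above, this version simply explores every live input of a multi-input node.

```cpp
#include <unordered_set>
#include <vector>

// Toy control-flow node (not C2's Node): each node lists its control inputs;
// the root has none. A node flagged dead is treated as unreachable.
struct CfgNode {
  std::vector<const CfgNode*> inputs;
  bool dead = false;
};

// Returns true if `dom` dominates `sub`, i.e. every live path from the root
// to `sub` passes through `dom`. Walks backwards from `sub`, stops at `dom`,
// and explores every live input of a multi-input (Region-like) node.
bool dominates(const CfgNode* dom, const CfgNode* sub) {
  std::vector<const CfgNode*> worklist{sub};
  std::unordered_set<const CfgNode*> visited{sub};
  while (!worklist.empty()) {
    const CfgNode* n = worklist.back();
    worklist.pop_back();
    if (n == dom) continue;                // this path is covered by dom
    if (n->inputs.empty()) return false;   // reached the root without dom
    for (const CfgNode* in : n->inputs) {
      if (in == nullptr || in->dead) continue;  // ignore dead paths
      if (visited.insert(in).second) worklist.push_back(in);
    }
  }
  return true;  // every live backward path hit dom
}
```

In a diamond `root -> {a, b} -> merge`, `a` does not dominate `merge`; once `b` is marked dead, the only live path to `merge` goes through `a`, so the check returns true instead of giving up.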
------------- Commit messages: - Make `Node::dominates` more precise so that iterators of `ConcurrentHashMap` can be scalar replaced. Changes: https://git.openjdk.org/jdk/pull/19496/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19496&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333334 Stats: 87 lines in 4 files changed: 53 ins; 15 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/19496.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19496/head:pull/19496 PR: https://git.openjdk.org/jdk/pull/19496 From aph at openjdk.org Fri May 31 09:23:12 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 31 May 2024 09:23:12 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v18] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 14:54:17 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens >> no optimization is performed for the method being compiled). Hoisting >> a `get()` call out of a loop for a loop-invariant `scopedValue` should >> also be legal in most cases. >> >> `ScopedValue.get()` is implemented in Java code as a 2 step process. A >> cache is attached to the current thread object. If the `ScopedValue` >> object is in the cache then the result from `get()` is read from >> there. Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache. On a >> cache probe, both indexes are checked.
As a consequence, the process >> of probing the cache is a multi-step process (check if the cache is >> present, check first index, check second index if first index >> failed). If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, profile reports the slow path >> as never taken and only the read from the cache is compiled. >> >> To perform the optimizations, I added 3 new node types to C2: >> >> - the pair >> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for >> the cache probe >> >> - a cfg node ScopedValueGetResultNode to help locate the result of the >> `get()` call in the IR graph. >> >> In pseudo code, once the nodes are inserted, the code of a `get()` is: >> >> >> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) >> if (hits_in_the_cache) { >> res = ScopedValueGetLoadFromCache(hits_in_the_cache); >> } else { >> res = ..; //slow call possibly inlined. Subgraph can be arbitrarily complex >> } >> res = ScopedValueGetResult(res) >> >> >> In the snippet: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > whitespaces On 5/30/24 21:54, Dean Long wrote: > I guess I don't understand how random replacement is supposed to help. > Do you have a pointer to where I can read up on the topic? https://en.wikipedia.org/wiki/Cache_replacement_policies#Random_replacement_(RR) has links.
The maths is pretty simple: if you have only one slot for each entry, one time in 16 two scoped locals will hit the same slot, and repeatedly kick each other out. If you have two slots with random replacement, then each get() will retry a couple of times until each entry is in a different slot. Random replacement is good for software implementation because it doesn't require history, unlike (say) LRU. Maintaining access history in order to use LRU would be as expensive as the actual get(). > I would think that for small numbers of scoped values, assigning hash > slots sequentially would work well. Only if there is a collision, use > secondary slot. In practice, today's processors speculate both primary and secondary loads in parallel, so there's no added latency for doing both. The cost is almost nothing. Random replacement is a low-cost way to increase the hit ratio of a cache without incurring the added runtime cost of (say) LRU. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2141579998 From stefank at openjdk.org Fri May 31 09:29:02 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 31 May 2024 09:29:02 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v3] In-Reply-To: <8WQBn0behVvE6ldI0QtwvzeXIvchbxoL52f1DvUaY0U=.d2391db7-05ff-4d1e-87f8-7f2be4190042@github.com> References: <8WQBn0behVvE6ldI0QtwvzeXIvchbxoL52f1DvUaY0U=.d2391db7-05ff-4d1e-87f8-7f2be4190042@github.com> Message-ID: On Fri, 31 May 2024 08:12:11 GMT, Matthias Baesken wrote: > I renamed the templates to sub / add_to_ptr_maybe_null . > > Maybe other changes could be done in a separate RFE . The other changes would render this RFE unnecessary because _current would then never contain a nullptr. 
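Stefan's point is that, with the other changes, `_current` would never be derived from a null pointer. As a separate, hypothetical illustration (not HotSpot's `RelocIterator`, and not what this PR does), an iterator can avoid forming `begin - 1` altogether by starting at `begin` and comparing against `end` before each step; `ToyRecordIterator` and its 16-bit records are assumptions made for this sketch.

```cpp
#include <cstddef>
#include <cstdint>

// Toy iterator (hypothetical; not HotSpot's RelocIterator). Instead of
// starting at begin - 1 and pre-incrementing, it starts at begin and only
// advances after checking against end, so an empty range [nullptr, nullptr)
// never triggers pointer arithmetic on a null pointer.
class ToyRecordIterator {
 public:
  ToyRecordIterator(const std::uint16_t* begin, const std::uint16_t* end)
      : current_(begin), end_(end) {}

  // Copies the next record into *out and returns true, or returns false
  // once the range is exhausted.
  bool next(std::uint16_t* out) {
    if (current_ == end_) return false;  // also covers the empty/null case
    *out = *current_;
    ++current_;  // only reached while current_ points into the range
    return true;
  }

 private:
  const std::uint16_t* current_;
  const std::uint16_t* end_;
};
```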
------------- PR Comment: https://git.openjdk.org/jdk/pull/19424#issuecomment-2141590834 From mdoerr at openjdk.org Fri May 31 09:51:03 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 31 May 2024 09:51:03 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v3] In-Reply-To: References: Message-ID: On Fri, 31 May 2024 08:04:27 GMT, Matthias Baesken wrote: >> When running on macOS with ubsan enabled, we see some issues in relocInfo (hpp and cpp); those already occur in the build quite early. >> >> /jdk/src/hotspot/share/code/relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer >> >> Similar happens when we add to the _current pointer >> _current++; >> this gives : >> relocInfo.hpp:606:13: runtime error: applying non-zero offset to non-null pointer 0xfffffffffffffffe produced null pointer >> >> Seems the pointer subtraction/addition worked so far, so it might be an option to disable ubsan for those 2 functions. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > rename templates I guess Matthias only wanted to fix UB in hotspot ASAP and doesn't have the bandwidth to change the design everywhere. Sounds like you guys already have an alternative solution which already works. Maybe you would like to put it into a PR and we continue the discussion there? Nevertheless, having `sub / add_to_ptr_maybe_null` available in hotspot may be a good thing. There are some places where we really use additions with nullptr (e.g. `index_oop_from_field_offset_long` in unsafe.cpp). 
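A minimal sketch of what helpers of this kind can look like, assuming the only goal is to avoid applying pointer arithmetic to a null pointer (the undefined behavior UBSan reports): the offset is applied to the address as a `std::uintptr_t`, which is well-defined unsigned (wrapping) arithmetic, and the result is converted back to a pointer, an implementation-defined conversion that behaves as expected on mainstream platforms. The names `add_offset_maybe_null` and `sub_offset_maybe_null` are hypothetical; the templates in the PR may be named and implemented differently.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers (names invented for this sketch): compute p + n and
// p - n element offsets on the integer address, so no pointer arithmetic is
// ever applied to a null pointer. Unsigned arithmetic wraps, which makes
// subtracting from address 0 well-defined at the integer level.
template <typename T>
T* add_offset_maybe_null(T* p, std::ptrdiff_t n) {
  std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
  addr += static_cast<std::uintptr_t>(n) * sizeof(T);
  return reinterpret_cast<T*>(addr);
}

template <typename T>
T* sub_offset_maybe_null(T* p, std::ptrdiff_t n) {
  return add_offset_maybe_null(p, -n);
}
```

The result must not be dereferenced unless it points into a live object; such helpers only make the intermediate value legal to compute, which is why reviewers above prefer fixing call sites case by case where possible.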
------------- PR Comment: https://git.openjdk.org/jdk/pull/19424#issuecomment-2141637028 From jbhateja at openjdk.org Fri May 31 10:17:24 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 31 May 2024 10:17:24 GMT Subject: RFR: 8329031: CPUID feature detection for Advanced Performance Extensions (Intel® APX) Message-ID: Summary of changes include with the patch:- 1) CPUID based feature detection check for Intel APX extension (https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html) 2) Validation during VM initialization for extended GPRs state save / restoration by OS across context switches of java application threads executing JIT compiled code with new APX ISA. Kindly review and share your feedback. Best Regards, Jatin ------------- Commit messages: - Update vm_version_x86.cpp - Post merge clenups. - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329031 - Minor modification in UseAPX flag description - Making UseAPX a boolean flag. - 32-bit build fix - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329031 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329031 - 8329031: CPUID feature detection for APX during VM initialization.
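Item 1) of the summary, CPUID-based detection, comes down to testing a feature bit. The sketch below decodes register values that have already been read instead of executing CPUID, so it runs on any host. The bit positions (APX_F as CPUID leaf 7, subleaf 1, EDX bit 21, and XCR0 bit 19 for the extended-GPR state component) are our reading of Intel's APX documentation and should be checked against the specification. The function name `apx_usable` is hypothetical. The XCR0 test only shows that the OS has enabled the state component. Item 2) describes an actual save/restore validation, which this sketch does not model.

```cpp
#include <cassert>
#include <cstdint>

// Assumed bit positions; verify against Intel's APX specification.
constexpr unsigned kApxFCpuidEdxBit = 21;  // CPUID.(EAX=7,ECX=1):EDX[21] = APX_F
constexpr unsigned kApxXcr0Bit = 19;       // XCR0[19] = extended-GPR state enabled by OS

// Hypothetical helper: decodes values already obtained with CPUID/XGETBV,
// so the decision logic is portable and easy to test.
bool apx_usable(std::uint32_t cpuid_7_1_edx, std::uint64_t xcr0) {
  const bool cpu_supports_apx = ((cpuid_7_1_edx >> kApxFCpuidEdxBit) & 1u) != 0;
  const bool os_enabled_egpr_state = ((xcr0 >> kApxXcr0Bit) & 1u) != 0;
  // The CPU must implement APX and the OS must have enabled saving and
  // restoring R16-R31; otherwise JIT code that uses them is unsafe.
  return cpu_supports_apx && os_enabled_egpr_state;
}
```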
Changes: https://git.openjdk.org/jdk/pull/18562/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18562&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8329031 Stats: 179 lines in 8 files changed: 153 ins; 11 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/18562.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18562/head:pull/18562 PR: https://git.openjdk.org/jdk/pull/18562 From duke at openjdk.org Fri May 31 10:17:24 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Fri, 31 May 2024 10:17:24 GMT Subject: RFR: 8329031: CPUID feature detection for Advanced Performance Extensions (Intel® APX) In-Reply-To: References: Message-ID: <6SdGX2sJgqS0nv6DzDELFML8Jv0GE9BHJwxH54UdQTs=.55228610-14a9-4102-ad46-6bd49c0e1f81@github.com> On Mon, 1 Apr 2024 12:01:27 GMT, Jatin Bhateja wrote: > Summary of changes include with the patch:- > > 1) CPUID based feature detection check for Intel APX extension (https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html) > 2) Validation during VM initialization for extended GPRs state save / restoration by OS across context switches of java application threads executing JIT compiled code with new APX ISA. > > Kindly review and share your feedback. > > Best Regards, > Jatin Hi @jatin-bhateja, Can you merge with the latest since PR #18476 is in now? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18562#issuecomment-2128187189 From aph at openjdk.org Fri May 31 11:30:10 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 31 May 2024 11:30:10 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v18] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 14:54:17 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ...
>>
>> v2 = scopedValue.get();
>>
>>
>> `v2` can be replaced by `v1` and the second call to `get()` can be
>> optimized out. That's true whatever is between the 2 calls unless a
>> new mapping for `scopedValue` is created in between (when that happens
>> no optimization is performed for the method being compiled). Hoisting
>> a `get()` call out of a loop for a loop-invariant `scopedValue` should
>> also be legal in most cases.
>>
>> `ScopedValue.get()` is implemented in Java code as a 2-step process. A
>> cache is attached to the current thread object. If the `ScopedValue`
>> object is in the cache then the result from `get()` is read from
>> there. Otherwise a slow call is performed that also inserts the
>> mapping in the cache. The cache itself is lazily allocated. One
>> `ScopedValue` can be hashed to 2 different indexes in the cache. On a
>> cache probe, both indexes are checked. As a consequence, the process
>> of probing the cache is a multi-step process (check if the cache is
>> present, check first index, check second index if first index
>> failed). If the cache is populated early on, then when the method that
>> calls `ScopedValue.get()` is compiled, profile reports the slow path
>> as never taken and only the read from the cache is compiled.
>>
>> To perform the optimizations, I added 3 new node types to C2:
>>
>> - the pair
>> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for
>> the cache probe
>>
>> - a CFG node ScopedValueGetResultNode to help locate the result of the
>> `get()` call in the IR graph.
>>
>> In pseudo code, once the nodes are inserted, the code of a `get()` is:
>>
>>
>> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue)
>> if (hits_in_the_cache) {
>>   res = ScopedValueGetLoadFromCache(hits_in_the_cache);
>> } else {
>>   res = ..; //slow call possibly inlined. Subgraph can be arbitrarily complex
>> }
>> res = ScopedValueGetResult(res)
>>
>>
>> In the snippet:
>>
>>
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > whitespaces One other thing. Let's say you always check the slot before overwriting it, and only then go to the secondary slot. You find the secondary slot is occupied. The best thing to do then is random replacement. Given that the end effect of just doing random replacement is the same, there's nothing to be gained from the added complexity. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2141833926 From mbaesken at openjdk.org Fri May 31 11:40:04 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 31 May 2024 11:40:04 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v3] In-Reply-To: References: Message-ID: On Fri, 31 May 2024 09:48:29 GMT, Martin Doerr wrote: > I guess Matthias only wanted to fix UB in hotspot ASAP and doesn't have the bandwidth to change the design everywhere. Yes . The first goal to make the '--enable-ubsan' configure flag useful; currently we have the configure flag but still fail already in the OpenJDK build (because of a number of ubsan related issues in HS). 
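Roland Westrelin's rule for replacing `v2` by `v1` can be illustrated with a deliberately simplified model. The model covers straight-line code only, so an earlier op always dominates a later one. ScopedValues are identified by name, and the whole method bails out if it creates any new binding. Everything here, including `Op` and `reuse_dominating_gets`, is hypothetical. C2 works on a sea-of-nodes graph and searches for a dominating `ScopedValueGetResult` node, not a list.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Toy straight-line IR: each op reads a scoped value or creates a binding.
struct Op {
  enum Kind { Get, Bind } kind;
  std::string scoped_value;  // which ScopedValue object the op refers to
};

// For each op, returns the index of the Get whose result it can reuse
// (its own index if none). A later Get reuses the first Get of the same
// ScopedValue; any Bind in the method disables the optimization entirely.
std::vector<std::size_t> reuse_dominating_gets(const std::vector<Op>& ops) {
  std::vector<std::size_t> source(ops.size());
  for (std::size_t i = 0; i < ops.size(); ++i) source[i] = i;

  for (const Op& op : ops) {
    if (op.kind == Op::Bind) return source;  // bail out for the whole method
  }

  std::unordered_map<std::string, std::size_t> first_get;
  for (std::size_t i = 0; i < ops.size(); ++i) {
    if (ops[i].kind != Op::Get) continue;
    auto it = first_get.find(ops[i].scoped_value);
    if (it == first_get.end()) {
      first_get.emplace(ops[i].scoped_value, i);
    } else {
      source[i] = it->second;  // earlier op dominates in straight-line code
    }
  }
  return source;
}
```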
------------- PR Comment: https://git.openjdk.org/jdk/pull/19424#issuecomment-2141852058 From chagedorn at openjdk.org Fri May 31 13:01:13 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 31 May 2024 13:01:13 GMT Subject: RFR: 8333252: C2: assert(assertion_predicate_has_loop_opaque_node(iff)) failed: must find OpaqueLoop* nodes In-Reply-To: References: Message-ID: <8CJy3mg4ZtD820PA5ZHc2rP6Lmj2UdgKk9bsH6Ry4n8=.b87c3e1c-ba94-4f89-aaaf-4cf081329f11@github.com> On Fri, 31 May 2024 12:33:04 GMT, Christian Hagedorn wrote: > [JDK-8330386](https://bugs.openjdk.org/browse/JDK-8330386) added some additional asserts to ensure that we are dealing with Template Assertion Predicates and not non-null-checks which both use `Opaque4` nodes. > > #### Correct Assertion > One of this assert was now hit with a fuzzer found case in `get_assertion_predicates()` called during the elimination of useless predicates. We walk through all loops and collect all useful Template Assertion Predicates and Parse Predicates above the loops. For that we look at the UCTs which are shared among the predicates. When finding a predicate with such an UCT which also has an `Opaque4` node, we know that it is a Template Assertion Predicate. We additionally assert that we must find the `OpaqueLoop*Nodes` above which always belong to a template: > > https://github.com/openjdk/jdk/blob/7ab74c5f268dac82bbd36355acf8e4f3d357134c/src/hotspot/share/opto/loopPredicate.cpp#L346-L354 > > So, this assert looks correct. > > #### Why didn't we find `OpaqueLoop*Nodes` in this case? 
> For the Template Assertion Predicate for the last value, we insert an additional `CastII` to keep the type information of the iv phi: > > https://github.com/openjdk/jdk/blob/7ab74c5f268dac82bbd36355acf8e4f3d357134c/src/hotspot/share/opto/loopPredicate.cpp#L1323-L1324 > > But in the test case, the type of the iv phi is a constant (`521 CastII`): > > ![image](https://github.com/openjdk/jdk/assets/17833009/5dc17b9c-abfe-4846-89a1-4e189234b991) > > `521 CastII` will simply be replaced with a constant during IGVN and the `OpaqueLoop*Nodes` above are removed. We therefore cannot find them anymore later when trying to eliminate useless predicates and we hit the assert. > > #### Why does the `CastII`/iv phi have a constant type? > Having a constant type for the iv phi indicates that the counted loop is only going to be executed for one iteration. But C2 has not had the chance, yet, to fold the loop exit test to remove the loop. > > #### How to fix this bug? > Having a single iteration loop raises the question, why we even bother to try and hoist checks out of such a loop with Loop Predication in the first place. I therefore suggest to simply bail out of Loop Predication if the trip count is 1. This will also prevent us from creating a Template Assertion Predicate with a `CastII` with a constant type from the iv phi which would be folded. > > To do that, we can compute the trip count on entry of Loop Predication. By doing that, we can also remove the trip count computation added for hoisting ran... src/hotspot/share/opto/loopPredicate.cpp line 1366: > 1364: } > 1365: loop->compute_trip_count(this); > 1366: if (cl->trip_count() == 1) { I don't check for `has_exact_trip_count()` here since we could also have non-`ConNodes` with types such that the iv phi is still a constant and we know that we will only iterate for at most one iteration. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19500#discussion_r1622359484 From chagedorn at openjdk.org Fri May 31 13:01:13 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 31 May 2024 13:01:13 GMT Subject: RFR: 8333252: C2: assert(assertion_predicate_has_loop_opaque_node(iff)) failed: must find OpaqueLoop* nodes Message-ID: [JDK-8330386](https://bugs.openjdk.org/browse/JDK-8330386) added some additional asserts to ensure that we are dealing with Template Assertion Predicates and not with non-null checks, both of which use `Opaque4` nodes. #### Correct Assertion One of these asserts was now hit by a fuzzer-found case in `get_assertion_predicates()`, which is called during the elimination of useless predicates. We walk through all loops and collect all useful Template Assertion Predicates and Parse Predicates above the loops. For that, we look at the UCTs which are shared among the predicates. When we find a predicate with such a UCT which also has an `Opaque4` node, we know that it is a Template Assertion Predicate. We additionally assert that we must find the `OpaqueLoop*Nodes` above, which always belong to a template: https://github.com/openjdk/jdk/blob/7ab74c5f268dac82bbd36355acf8e4f3d357134c/src/hotspot/share/opto/loopPredicate.cpp#L346-L354 So, this assert looks correct. #### Why didn't we find `OpaqueLoop*Nodes` in this case? For the Template Assertion Predicate for the last value, we insert an additional `CastII` to keep the type information of the iv phi: https://github.com/openjdk/jdk/blob/7ab74c5f268dac82bbd36355acf8e4f3d357134c/src/hotspot/share/opto/loopPredicate.cpp#L1323-L1324 But in the test case, the type of the iv phi is a constant (`521 CastII`): ![image](https://github.com/openjdk/jdk/assets/17833009/5dc17b9c-abfe-4846-89a1-4e189234b991) `521 CastII` will simply be replaced with a constant during IGVN and the `OpaqueLoop*Nodes` above are removed.
We therefore can no longer find them when trying to eliminate useless predicates, and we hit the assert. #### Why does the `CastII`/iv phi have a constant type? Having a constant type for the iv phi indicates that the counted loop is only going to be executed for one iteration. But C2 has not yet had the chance to fold the loop exit test and remove the loop. #### How to fix this bug? Having a single-iteration loop raises the question of why we even bother trying to hoist checks out of such a loop with Loop Predication in the first place. I therefore suggest simply bailing out of Loop Predication if the trip count is 1. This will also prevent us from creating a Template Assertion Predicate with a `CastII` that takes a constant type from the iv phi and would therefore be folded. To do that, we can compute the trip count on entry to Loop Predication. By doing that, we can also remove the trip count computation added for hoisting range checks with [JDK-8267928](https://bugs.openjdk.org/browse/JDK-8267928): https://github.com/openjdk/jdk/blob/7ab74c5f268dac82bbd36355acf8e4f3d357134c/src/hotspot/share/opto/loopPredicate.cpp#L1212-L1215 This is no longer necessary. Instead, I've added an assert to ensure that we indeed have the correct trip count computed at Loop Predication entry (this might be overly cautious, but it's easy to do and has a small overhead). I've also added an assert, when creating Template Assertion Predicates, that the iv phi does not have a constant type.
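On a plain-integer model, the proposed bail-out amounts to computing the trip count first and skipping Loop Predication for loops that run exactly once. The sketch assumes a positive stride and a `<` exit test. C2's `compute_trip_count()` works on node types rather than concrete integers, so this is only an illustration, and `trip_count` and `should_skip_loop_predication` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Trip count of `for (i = init; i < limit; i += stride)` with stride > 0,
// computed in 64-bit so that limit - init cannot overflow.
std::int64_t trip_count(std::int32_t init, std::int32_t limit, std::int32_t stride) {
  assert(stride > 0);
  if (init >= limit) return 0;
  const std::int64_t span = static_cast<std::int64_t>(limit) - init;
  return (span + stride - 1) / stride;  // ceil(span / stride)
}

// The bail-out proposed in the PR, restated on this toy model: a loop that
// runs exactly once has nothing worth hoisting, so predication is skipped
// and no template with a constant-typed CastII is ever built.
bool should_skip_loop_predication(std::int32_t init, std::int32_t limit, std::int32_t stride) {
  return trip_count(init, limit, stride) == 1;
}
```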
Thanks, Christian ------------- Commit messages: - 8333252: C2: assert(assertion_predicate_has_loop_opaque_node(iff)) failed: must find OpaqueLoop* nodes Changes: https://git.openjdk.org/jdk/pull/19500/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19500&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333252 Stats: 77 lines in 2 files changed: 74 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19500.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19500/head:pull/19500 PR: https://git.openjdk.org/jdk/pull/19500 From fgao at openjdk.org Fri May 31 15:17:07 2024 From: fgao at openjdk.org (Fei Gao) Date: Fri, 31 May 2024 15:17:07 GMT Subject: RFR: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" [v3] In-Reply-To: References: <16J-lJ2AceGTVcRWBcP15yKcwO-1IA1XsngyOuNjf7k=.0776f081-ae2c-4279-87cf-d909806c2bc4@github.com> Message-ID: On Wed, 29 May 2024 09:51:14 GMT, Andrew Haley wrote: > This is much better. However, I don't think that all the IndOffXX types do us any good. It would be simpler and faster to match a general-purpose IndOff type then let `legitimize_address()` fix any out-of-range operands. That'd reduce the size of the match rules and the time it takes to run them. Thanks for your review @theRealAph . Matching a general-purpose IndOff type and then letting `legitimize_address()` fix any out-of-range operands is an interesting idea and does simplify our code. Assume that we have a case like:

    data_a = UNSAFE.getLongUnaligned(TestLong.BYTES, 1030);
    UNSAFE.putLongUnaligned(BYTES, 1030, data_b);

After matching, we have:

    ldr R10, [R12, #1030]
    str R11, [R12, #1030]

But `1030` can't be encoded as `base` + `offset` mode, so we need to go through `legitimize_address()`, and then we may get:

    add x8, x12, #0x406   // legitimize_address
    ldr x10, [x8]
    add x8, x12, #0x406   // legitimize_address
    str x11, [x8]

We have to regenerate the address every time we access the same address.
But `IndOff` type in matcher may help us reduce it. What do you think? Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16991#issuecomment-2142479156 From fgao at openjdk.org Fri May 31 15:17:13 2024 From: fgao at openjdk.org (Fei Gao) Date: Fri, 31 May 2024 15:17:13 GMT Subject: RFR: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" [v3] In-Reply-To: References: <16J-lJ2AceGTVcRWBcP15yKcwO-1IA1XsngyOuNjf7k=.0776f081-ae2c-4279-87cf-d909806c2bc4@github.com> Message-ID: On Wed, 29 May 2024 18:41:40 GMT, Dean Long wrote: >> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Add the assertion back and merge matchrules with a better predicate >> - Merge branch 'master' into fg8319690 >> - Remove unused immIOffset/immLOffset >> - Merge branch 'master' into fg8319690 >> - 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" >> >> On LP64 systems, if the heap can be moved into low virtual >> address space (below 4GB) and the heap size is smaller than the >> interesting threshold of 4 GB, we can use unscaled decoding >> pattern for narrow klass decoding. It means that a generic field >> reference can be decoded by: >> ``` >> cast<64> (32-bit compressed reference) + field_offset >> ``` >> >> When the `field_offset` is an immediate, on aarch64 platform, the >> unscaled decoding pattern can match perfectly with a direct >> addressing mode, i.e., `base_plus_offset`, supported by LDR/STR >> instructions. But for certain data width, not all immediates can >> be encoded in the instruction field of LDR/STR[1]. The ranges are >> different as data widths vary. >> >> For example, when we try to load a value of long type at offset of >> `1030`, the address expression is `(AddP (DecodeN base) 1030)`. 
>> Before the patch, the expression was matching with >> `operand indOffIN()`. But, for 64-bit LDR/STR, signed immediate >> byte offset must be in the range -256 to 255 or positive immediate >> byte offset must be a multiple of 8 in the range 0 to 32760[2]. >> `1030` can't be encoded in the instruction field. So, after >> matching, when we do checking for instruction encoding, the >> assertion would fail. >> >> In this patch, we're going to filter out invalid immediates >> when deciding if current addressing mode can be matched as >> `base_plus_offset`. We introduce `indOffIN4/indOffLN4` and >> `indOffIN8/indOffLN8` for 32-bit data type and 64-bit data >> type separately in the patch. E.g., for `memory4`, we remove >> the generic `indOffIN/indOffLN`, which matches wrong unscaled >> immediate range, and replace them with `indOffIN4/indOffLN4` >> instead. >> >> Since 8-bit and 16-bit LDR/STR instructions also support the >> unscaled decoding pattern, we add the addressing mode in the >>... > > src/hotspot/cpu/aarch64/aarch64.ad line 5193: > >> 5191: constraint(ALLOC_IN_RC(ptr_reg)); >> 5192: match(AddP reg off); >> 5193: match(AddP (DecodeN regn) off); > > I'm surprised this works. If we match on "DecodeN regn", is it really safe to use $reg instead? Thanks for your review, @dean-long . Yes, based on the current implementation of our ADL compiler, even if we match on "DecodeN regn", using `$reg` is safe and perhaps even must. When ADLC is parsing operand interface from `indOffIX`, it always fetches useful information from the **first** match rule `match(AddP reg off)` and does not care about others, even though we have multiple match rules. See https://github.com/openjdk/jdk/blob/1e04ee6d57d5fe84e1d202b16e8d13dc13c002ff/src/hotspot/share/adlc/formssel.cpp#L2461 and https://github.com/openjdk/jdk/blob/1e04ee6d57d5fe84e1d202b16e8d13dc13c002ff/src/hotspot/share/adlc/output_c.cpp#L3025. 
It searches `reg` in `match(AddP reg off);` and finds that `reg` is the `first` one among all components, just as `regn` is the `first` in `match(AddP (DecodeN regn) off);`. It then concludes that the **first** operand starting from `oper_input_base()` is the base address input. In the `emit()` stage, the node structure has been reduced to something like:

    Load === ctrl mem reg val
    Load === ctrl mem regn val

`off` is saved in the operand field. The generated C++ code looks like:

    void loadLNode::emit(C2_MacroAssembler* masm, PhaseRegAlloc* ra_) const {
      // Start at oper_input_base() and count operands
      unsigned idx0 = 2;
      unsigned idx1 = 2; // mem
      {
    #line 2914 "/home/feigao02/chelsea/jdk_src/src/hotspot/cpu/aarch64/aarch64.ad"
        Register dst_reg = as_Register(opnd_array(0)->reg(ra_,this)/* dst */);
        loadStore(masm, &MacroAssembler::ldr, dst_reg, opnd_array(1)->opcode(),
                  as_Register(opnd_array(1)->base(ra_,this,idx1)), opnd_array(1)->index(ra_,this,idx1),
                  opnd_array(1)->scale(), opnd_array(1)->disp(ra_,this,idx1), 8);
    #line 999999
      }
    }

    virtual int base(PhaseRegAlloc *ra_, const Node *node, int idx) const {
      // Replacement variable: reg
      return (int)ra_->get_encode(node->in(idx));
    }

To be honest, `$reg` here is a little confusing but, IMO, it may represent a relative index. WDYT? Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16991#discussion_r1622527758 From aph at openjdk.org Fri May 31 15:28:03 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 31 May 2024 15:28:03 GMT Subject: RFR: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" [v3] In-Reply-To: References: <16J-lJ2AceGTVcRWBcP15yKcwO-1IA1XsngyOuNjf7k=.0776f081-ae2c-4279-87cf-d909806c2bc4@github.com> Message-ID: On Fri, 31 May 2024 15:13:44 GMT, Fei Gao wrote: > But `1030` can't be encoded as `base` + `offset` mode Why not?
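The encoding constraint quoted in the PR description can be written as a small predicate: either a signed 9-bit unscaled byte offset, or an unsigned 12-bit offset scaled by the access size. This is a standalone sketch covering only the two immediate forms that description lists. It is not HotSpot's `offset_ok_for_immed()`, and the function name `offset_fits_ldr_str` is hypothetical. Under these two forms, 1030 is rejected for 4- and 8-byte accesses but accepted for 1- and 2-byte accesses.

```cpp
#include <cassert>
#include <cstdint>

// Can `offset` be used directly as the immediate of an AArch64 load/store of
// (1 << size_log2) bytes? Two forms are considered:
//  - unscaled (LDUR/STUR): signed 9-bit byte offset, -256..255
//  - scaled unsigned (LDR/STR): multiple of the access size, 0..4095 * size
bool offset_fits_ldr_str(std::int64_t offset, unsigned size_log2) {
  if (offset >= -256 && offset <= 255) return true;
  const std::int64_t size = std::int64_t{1} << size_log2;
  return offset >= 0 && offset % size == 0 && offset / size <= 4095;
}
```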
------------- PR Comment: https://git.openjdk.org/jdk/pull/16991#issuecomment-2142498279 From stefank at openjdk.org Fri May 31 15:43:02 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 31 May 2024 15:43:02 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v3] In-Reply-To: References: Message-ID: On Fri, 31 May 2024 11:37:55 GMT, Matthias Baesken wrote: > I guess Matthias only wanted to fix UB in hotspot ASAP and doesn't have the bandwidth to change the design everywhere. The proposal solves the UB but (IMHO) adds a wart to the code instead of taking a small step back and fixing the root cause of the UB. This then leaves the wart for other maintainers to fix with their own bandwidth. > Sounds like you guys already have an alternative solution which already works. Maybe you would like to put it into a PR and we continue the discussion there? I would prefer if we did the right fix here in this PR. > Nevertheless, having sub / add_to_ptr_maybe_null available in hotspot may be a good thing. There are some places where we really use additions with nullptr (e.g. index_oop_from_field_offset_long in unsafe.cpp). My previous arguments have been that I don't think it is a good thing, so our opinions differ here. How many places are there that really *should* be doing add/sub with nullptr? Are most places just like this PR, where using `add_to_ptr_maybe_null` is a convenient way to get away from the UB, but the "real" fix would be to remove the null pointer? Given that we have different opinions about this, could we at least hold off on adding `add_to_ptr_maybe_null` until we have assessed the other instances of this UB issue? Do you have a list of other places that have this issue?
------------- PR Comment: https://git.openjdk.org/jdk/pull/19424#issuecomment-2142523001 From aph at openjdk.org Fri May 31 15:55:09 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 31 May 2024 15:55:09 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v6] In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 07:24:09 GMT, Kim Barrett wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Review feedback. > > src/hotspot/share/asm/register.hpp line 273: > >> 271: } >> 272: >> 273: template > > Rx is unused and not needed. Similarly for 3-R overload. Isn't it? It seems to me to be used for the next line. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1622620230 From aph at openjdk.org Fri May 31 16:02:40 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 31 May 2024 16:02:40 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v12] In-Reply-To: References: Message-ID: > At the present time, `assert_different_registers()` uses an O(N**2) algorithm in assert_different_registers(). We can utilize RegSet to do it in O(N) time. This would be a useful optimization for all builds with assertions enabled. > > In addition, it would be useful to be able to static_assert different registers. > > Also, I've taken the opportunity to expand the maximum size of a RegSet to 64 on 64-bit platforms. > > I also fixed a bug: sometimes `noreg` is passed to `assert_different_registers()`, but it may only be passed once or a spurious assertion is triggered. Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 19 additional commits since the last revision: - Merge branch 'clean' into different-regs - Review feedback - Review feedback - Update src/hotspot/share/asm/register.hpp Co-authored-by: Stefan Karlsson - Review feedback - Review feedback - Review feedback - Merge branch 'different-regs' of https://github.com/theRealAph/jdk into different-regs - Update src/hotspot/share/asm/register.hpp Co-authored-by: Emanuel Peter - Merge branch 'clean' into different-regs - ... and 9 more: https://git.openjdk.org/jdk/compare/546bb317...c9fc63d7 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16617/files - new: https://git.openjdk.org/jdk/pull/16617/files/951277be..c9fc63d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16617&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16617&range=10-11 Stats: 17173 lines in 377 files changed: 10174 ins; 5171 del; 1828 mod Patch: https://git.openjdk.org/jdk/pull/16617.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16617/head:pull/16617 PR: https://git.openjdk.org/jdk/pull/16617 From mdoerr at openjdk.org Fri May 31 16:05:02 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 31 May 2024 16:05:02 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v3] In-Reply-To: References: Message-ID: On Fri, 31 May 2024 15:40:26 GMT, Stefan Karlsson wrote: > I would prefer if we did the right fix here in this PR. Well, you could ask Matthias. It's his PR. > Do you have a list of other places that have this issue? No. They typically come up after fixing other ones. 
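The O(N) idea in the linear-time `assert_different_registers()` PR is to accumulate registers into a bit set and report the first repeat. That idea can be sketched independently of HotSpot. `Reg`, `noreg` and `different_registers` are hypothetical stand-ins, and register encodings are assumed to lie in 0..63. Skipping `noreg` mirrors the bug fix mentioned in the PR description. Making the function `constexpr` also lets the same check run in a `static_assert`, which the description mentions as a goal.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Toy register model: encodings 0..63, with -1 standing in for `noreg`.
struct Reg {
  int encoding;
};
constexpr Reg noreg{-1};

// O(N) distinctness check using a 64-bit set instead of comparing every
// pair (O(N^2)). `noreg` is skipped, so it may be passed any number of times.
constexpr bool different_registers(std::initializer_list<Reg> regs) {
  std::uint64_t seen = 0;
  for (Reg r : regs) {
    if (r.encoding < 0) continue;  // ignore noreg
    const std::uint64_t bit = std::uint64_t{1} << r.encoding;
    if (seen & bit) return false;  // duplicate register found
    seen |= bit;
  }
  return true;
}

// Because the check is constexpr, it also works at compile time.
static_assert(different_registers({Reg{1}, Reg{2}, noreg, noreg}), "distinct");
```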
------------- PR Comment: https://git.openjdk.org/jdk/pull/19424#issuecomment-2142558622 From aph at openjdk.org Fri May 31 17:02:07 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 31 May 2024 17:02:07 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v6] In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 18:07:38 GMT, Emanuel Peter wrote: > Just a code-style review. > > Question: could there be some sort of regression test for this, with different examples and edge cases? I have no idea, really. assert_different_registers is used all over the place, and I'm going for bootcycle and tier1. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16617#issuecomment-2142649991 From chagedorn at openjdk.org Fri May 31 17:39:27 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 31 May 2024 17:39:27 GMT Subject: RFR: 8333366: C2: CmpU3Nodes are not pushed back to worklist in PhaseCCP leading to non-fixpoint assertion failure Message-ID: The current code that pushes uses back to the worklist during CCP handles `CmpU` nodes but misses `CmpU3` nodes. This leads to an assertion failure reporting that we have not reached a fixpoint. The fix is straightforward: add a case for `CmpU3` where we already handle `CmpU` nodes, so that `CmpU3` nodes are added back to the worklist during CCP just like `CmpU` nodes. This was found during the analysis of [JDK-8332920](https://bugs.openjdk.org/browse/JDK-8332920) by trying to simplify the regression test (thanks to @TobiHartmann!). To properly add regression tests for JDK-8332920 and avoid hitting this bug with some flag combination, we should fix this first. I will soon propose a PR for JDK-8332920 as well.
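A toy model of the worklist rule described for JDK-8333366 makes the bug easy to see. When a node's type changes, its direct users are revisited, and compares that look two levels up must be revisited too. As a simplification, the model assumes the extra dependency runs through an `AddI`/`SubI` that feeds the compare. `Node`, `Opc` and `push_users` are hypothetical and are not the code in phaseX.cpp. Removing the `CmpU3` test from the inner check reproduces the kind of miss this PR fixes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy sea-of-nodes fragment; opcodes and structure are simplified.
enum class Opc { Con, AddI, SubI, CmpU, CmpU3, Other };

struct Node {
  Opc opc;
  std::vector<Node*> outs;  // users of this node
};

// Called when n's type changes during CCP. Direct users are always pushed.
// A CmpU or CmpU3 fed by an AddI/SubI user of n may depend on n's type too,
// so it is pushed as well; missing it can leave CCP short of a fixpoint.
void push_users(const Node* n, std::vector<const Node*>& worklist) {
  for (const Node* use : n->outs) {
    worklist.push_back(use);
    if (use->opc == Opc::AddI || use->opc == Opc::SubI) {
      for (const Node* cmp : use->outs) {
        if (cmp->opc == Opc::CmpU || cmp->opc == Opc::CmpU3) {
          worklist.push_back(cmp);
        }
      }
    }
  }
}
```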
Thanks, Christian ------------- Commit messages: - 8333366: C2: CmpU3Nodes are not pushed back to worklist in PhaseCCP leading to non-fixpoint assertion failure Changes: https://git.openjdk.org/jdk/pull/19504/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19504&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333366 Stats: 53 lines in 2 files changed: 51 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19504.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19504/head:pull/19504 PR: https://git.openjdk.org/jdk/pull/19504 From kvn at openjdk.org Fri May 31 18:06:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 31 May 2024 18:06:02 GMT Subject: RFR: 8333366: C2: CmpU3Nodes are not pushed back to worklist in PhaseCCP leading to non-fixpoint assertion failure In-Reply-To: References: Message-ID: On Fri, 31 May 2024 17:34:41 GMT, Christian Hagedorn wrote: > The current code to push uses back to the worklist during CCP handles `CmpU` nodes but misses `CmpU3` nodes. This leads to an assertion failure that we have not reached a fixpoint. > > The fix is straight forward to add a case for `CmpU3` at the case where we already handle `CmpU` nodes such that they can be added back to the worklist like `CmpU` nodes during CCP. > > This was found during the analysis of [JDK-8332920](https://bugs.openjdk.org/browse/JDK-8332920) by trying to simplify the regression test (thanks to @TobiHartmann!). To properly add regression tests for JDK-8332920 and avoid hitting this bug here with some flag combination, we should fix this first. I will soon propose a PR for JDK-8332920 as well. > > Thanks, > Christian Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19504#pullrequestreview-2091432641 From kvn at openjdk.org Fri May 31 18:11:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 31 May 2024 18:11:02 GMT Subject: RFR: 8332959: C2: ZGC fails with 'Incorrect load shift' when invoking Object.clone() reflectively on an array In-Reply-To: References: Message-ID: <7HNFZ4GHxxG-EQnnbIwyiok0j-Hw8HfS02h-FKZE8jE=.77b401d9-d7db-4519-bd06-932393261550@github.com> On Thu, 30 May 2024 16:50:22 GMT, Roberto Castañeda Lozano wrote: > This changeset enforces cloned arrays to be initialized at allocation time when their type is unknown, as expected by ZGC in this scenario (see the [JBS issue](https://bugs.openjdk.org/projects/JDK/issues/JDK-8332959) for further details). Array clones with unknown type may arise from compiling the array-guarded branch of a reflective `Object.clone()` invocation, as illustrated by the included test. > > #### Testing > - tier1-5, stress test (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). > - tier6-7 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode; ZGC tests only). Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19486#pullrequestreview-2091440537 From kvn at openjdk.org Fri May 31 18:16:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 31 May 2024 18:16:02 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v3] In-Reply-To: References: Message-ID: On Fri, 31 May 2024 11:37:55 GMT, Matthias Baesken wrote: >> I guess Matthias only wanted to fix UB in hotspot ASAP and doesn't have the bandwidth to change the design everywhere. Sounds like you guys already have an alternative solution which already works. Maybe you would like to put it into a PR and we continue the discussion there?
>> Nevertheless, having `sub / add_to_ptr_maybe_null` available in hotspot may be a good thing. There are some places where we really use additions with nullptr (e.g. `index_oop_from_field_offset_long` in unsafe.cpp). > >> I guess Matthias only wanted to fix UB in hotspot ASAP and doesn't have the bandwidth to change the design everywhere. > > Yes. > The first goal is to make the '--enable-ubsan' configure flag useful; currently we have the configure flag, but with it we still fail during the OpenJDK build itself (because of a number of ubsan-related issues in HS). @MBaesken can close this PR and re-assign this bug to me if he doesn't have time to make the proposed code changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19424#issuecomment-2142756992 From burban at openjdk.org Fri May 31 18:21:05 2024 From: burban at openjdk.org (Bernhard Urban-Forster) Date: Fri, 31 May 2024 18:21:05 GMT Subject: Integrated: 8331159: VM build without C2 fails after JDK-8180450 In-Reply-To: <86NGpx5VVOK6KuR1qbhLRS27zau-DEwXW31EakcquYY=.4d56d214-1a07-4ff5-a1af-e18a545ad725@github.com> References: <86NGpx5VVOK6KuR1qbhLRS27zau-DEwXW31EakcquYY=.4d56d214-1a07-4ff5-a1af-e18a545ad725@github.com> Message-ID: On Thu, 25 Apr 2024 20:54:23 GMT, Bernhard Urban-Forster wrote: > x86 bits are fine. This pull request has now been integrated.
Changeset: 8aeada10 Author: Bernhard Urban-Forster Committer: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/8aeada105acd143b38b02123377ef86513eee266 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8331159: VM build without C2 fails after JDK-8180450 Reviewed-by: thartmann, kvn, aph ------------- PR: https://git.openjdk.org/jdk/pull/18962 From dlong at openjdk.org Fri May 31 20:28:02 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 31 May 2024 20:28:02 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v2] In-Reply-To: References: Message-ID: On Thu, 30 May 2024 21:53:15 GMT, Axel Boldt-Christmas wrote: >> Oh, and as it doesn't seem to have been clear from my earlier comments: I don't strongly oppose that you fix it the way you do in the RelocIterator, since I have very little interaction with that code. >> >> The comment was more that I would prefer if we take a case-by-case approach when we look at other parts of HotSpot with similar problems and really think about what the correct solution would be, and that we don't too quickly start to grab for the `add/sub_to_ptr` solution. Putting these functions in globalDefinitions makes it all too easy to just grab for these functions when we try to solve similar problems, IMHO. That's my 2c. I'm not blocking this patch, as long as we get somewhat decent names. > > My stance is the same as @stefank that I do not oppose this change to fix the immediate issue. > > Looking closer at how the `RelocIterator` is created from a `nmethod`, it would never end up with a `nullptr - 1`, because `relocation_begin()`, which is used to initialize `_current`, would never produce a nullptr. So there is no issue with the other constructor. So plugging the three holes above would remove the UB. (Along with introducing the invariant that you are not allowed to construct from a `CodeSection` with no relocations).
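For context on the ubsan message in the subject: 18446744073709551614 is 2^64 - 2, i.e. an offset of -2 bytes, which is, for example, what stepping one element back from a null pointer to a 2-byte type produces (the `nullptr - 1` case discussed here). Below is a minimal sketch of the `add_to_ptr_maybe_null` idea mentioned in the thread, assuming a flat address space where null is address 0 (true on mainstream platforms, but implementation-defined). The helper and demo names are hypothetical; this is not HotSpot's code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper in the spirit of the `add_to_ptr_maybe_null` idea from the
// thread (not an existing HotSpot function). `p + n` is undefined behavior when
// p is null and n != 0, so the offset is applied to the integer address instead,
// where unsigned wrap-around is well defined. Converting the integer back to a
// pointer is implementation-defined, which is what helpers like this rely on.
template <typename T>
T* add_to_ptr_maybe_null(T* p, std::ptrdiff_t n) {
  std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
  addr += static_cast<std::uintptr_t>(n) * sizeof(T);  // computed modulo 2^N
  return reinterpret_cast<T*>(addr);
}

// Reproduces the address from the ubsan message without undefined behavior:
// "null minus one element" of a 2-byte type is the offset 2^64 - 2 on a
// 64-bit target.
void demo_null_minus_one() {
  std::uint16_t* q = add_to_ptr_maybe_null(static_cast<std::uint16_t*>(nullptr), -1);
  assert(reinterpret_cast<std::uintptr_t>(q) == UINTPTR_MAX - 1);
  assert(add_to_ptr_maybe_null(q, 1) == nullptr);  // stepping back lands on null again
}
```

As the thread notes, whether such a helper belongs in shared headers is a separate design question; where an invariant can rule out the null base (as with `relocation_begin()`), plain pointer arithmetic stays the simpler choice.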
> >> But this is a different issue for a different RFE. > > It may be a different RFE, but it is the same issue (unless I am misunderstanding what you are referring to). The `!has_loc()` check was specifically introduced to solve this exact UB bug. However, it was the wrong property to check. Reading #12854 gives me this impression as well. (Given that the logic around `has_loc` does not seem to have changed since 8153779ad32d1e8ddd37ced826c76c7aafc61894.) I agree with @xmas92, and I propose a slight change to his fix:

@@ -792,9 +792,8 @@ void CodeBuffer::relocate_code_to(CodeBuffer* dest) const {
   // section, so that section has to be copied before relocating.
   for (int n = (int) SECT_FIRST; n < (int)SECT_LIMIT; n++) {
     // pull code out of each section
-    const CodeSection* cs = code_section(n);
-    if (cs->is_empty() || !cs->has_locs()) continue;  // skip trivial section
     CodeSection* dest_cs = dest->code_section(n);
+    if (dest_cs->is_empty() || dest_cs->locs_count() == 0) continue;  // skip trivial section
     { // Repair the pc relative information in the code after the move
       RelocIterator iter(dest_cs);
       while (iter.next()) {

------------- PR Comment: https://git.openjdk.org/jdk/pull/19424#issuecomment-2142932564 From sviswanathan at openjdk.org Fri May 31 21:08:02 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 31 May 2024 21:08:02 GMT Subject: RFR: 8332119: Incorrect IllegalArgumentException for C2 compiled permute kernel In-Reply-To: <6IxHpLmCr2e1fKOcbdG38uhJEOsmVUpgVbcGoH4uMnQ=.ac6c99bd-a222-4dbc-a2b2-fdaf1f94a155@github.com> References: <6IxHpLmCr2e1fKOcbdG38uhJEOsmVUpgVbcGoH4uMnQ=.ac6c99bd-a222-4dbc-a2b2-fdaf1f94a155@github.com> Message-ID: On Wed, 29 May 2024 06:10:53 GMT, Jatin Bhateja wrote: > Currently, the inline expansion of vector-to-shuffle conversion simply casts the vector holding the indexes to a byte vector [1], whereas the fallback implementation [2] also wraps the indexes to the valid index range [0, VEC_LEN-1] or generates a negative index for exceptional / OOB indices. > > This patch extends the conversion inline expander to match the fallback implementation. This imposes a performance tax of around 20% on the Vector.toShuffle() intrinsic but fixes this functional bug. > > Kindly review and share your feedback. > > Best Regards, > Jatin > > PS: Patch also fixes an incorrectness issue reported with [JDK-8332118](https://bugs.openjdk.org/browse/JDK-8332118) > > [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2352 > [2] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractShuffle.java#L58 src/hotspot/share/opto/vectorIntrinsics.cpp line 2411: > 2409: op = wrap_indexes(op, num_elem_to, elem_bt_to); > 2410: } > 2411: wrap_indexes is needed only for the two-vector rearrange. It looks to me that doing a wrap_indexes here at convert would force it for the single-vector rearrange (or selectFrom) and thereby reduce the performance for that case as well. Please note that the single-vector rearrange throws "IndexOutOfBoundsException" and doesn't need to do a wrap. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19442#discussion_r1622947142 From kbarrett at openjdk.org Fri May 31 22:00:15 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 31 May 2024 22:00:15 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v12] In-Reply-To: References: Message-ID: On Fri, 31 May 2024 16:02:40 GMT, Andrew Haley wrote: >> At the present time, `assert_different_registers()` uses an O(N**2) algorithm in assert_different_registers(). We can utilize RegSet to do it in O(N) time. This would be a useful optimization for all builds with assertions enabled. >> >> In addition, it would be useful to be able to static_assert different registers. >> >> Also, I've taken the opportunity to expand the maximum size of a RegSet to 64 on 64-bit platforms.
>> >> I also fixed a bug: sometimes `noreg` is passed to `assert_different_registers()`, but it may only be passed once or a spurious assertion is triggered. > > Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: > > - Merge branch 'clean' into different-regs > - Review feedback > - Review feedback > - Update src/hotspot/share/asm/register.hpp > > Co-authored-by: Stefan Karlsson > - Review feedback > - Review feedback > - Review feedback > - Merge branch 'different-regs' of https://github.com/theRealAph/jdk into different-regs > - Update src/hotspot/share/asm/register.hpp > > Co-authored-by: Emanuel Peter > - Merge branch 'clean' into different-regs > - ... and 9 more: https://git.openjdk.org/jdk/compare/aaa8ceb4...c9fc63d7 Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16617#pullrequestreview-2091736260 From kbarrett at openjdk.org Fri May 31 22:00:15 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 31 May 2024 22:00:15 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v6] In-Reply-To: References: Message-ID: <25j37lGC_zMZQbRXj2rbtgpWyUcx_9H-BUdE6Zlnuoo=.3a649e2f-53ee-4fa7-addd-883cfea02834@github.com> On Fri, 31 May 2024 15:52:31 GMT, Andrew Haley wrote: >> src/hotspot/share/asm/register.hpp line 273: >> >>> 271: } >>> 272: >>> 273: template >> >> Rx is unused and not needed. Similarly for 3-R overload. > > Isn't it? It seems to me to be used for the next line. I think that comment about unused Rx was on the overload `different_registers(ARS, R)`, which had an Rx parameter in an old version of this change. It's okay now. And there's no longer any 3-R overload. 
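The O(N) idea from the PR description can be sketched with a single 64-bit mask. This is not HotSpot's `RegSet` or `assert_different_registers()`: registers are modeled as hypothetical integer encodings 0..63 with -1 standing in for `noreg`, and `all_different`/`check_different` are illustrative names. Each register sets one bit, so a duplicate is detected as soon as its bit is already set; skipping `noreg` lets it appear more than once, which is the spurious-assertion case the PR mentions. Making the function `constexpr` is what allows the `static_assert` use the PR asks for:

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical register encoding for illustration: real registers are 0..63,
// and -1 plays the role of `noreg`.
constexpr int kNoReg = -1;

// Linear-time check in the spirit of the RegSet-based approach described in the
// PR (not the actual HotSpot code): one pass, one bit per register, instead of
// comparing every pair. `noreg` is skipped, so it may appear any number of times.
constexpr bool all_different(std::initializer_list<int> regs) {
  std::uint64_t seen = 0;
  for (int r : regs) {
    if (r == kNoReg) continue;           // noreg never conflicts
    if (r < 0 || r >= 64) return false;  // outside the 64-bit set: reject
    const std::uint64_t bit = std::uint64_t{1} << r;
    if (seen & bit) return false;        // same register seen twice
    seen |= bit;
  }
  return true;
}

// Because the function is constexpr (C++14), the check also works at compile time.
static_assert(all_different({0, 1, 2}), "distinct registers");
static_assert(!all_different({3, 4, 3}), "duplicate register");
static_assert(all_different({kNoReg, 5, kNoReg}), "noreg may repeat");

// Runtime form, mirroring an assert_different_registers-style call.
inline void check_different(std::initializer_list<int> regs) {
  assert(all_different(regs));
}
```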
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1622966842 From sviswanathan at openjdk.org Fri May 31 22:29:12 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 31 May 2024 22:29:12 GMT Subject: RFR: 8332119: Incorrect IllegalArgumentException for C2 compiled permute kernel In-Reply-To: References: <6IxHpLmCr2e1fKOcbdG38uhJEOsmVUpgVbcGoH4uMnQ=.ac6c99bd-a222-4dbc-a2b2-fdaf1f94a155@github.com> Message-ID: On Fri, 31 May 2024 21:01:35 GMT, Sandhya Viswanathan wrote: >> Currently, the inline expansion of vector-to-shuffle conversion simply casts the vector holding the indexes to a byte vector [1], whereas the fallback implementation [2] also wraps the indexes to the valid index range [0, VEC_LEN-1] or generates a negative index for exceptional / OOB indices. >> >> This patch extends the conversion inline expander to match the fallback implementation. This imposes a performance tax of around 20% on the Vector.toShuffle() intrinsic but fixes this functional bug. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin >> >> PS: Patch also fixes an incorrectness issue reported with [JDK-8332118](https://bugs.openjdk.org/browse/JDK-8332118) >> >> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2352 >> [2] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractShuffle.java#L58 > > src/hotspot/share/opto/vectorIntrinsics.cpp line 2411: > >> 2409: op = wrap_indexes(op, num_elem_to, elem_bt_to); >> 2410: } >> 2411: > > wrap_indexes is needed only for the two-vector rearrange. It looks to me that doing a wrap_indexes here at convert would force it for the single-vector rearrange (or selectFrom) and thereby reduce the performance for that case as well. Please note that the single-vector rearrange throws "IndexOutOfBoundsException" and doesn't need to do a wrap. Please ignore the above comment.
I verified that each index is partially wrapped as part of toShuffle(). We should rename wrap_indexes() to partially_wrap_indexes() for clarity. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19442#discussion_r1622996294 From kvn at openjdk.org Fri May 31 22:49:01 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 31 May 2024 22:49:01 GMT Subject: RFR: 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v2] In-Reply-To: References: Message-ID: On Thu, 30 May 2024 21:53:15 GMT, Axel Boldt-Christmas wrote: >> Oh, and as it doesn't seem to have been clear from my earlier comments: I don't strongly oppose that you fix it the way you do in the RelocIterator, since I have very little interaction with that code. >> >> The comment was more that I would prefer if we take a case-by-case approach when we look at other parts of HotSpot with similar problems and really think about what the correct solution would be, and that we don't too quickly start to grab for the `add/sub_to_ptr` solution. Putting these functions in globalDefinitions makes it all too easy to just grab for these functions when we try to solve similar problems, IMHO. That's my 2c. I'm not blocking this patch, as long as we get somewhat decent names. > > My stance is the same as @stefank that I do not oppose this change to fix the immediate issue. > > Looking closer at how the `RelocIterator` is created from a `nmethod`, it would never end up with a `nullptr - 1`, because `relocation_begin()`, which is used to initialize `_current`, would never produce a nullptr. So there is no issue with the other constructor. So plugging the three holes above would remove the UB. (Along with introducing the invariant that you are not allowed to construct from a `CodeSection` with no relocations). > >> But this is a different issue for a different RFE.
> > It may be a different RFE, but it is the same issue (unless I am misunderstanding what you are referring to). The `!has_loc()` check was specifically introduced to solve this exact UB bug. However, it was the wrong property to check. Reading #12854 gives me this impression as well. (Given that the logic around `has_loc` does not seem to have changed since 8153779ad32d1e8ddd37ced826c76c7aafc61894.) > I agree with @xmas92, and I propose a slight change to his fix: It does not matter. A few lines above, we copied the relocation info to the new buffer, and there is an assert in `initialize_locs_from()`: assert(this->locs_count() == source_cs->locs_count(), "sanity"); ------------- PR Comment: https://git.openjdk.org/jdk/pull/19424#issuecomment-2143068745 From sviswanathan at openjdk.org Fri May 31 23:52:01 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 31 May 2024 23:52:01 GMT Subject: RFR: 8332119: Incorrect IllegalArgumentException for C2 compiled permute kernel In-Reply-To: <6IxHpLmCr2e1fKOcbdG38uhJEOsmVUpgVbcGoH4uMnQ=.ac6c99bd-a222-4dbc-a2b2-fdaf1f94a155@github.com> References: <6IxHpLmCr2e1fKOcbdG38uhJEOsmVUpgVbcGoH4uMnQ=.ac6c99bd-a222-4dbc-a2b2-fdaf1f94a155@github.com> Message-ID: On Wed, 29 May 2024 06:10:53 GMT, Jatin Bhateja wrote: > Currently, the inline expansion of vector-to-shuffle conversion simply casts the vector holding the indexes to a byte vector [1], whereas the fallback implementation [2] also wraps the indexes to the valid index range [0, VEC_LEN-1] or generates a negative index for exceptional / OOB indices. > > This patch extends the conversion inline expander to match the fallback implementation. This imposes a performance tax of around 20% on the Vector.toShuffle() intrinsic but fixes this functional bug. > > Kindly review and share your feedback.
> > Best Regards, > Jatin > > PS: Patch also fixes an incorrectness issue reported with [JDK-8332118](https://bugs.openjdk.org/browse/JDK-8332118) > > [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2352 > [2] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractShuffle.java#L58 Other than these two minor comments, the PR looks good to me. test/hotspot/jtreg/compiler/vectorapi/TestTwoVectorPermute.java line 48: > 46: float expected = Float.NaN; > 47: // Exceptional index. > 48: if (shuf[i] < 0 || shuf[i] >= FSP.length()) { To better match the specs, this could be: if ( (int)shuf[i] < 0 || (int)shuf[i] >= FSP.length()) { ------------- PR Review: https://git.openjdk.org/jdk/pull/19442#pullrequestreview-2091849837 PR Review Comment: https://git.openjdk.org/jdk/pull/19442#discussion_r1623038218
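A small sketch of why the suggested cast matters, assuming `shuf` holds float lane values that are converted to lane indexes with an `(int)` cast, and assuming the partial-wrapping rule described in the PR (valid indexes unchanged, other indexes folded into [-VLEN, -1]). `lane_index`, `is_exceptional` and `partially_wrap` are hypothetical names, and the wrapping formula is an illustration rather than the `AbstractShuffle` code. For a fractional value such as -0.5f, the float comparison `shuf[i] < 0` flags the lane as exceptional, while the truncated index is 0, a valid lane:

```cpp
#include <cassert>

// Lane index as the suggested test computes it, assuming `shuf` holds float
// values: the (int) cast truncates toward zero. C++ static_cast<int> agrees
// with Java's (int) cast for values within int range (out-of-range values are
// undefined in C++ but saturate in Java, so they are not modeled here).
int lane_index(float f) { return static_cast<int>(f); }

// Exceptional-index test after conversion, i.e. the suggested
// `(int)shuf[i] < 0 || (int)shuf[i] >= FSP.length()` check.
bool is_exceptional(float f, int vlen) {
  const int i = lane_index(f);
  return i < 0 || i >= vlen;
}

// Partial wrapping as described in the PR: valid indexes pass through unchanged,
// anything else is folded into the negative range [-vlen, -1]. The exact formula
// is an assumption for illustration, not copied from AbstractShuffle.
int partially_wrap(int i, int vlen) {
  assert(vlen > 0);
  if (i >= 0 && i < vlen) return i;
  int m = i % vlen;      // C++ remainder takes the sign of the dividend
  if (m < 0) m += vlen;  // floor-mod into [0, vlen)
  return m - vlen;       // exceptional marker in [-vlen, -1]
}
```

For an 8-lane vector this maps 3 to 3, 9 to -7, -1 to -1 and 8 to -8.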