From sviswanathan at openjdk.org Wed May 1 00:17:54 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 1 May 2024 00:17:54 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v9] In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 23:54:19 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > fixes: pp bits in crc32, REX2 branch in ldmxcsr > It looks to me that the source and dest are reversed in the following instruction in call to simd_prefix_and_encode, perhaps that should be a separate PR: // Do we have this wrong src and dst reversed in simd_prefix_and_encode? void Assembler::pextrw(Register dst, XMMRegister src, int imm8) { assert(VM_Version::supports_sse2(), ""); InstructionAttr attributes(AVX_128bit, /* rex_w */ false, /* legacy_mode */ _legacy_mode_bw, /* no_mask_reg */ true, /* uses_vl */ false); int encode = simd_prefix_and_encode(as_XMMRegister(dst->encoding()), xnoreg, src, VEX_SIMD_66, VEX_OPCODE_0F, &attributes); emit_int24((unsigned char)0xC5, (0xC0 | encode), imm8); } Once that PR is fixed, is_src_gpr should be set to true for this one as well.
Verified that the pextrw has the operands reversed per the SDM, so please ignore this comment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2087754604 From bkilambi at openjdk.org Wed May 1 08:51:58 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 1 May 2024 08:51:58 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v4] In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 10:15:02 GMT, Emanuel Peter wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Add comments, revert to requires_strict_order and other minor changes > > You probably want to change the name of the PR again: > `Add "is_associative" flag for floating-point add-reduction` -> `8320725: AArch64: C2: Add "requires_strict_order" flag for floating-point add-reduction` Hi @eme64 @theRealAph, I have uploaded the latest patch addressing all review comments. Could I please ask for more reviews? Thank you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18034#issuecomment-2088168402 From epeter at openjdk.org Wed May 1 08:57:54 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 1 May 2024 08:57:54 GMT Subject: RFR: 8327381: Refactor type-improving transformations in BoolNode::Ideal to BoolNode::Value [v4] In-Reply-To: References: Message-ID: <1u7cg97KwHlBDapxCJCpEwzASwBgx1c2gINa_bHDG0w=.e8bb0c22-b490-406d-89d2-93027ab71277@github.com> On Tue, 30 Apr 2024 21:20:19 GMT, Martin Balao wrote: >> `(x & m) u< m + 1` is false for `m = -1`, right? >> >> Edit: Yep, filed [JDK-8328315](https://bugs.openjdk.org/projects/JDK/issues/JDK-8328315). > >> `(x & m) u< m + 1` is false for `m = -1`, right? >> > > This bug should be handled separately. I'll do that. @martinuy [JDK-8328315](https://bugs.openjdk.org/browse/JDK-8328315): @chhagedorn is already working on that.
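The observation quoted above, that `(x & m) u< m + 1` is false for `m = -1`, follows from 32-bit wrap-around: `m + 1` becomes 0, and no unsigned value is below 0. For every other `m`, `(x & m) <=u m` and `m + 1` does not wrap as an unsigned value, so the comparison holds. The Java sketch below checks this with `Integer.compareUnsigned`; the class and method names are illustrative only and are not taken from C2.

```java
// Evaluates (x & m) <u (m + 1) with Java's 32-bit int wrap-around semantics.
final class UnsignedMaskCheck {
    static boolean maskedLessThanMaskPlusOne(int x, int m) {
        return Integer.compareUnsigned(x & m, m + 1) < 0;
    }

    public static void main(String[] args) {
        // m != -1: (x & m) <=u m, and m + 1 does not wrap as an unsigned value, so the result is true.
        System.out.println(maskedLessThanMaskPlusOne(12345, 0xFF));           // true
        System.out.println(maskedLessThanMaskPlusOne(-1, Integer.MAX_VALUE)); // true
        // m == -1: m + 1 wraps to 0, and no unsigned value is less than 0.
        System.out.println(maskedLessThanMaskPlusOne(12345, -1));             // false
    }
}
```

In other words, folding the comparison to a constant `true` is only sound once `m = -1` has been ruled out.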
------------- PR Comment: https://git.openjdk.org/jdk/pull/18198#issuecomment-2088173851 From sgibbons at openjdk.org Wed May 1 14:05:59 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 1 May 2024 14:05:59 GMT Subject: RFR: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 Message-ID: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> Added a strcmp for unsafe_setmemory in process_call_arguments() so the assert would not trigger. I believe this is the correct fix as I do not think the arguments for setMemory need special handling like arraycopy. I would like suggestions on how to generate a testcase to catch this type of error in mainline. ------------- Commit messages: - Add unsafe_setmemory comparison for process_call_arguments() Changes: https://git.openjdk.org/jdk/pull/19032/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19032&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331033 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19032.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19032/head:pull/19032 PR: https://git.openjdk.org/jdk/pull/19032 From dnsimon at openjdk.org Wed May 1 15:08:00 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 1 May 2024 15:08:00 GMT Subject: RFR: 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found Message-ID: This PR adds the missing nmethod entry barriers to JVMCI hand assembled tests. It also closes the escape hatch in jvmciCodeInstaller.cpp that allowed JVMCI code to be installed without nmethod entry barriers. 
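As a rough illustration of what this PR adds: an nmethod entry barrier is a short check at method entry that compares a guard value belonging to the compiled method against the current "disarmed" value, and calls into a runtime slow path on a mismatch, which gives the GC a chance to process the method before it runs. The Java code below is a toy model under that assumption only; `EntryBarrierModel`, `CompiledMethod`, `armAll` and the field names are hypothetical and do not mirror HotSpot's data structures.

```java
// Toy model (not HotSpot code) of an nmethod entry barrier.
final class EntryBarrierModel {
    // Value meaning "disarmed"; bumping it arms every installed method at once.
    static int disarmedValue = 0;

    static void armAll() {
        disarmedValue++;
    }

    static final class CompiledMethod {
        int guard = disarmedValue; // installed in the disarmed state
        int slowPathCalls = 0;     // how often the slow path has run

        // What the emitted entry sequence conceptually does before the method body runs.
        void enter() {
            if (guard != disarmedValue) {
                slowPath();
            }
            // ... method body would run here ...
        }

        // Slow path: a real runtime would process the method's embedded oops; the model only disarms.
        private void slowPath() {
            slowPathCalls++;
            guard = disarmedValue;
        }
    }

    public static void main(String[] args) {
        CompiledMethod m = new CompiledMethod();
        m.enter();  // disarmed: fast path
        armAll();   // e.g. a GC cycle starts
        m.enter();  // armed: slow path runs once and disarms
        m.enter();  // fast path again
        System.out.println(m.slowPathCalls); // 1
    }
}
```

In this model, a method installed without the entry check would never reach `slowPath()` after `armAll()`. That is the kind of gap closed here by requiring the barrier for JVMCI-installed code.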
------------- Commit messages: - emit nmethod entry barriers in JVMCI assembler tests Changes: https://git.openjdk.org/jdk/pull/19035/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19035&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8329982 Stats: 302 lines in 6 files changed: 297 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19035.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19035/head:pull/19035 PR: https://git.openjdk.org/jdk/pull/19035 From duke at openjdk.org Wed May 1 17:46:29 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Wed, 1 May 2024 17:46:29 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v10] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
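For readers unfamiliar with REX2: based on our reading of the Intel APX specification, it is a two-byte prefix consisting of `0xD5` followed by a payload byte laid out (bit 7 to bit 0) as M0, R4, X4, B4, W, R3, X3, B3. The R/X/B bits supply bits 3 and 4 of the register encodings, which is how r16 to r31 become addressable, and M0 selects the `0F` opcode map. The Java sketch below builds the prefix for a register-to-register form. All names are hypothetical, memory operands (which use X3/X4) are not modeled, and the bit layout should be checked against the specification before relying on it.

```java
// Illustrative model of REX2 prefix construction; names are hypothetical, not HotSpot identifiers.
final class Rex2Sketch {
    static final int REX2 = 0xD5; // first byte of the two-byte REX2 prefix

    // Payload bits, from bit 0 upward: B3 X3 R3 W B4 X4 R4 M0.
    static final int B3 = 0x01, X3 = 0x02, R3 = 0x04, W = 0x08;
    static final int B4 = 0x10, X4 = 0x20, R4 = 0x40, M0 = 0x80;

    // Only r16..r31 (the extended GPRs) force REX2; r0..r15 keep their existing encodings.
    static boolean isEgpr(int enc) {
        return enc >= 16 && enc <= 31;
    }

    // Payload for a register-register form: reg goes in ModRM.reg, rm in ModRM.rm.
    static int payload(int reg, int rm, boolean w, boolean map1) {
        int p = 0;
        if ((reg & 0x08) != 0) p |= R3; // bit 3 of reg
        if ((reg & 0x10) != 0) p |= R4; // bit 4 of reg (extended GPRs only)
        if ((rm & 0x08) != 0) p |= B3;  // bit 3 of rm
        if ((rm & 0x10) != 0) p |= B4;  // bit 4 of rm (extended GPRs only)
        if (w) p |= W;                  // 64-bit operand size
        if (map1) p |= M0;              // selects the 0F opcode map
        return p;
    }

    // ModRM byte for mod = 11 (register direct); only the low 3 bits of each encoding fit here.
    static int modrm(int reg, int rm) {
        return 0xC0 | ((reg & 7) << 3) | (rm & 7);
    }

    // Prefix + one-byte opcode + ModRM, for the case where at least one operand is an Egpr.
    static int[] encode(int opcode, int reg, int rm, boolean w) {
        if (!isEgpr(reg) && !isEgpr(rm)) {
            // Lower-16 registers stay on the existing REX/legacy path, which is not modeled here.
            throw new UnsupportedOperationException("no extended GPR operand");
        }
        return new int[] {REX2, payload(reg, rm, w, false), opcode, modrm(reg, rm)};
    }

    public static void main(String[] args) {
        // Opcode 0x01 (ADD r/m64, r64) with reg = 16 (r16) and rm = 1 (rcx).
        for (int b : encode(0x01, 16, 1, true)) System.out.printf("%02X ", b);
        System.out.println(); // prints D5 48 01 C1
    }
}
```

Rejecting the lower-16 case in `encode` mirrors the description above: the new prefix is only involved when an extended GPR appears.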
Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: fix stmxcrs REX2 branch, add asserts to SHA instructions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/01241d48..54d2226f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=08-09 Stats: 8 lines in 1 file changed: 7 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From duke at openjdk.org Wed May 1 17:46:29 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Wed, 1 May 2024 17:46:29 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v9] In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 20:22:10 GMT, Sandhya Viswanathan wrote: > SHA instructions (sha1rnds4, sha1nexte, sha1msg1, sha1msg2, sha256rnds2, sha256msg1, sha256msg2) need to be encoded using EVEX encoding when egprs are in use. Thank you, I missed these. The APX 3.0 spec says xmm register use is limited to 0-15 for SHA instructions. Coincidentally, the new version 4.0 APX spec also removes support for EVEX promotion of SHA instructions. Given these specs, I don't think any encoding changes are needed. I've added an assert to these 7 instructions to check that only registers < 16 are used.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2088826920 From duke at openjdk.org Wed May 1 17:46:30 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Wed, 1 May 2024 17:46:30 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v7] In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 22:06:11 GMT, Sandhya Viswanathan wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> fix 4 more src_is_gpr = true cases, add asserts to check for UseAPX > > src/hotspot/cpu/x86/assembler_x86.cpp line 2632: > >> 2630: prefix(src, true /* is_map1 */); >> 2631: emit_int8((unsigned char)0xAE); >> 2632: emit_operand(as_Register(2), src, 0); > > Even when UseAVX > 0, if the src address uses higher bank registers, ldmxcsr/stmxcsr should be encoded using the REX2 i.e. the else path. Thank you, fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1586556997 From never at openjdk.org Wed May 1 17:46:57 2024 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 1 May 2024 17:46:57 GMT Subject: RFR: 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found In-Reply-To: References: Message-ID: On Wed, 1 May 2024 15:03:08 GMT, Doug Simon wrote: > This PR adds the missing nmethod entry barriers to JVMCI hand assembled tests. > It also closes the escape hatch in jvmciCodeInstaller.cpp that allowed JVMCI code to be installed without nmethod entry barriers. src/hotspot/share/jvmci/jvmciCodeInstaller.cpp line 777: > 775: // configurations which generate assembly without being a full compiler. So for now we enforce > 776: // that JIT compiled methods must have an nmethod barrier. > 777: bool install_default = JVMCIENV->get_HotSpotNmethod_isDefault(installed_code) != 0; This line is no longer needed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19035#discussion_r1586558212 From never at openjdk.org Wed May 1 17:52:55 2024 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 1 May 2024 17:52:55 GMT Subject: RFR: 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found In-Reply-To: References: Message-ID: On Wed, 1 May 2024 15:03:08 GMT, Doug Simon wrote: > This PR adds the missing nmethod entry barriers to JVMCI hand assembled tests. > It also closes the escape hatch in jvmciCodeInstaller.cpp that allowed JVMCI code to be installed without nmethod entry barriers. In the long term I'm not sure it's worth trying to maintain these assembler tests. The barrier verification code is very weak and on aarch64 it's slightly complicated so we're barely checking that it really matches. I guess this is good enough until we get further problems. I think you can simplify some other logic that deals with the optionality of the barrier. Start with removing JVMCINMethodData::has_entry_barrier and maybe update some of the comments to reflect that it's always emitted. And places that check _nmethod_entry_patch_offset for -1 can be removed or weakened. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19035#issuecomment-2088836198 From imyers at openjdk.org Wed May 1 17:59:02 2024 From: imyers at openjdk.org (Ian Myers) Date: Wed, 1 May 2024 17:59:02 GMT Subject: RFR: 8324756: Remove dependency verification from vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java Message-ID: This change removes dependency verification by passing -XX:-VerifyDependencies in the test. 
`vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java` takes 20min to run on linux-x86_64-server-fastdebug: time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java CONF=linux-x86_64-server-fastdebug make test **1412.82s user 15.27s system 115% cpu 20:41.19 total** Passing -XX:-VerifyDependencies flag speeds up the run time to 1min: time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java TEST_VM_OPTS="-XX:-VerifyDependencies" CONF=linux-x86_64-server-fastdebug make test **287.27s user 16.19s system 496% cpu 1:01.10 total** Adding -XX:-VerifyDependencies to the test file accomplishes the same run time of 1min: time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java CONF=linux-x86_64-server-fastdebug make test **272.33s user 14.56s system 464% cpu 1:01.75 total** ------------- Commit messages: - Remove dependency verification from vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java Changes: https://git.openjdk.org/jdk/pull/19040/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19040&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8324756 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19040.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19040/head:pull/19040 PR: https://git.openjdk.org/jdk/pull/19040 From kvn at openjdk.org Wed May 1 18:29:01 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 1 May 2024 18:29:01 GMT Subject: RFR: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 In-Reply-To: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> References: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> Message-ID: On Wed, 1 May 2024 14:01:38 GMT, Scott Gibbons wrote: > Added a 
strcmp for unsafe_setmemory in process_call_arguments() so the assert would not trigger. > > I believe this is the correct fix as I do not think the arguments for setMemory need special handling like arraycopy. > > I would like suggestions on how to generate a testcase to catch this type of error in mainline. `Unsafe.setMemory()` has a `checkPrimitivePointer()` call which checks that the input is a primitive array or some address (a `raw` address in EA terms). This check is done before the intrinsic is called, which means your fix is correct. It is similar to other intrinsics which operate on primitive arrays. The test could use a locally allocated, non-escaping array which is passed to `Unsafe.setMemory()` to be initialized to some value. ------------- PR Review: https://git.openjdk.org/jdk/pull/19032#pullrequestreview-2034177223 From kvn at openjdk.org Wed May 1 18:37:53 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 1 May 2024 18:37:53 GMT Subject: RFR: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 In-Reply-To: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> References: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> Message-ID: On Wed, 1 May 2024 14:01:38 GMT, Scott Gibbons wrote: > Added a strcmp for unsafe_setmemory in process_call_arguments() so the assert would not trigger. > > I believe this is the correct fix as I do not think the arguments for setMemory need special handling like arraycopy. > > I would like suggestions on how to generate a testcase to catch this type of error in mainline. Look at the EA tests in `compiler/escapeAnalysis/` which use arraycopy().
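A regression test of the kind suggested here might look like the following Java sketch. A byte array that never leaves its method is filled with `Unsafe.setMemory()` in a hot loop, so that C2 compiles the method and escape analysis encounters the call. The sketch uses `sun.misc.Unsafe` only so that it compiles without extra flags; an actual jtreg test in `compiler/escapeAnalysis/` would more likely use `jdk.internal.misc.Unsafe` with an `@modules java.base/jdk.internal.misc` tag. The class name and iteration count are assumptions. Whether this exact shape reaches the `unsafe_setmemory` leaf call depends on inlining, so it would need to be confirmed against a build that shows the failure. Recent JDKs may also print a deprecation warning for `sun.misc.Unsafe` memory-access methods.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Sketch of a test shape: Unsafe.setMemory() applied to an array that never escapes its method.
final class SetMemoryNonEscaping {
    private static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // buf is allocated here and never stored anywhere else, so it does not escape.
    static int fillAndSum(int len, byte value) {
        byte[] buf = new byte[len];
        UNSAFE.setMemory(buf, Unsafe.ARRAY_BYTE_BASE_OFFSET, len, value);
        int sum = 0;
        for (byte b : buf) {
            sum += b;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Enough iterations that fillAndSum is likely to be compiled by C2 (assumed, not tuned).
        for (int i = 0; i < 200_000; i++) {
            if (fillAndSum(16, (byte) 3) != 48) {
                throw new AssertionError("unexpected sum at iteration " + i);
            }
        }
        System.out.println("ok");
    }
}
```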
Something like `TestMissingAntiDependency.java` or `TestSelfArrayCopy.java` ------------- PR Comment: https://git.openjdk.org/jdk/pull/19032#issuecomment-2088896952 From duke at openjdk.org Wed May 1 19:34:08 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Wed, 1 May 2024 19:34:08 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v11] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: remove is_map1 comment for addb, andb, movb, orb, testb, xchgb, xorb ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/54d2226f..c65fda0c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=09-10 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From dlong at openjdk.org Wed May 1 20:52:52 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 1 May 2024 20:52:52 GMT Subject: RFR: 8331253: 16 bits is not enough for nmethod::_skipped_instructions_size field In-Reply-To: References: Message-ID: On Wed, 1 May 2024 03:31:41 GMT, Vladimir Kozlov wrote: > In [JDK-8329433](https://bugs.openjdk.org/browse/JDK-8329433) I changed the `nmethod::_skipped_instructions_size` field type to `uint16_t`, assuming that it only counts NOP instructions and GC barriers. I did not take into account that Generational ZGC also includes barrier stubs in this size (the original ZGC missed that). It is correct to include them because these stubs are generated in the instructions section and not in the stubs section: > > > Statistics for 1330 bytecoded nmethods for C2: > ... > ZGC: > main code = 3237080 (75.567032%) > stubs code = 810577 (25.040375%) > skipped insts = 44432 (1.372595%) > > GenZGC: > main code = 4034704 (78.238518%) > stubs code = 1356703 (33.625839%) > skipped insts = 1074611 (26.634197%) > > > Note: GenZGC has bigger code because it has store barriers. It generates a separate stub for each barrier, with no sharing.
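The reason 16 bits is too small becomes clear from how an unsigned 16-bit field behaves: any size above 65535 is silently reduced modulo 65536. Java's `char` is an unsigned 16-bit type, so the sketch below uses it to stand in for `uint16_t`. The sizes are hypothetical per-nmethod values, not figures from the statistics above.

```java
// Models storing a size into a uint16_t-style field.
final class U16Truncation {
    // Java's char is unsigned 16-bit, so the cast keeps only the low 16 bits, as uint16_t would.
    static int storeAsU16(int size) {
        return (char) size;
    }

    public static void main(String[] args) {
        System.out.println(storeAsU16(40_000)); // 40000: fits in 16 bits
        System.out.println(storeAsU16(70_000)); // 4464: wrapped, since 70000 - 65536 = 4464
    }
}
```

No error is raised on overflow; the stored value is simply wrong. A wider field such as the `int` mentioned in the PR avoids this.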
> > After looking at how `_skipped_instructions_size` is used (only in one place, when calculating the inlining size of compiled code), I decided to replace it with `int _inline_insts_size;`. It is calculated the same way as before. > > And instead of including instruction stubs in `_skipped_instructions_size`, I recorded the size of instructions in the code section before stubs are generated. This gives a more accurate size for the main instructions and removes the need for `InlineSkippedInstructionsCounter` in GC barrier stubs. > > I also fixed code in C2 which estimates the sizes of the code and stubs sections. > > Tested tier1-4,tier8,stress,xcomp Nice improvement. ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19029#pullrequestreview-2034435239 From dnsimon at openjdk.org Wed May 1 20:57:14 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 1 May 2024 20:57:14 GMT Subject: RFR: 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found [v2] In-Reply-To: References: Message-ID: > This PR adds the missing nmethod entry barriers to JVMCI hand assembled tests. > It also closes the escape hatch in jvmciCodeInstaller.cpp that allowed JVMCI code to be installed without nmethod entry barriers.
Doug Simon has updated the pull request incrementally with two additional commits since the last revision: - remove vestiges of optional JVMCI nmethod support for entry barriers - fixed failing tests and removed tests that install no longer valid code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19035/files - new: https://git.openjdk.org/jdk/pull/19035/files/62b3ad29..be4bf630 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19035&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19035&range=00-01 Stats: 426 lines in 12 files changed: 109 ins; 308 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/19035.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19035/head:pull/19035 PR: https://git.openjdk.org/jdk/pull/19035 From sviswanathan at openjdk.org Wed May 1 21:30:56 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 1 May 2024 21:30:56 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v9] In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 23:54:19 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
> > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > fixes: pp bits in crc32, REX2 branch in ldmxcsr src/hotspot/cpu/x86/assembler_x86.cpp line 2621: > 2619: > 2620: void Assembler::ldmxcsr( Address src) { > 2621: if (UseAVX > 0 && !needs_rex2(src.base(), src.index()) ) { When UseAPX is true, it is good to always use the SSE flavor of ldmxcsr/stmxcsr. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1585485391 From sviswanathan at openjdk.org Wed May 1 21:30:58 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 1 May 2024 21:30:58 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v8] In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 21:55:31 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
> > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > add egpr support for popcntq(R,A), cvttsd2siq(R,A), popq(R) src/hotspot/cpu/x86/assembler_x86.cpp line 14001: > 13999: emit_int8((unsigned char)0xF3); > 14000: prefixq(src, dst, true /* is_map1 */); > 14001: emit_int8((unsigned char)0xB8); Just a nit, this could be: emit_prefix_and_int8(get_prefixq(src, dst, true /* is_map1 */), (unsigned char) 0xB8); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1586826001 From sviswanathan at openjdk.org Wed May 1 21:30:55 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 1 May 2024 21:30:55 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v11] In-Reply-To: References: Message-ID: On Wed, 1 May 2024 19:34:08 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > remove is_map1 comment for addb, andb, movb, orb, testb, xchgb, xorb Last bit of comments, rest all looks good to me. Thanks a lot for your patience through my review. 
------------- PR Review: https://git.openjdk.org/jdk/pull/18476#pullrequestreview-2032456092 From dnsimon at openjdk.org Wed May 1 21:01:53 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 1 May 2024 21:01:53 GMT Subject: RFR: 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found In-Reply-To: References: Message-ID: <9kDhf7fvKqxk5uw6xw4CAf1D_nl0fIRztONkbmMf1Q0=.1e9dd45c-3c6f-4d2a-9c21-4ae908e3285e@github.com> On Wed, 1 May 2024 17:49:53 GMT, Tom Rodriguez wrote: > In the long term I'm not sure it's worth trying to maintain these assembler tests. The barrier verification code is very weak and on aarch64 it's slightly complicated so we're barely checking that it really matches. I guess this is good enough until we get further problems. I agree. I had to push more changes now to remove tests that expect to be able to install 0 length code (which obviously fail the nmethod barrier verification). These tests provided stop gap coverage in the early days of JVMCI but now test functionality where breakage will clearly show up in higher layers (such as Graal). What's more, expanding the assembler support in JVMCI is redundant with the fully fledged assembler in Graal. > I think you can simplify some other logic that deals with the optionality of the barrier. Start with removing JVMCINMethodData::has_entry_barrier and maybe update some of the comments to reflect that it's always emitted. And places that check _nmethod_entry_patch_offset for -1 can be removed or weakened. I've done that now. Please let me know if you can spot any other vestiges of the optional support. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19035#issuecomment-2089131172 From kvn at openjdk.org Wed May 1 21:57:54 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 1 May 2024 21:57:54 GMT Subject: RFR: 8331253: 16 bits is not enough for nmethod::_skipped_instructions_size field In-Reply-To: References: Message-ID: <50kxjmEhsEt-y8L942zglgAMRw-F3IuqnDPVfCMc2Ns=.5ceea6e6-3929-4989-afe4-97fd3b0c74c9@github.com> On Wed, 1 May 2024 03:31:41 GMT, Vladimir Kozlov wrote: > In [JDK-8329433](https://bugs.openjdk.org/browse/JDK-8329433) I changed the `nmethod::_skipped_instructions_size` field type to `uint16_t`, assuming that it only counts NOP instructions and GC barriers. I did not take into account that Generational ZGC also includes barrier stubs in this size (the original ZGC missed that). It is correct to include them because these stubs are generated in the instructions section and not in the stubs section: > > > Statistics for 1330 bytecoded nmethods for C2: > ... > ZGC: > main code = 3237080 (75.567032%) > stubs code = 810577 (25.040375%) > skipped insts = 44432 (1.372595%) > > GenZGC: > main code = 4034704 (78.238518%) > stubs code = 1356703 (33.625839%) > skipped insts = 1074611 (26.634197%) > > > Note: GenZGC has bigger code because it has store barriers. It generates a separate stub for each barrier, with no sharing. > > After looking at how `_skipped_instructions_size` is used (only in one place, when calculating the inlining size of compiled code), I decided to replace it with `int _inline_insts_size;`. It is calculated the same way as before. > > And instead of including instruction stubs in `_skipped_instructions_size`, I recorded the size of instructions in the code section before stubs are generated. This gives a more accurate size for the main instructions and removes the need for `InlineSkippedInstructionsCounter` in GC barrier stubs. > > I also fixed code in C2 which estimates the sizes of the code and stubs sections. > > Tested tier1-4,tier8,stress,xcomp Thank you, Dean.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19029#issuecomment-2089196917 From duke at openjdk.org Thu May 2 00:05:20 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Thu, 2 May 2024 00:05:20 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v12] In-Reply-To: References: Message-ID: <1g7DGTS-7SUhuXFL8NniTGAQSgskv-CdrwtOGHymZqk=.f2ea7538-1ef4-4f94-af4d-972d64e7f699@github.com> > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: simplification and fix asserts in ldmxcsr, stmxcsr, and emit_prefix_and_int8 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/c65fda0c..46eb6b42 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=10-11 Stats: 5 lines in 1 file changed: 1 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From duke at openjdk.org Thu May 2 00:05:20 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Thu, 2 May 2024 00:05:20 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v9] In-Reply-To: References: Message-ID: On Wed, 1 May 2024 00:15:28 GMT, Sandhya Viswanathan wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> fixes: pp bits in crc32, REX2 branch in ldmxcsr > >> It looks to me that the source and dest are reversed in the following instruction in call to simd_prefix_and_encode, perhaps that should be a separate PR: // Do we have this wrong src and dst reversed in simd_prefix_and_encode? void Assembler::pextrw(Register dst, XMMRegister src, int imm8) { assert(VM_Version::supports_sse2(), ""); InstructionAttr attributes(AVX_128bit, /* rex_w */ false, /* legacy_mode */ _legacy_mode_bw, /* no_mask_reg */ true, /* uses_vl */ false); int encode = simd_prefix_and_encode(as_XMMRegister(dst->encoding()), xnoreg, src, VEX_SIMD_66, VEX_OPCODE_0F, &attributes); emit_int24((unsigned char)0xC5, (0xC0 | encode), imm8); } Once that PR is fixed, is_src_gpr should be set to true for this one as well.
> > Verified that the pextrw has the operands reversed per the SDM, so please ignore this comment. @sviswa7 Thank you for your review comments. Very helpful! > src/hotspot/cpu/x86/assembler_x86.cpp line 2621: > >> 2619: >> 2620: void Assembler::ldmxcsr( Address src) { >> 2621: if (UseAVX > 0 && !needs_rex2(src.base(), src.index()) ) { > > When UseAPX is true, it is good to always use the SSE flavor of ldmxcsr/stmxcsr. Thanks, modified assert in these two functions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2089312785 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1586933566 From duke at openjdk.org Thu May 2 00:05:21 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Thu, 2 May 2024 00:05:21 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v4] In-Reply-To: References: <_u8OUYZTsDfl7lzwoee3zewukw-yuFsn1_37Fn7iY5o=.2824d10d-30dd-4314-bae7-0beac0d79e2d@github.com> Message-ID: On Mon, 29 Apr 2024 21:52:14 GMT, Steve Dohrmann wrote: >> src/hotspot/cpu/x86/assembler_x86.cpp line 13260: >> >>> 13258: } else { >>> 13259: emit_int24((prefix & 0xFF00) >> 8, prefix & 0x00FF, b1); >>> 13260: } >> >> We need a check for UseAPX > 0 here. > > @sviswa7, sorry can you clarify what check is needed here. Thanks. Thanks, I understand now. Have added an assert to require UseAPX if prefix is WREX2. 
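For readers following the REX2 discussion: the quoted `emit_int24((prefix & 0xFF00) >> 8, prefix & 0x00FF, b1)` branch splits a two-byte prefix into its bytes ahead of the opcode, and the new assert guards that path. Below is a minimal standalone sketch of that shape. It is not HotSpot code. `use_apx` stands in for `UseAPX`, and the sketch assumes that a two-byte prefix can be recognized by its 0xD5 (REX2) high byte.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch only: emits a prefix followed by one opcode byte.
// A REX2 prefix is two bytes (0xD5, then a payload byte) and is only legal
// when APX is enabled; a legacy REX prefix is a single byte.
constexpr int kRex2Byte0 = 0xD5;

void emit_prefix_and_int8(std::vector<uint8_t>& code, int prefix, uint8_t b1,
                          bool use_apx) {
  if (((prefix & 0xFF00) >> 8) == kRex2Byte0) {
    // Mirrors the added check: a REX2 prefix requires APX support.
    assert(use_apx && "REX2 prefix emitted without APX");
    code.push_back(static_cast<uint8_t>((prefix & 0xFF00) >> 8));
    code.push_back(static_cast<uint8_t>(prefix & 0x00FF));
  } else {
    // Legacy one-byte REX prefix.
    code.push_back(static_cast<uint8_t>(prefix & 0x00FF));
  }
  code.push_back(b1);
}
```

With a legacy prefix such as 0x48 only two bytes are emitted. With a REX2 prefix the escape byte, the payload byte, and the opcode are emitted in order.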
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1586933474 From duke at openjdk.org Thu May 2 00:05:21 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Thu, 2 May 2024 00:05:21 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v8] In-Reply-To: References: Message-ID: On Wed, 1 May 2024 21:04:50 GMT, Sandhya Viswanathan wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> add egpr support for popcntq(R,A), cvttsd2siq(R,A), popq(R) > > src/hotspot/cpu/x86/assembler_x86.cpp line 14001: > >> 13999: emit_int8((unsigned char)0xF3); >> 14000: prefixq(src, dst, true /* is_map1 */); >> 14001: emit_int8((unsigned char)0xB8); > > Just a nit, this could be: > emit_prefix_and_int8(get_prefixq(src, dst, true /* is_map1 */), (unsigned char) 0xB8); Thanks, done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1586933658 From dlong at openjdk.org Thu May 2 01:10:07 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 2 May 2024 01:10:07 GMT Subject: RFR: 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found [v2] In-Reply-To: References: Message-ID: On Wed, 1 May 2024 20:57:14 GMT, Doug Simon wrote: >> This PR adds the missing nmethod entry barriers to JVMCI hand assembled tests. >> It also closes the escape hatch in jvmciCodeInstaller.cpp that allowed JVMCI code to be installed without nmethod entry barriers. > > Doug Simon has updated the pull request incrementally with two additional commits since the last revision: > > - remove vestiges of optional JVMCI nmethod support for entry barriers > - fixed failing tests and removed tests that install no longer valid code Wouldn't it be useful for the JVMCI implementation to provide the nmethod entry barrier code? 
I could be wrong, but I think all the JIT compiler needs to know is how big it is, so it can reserve the space (NOPs would do), then when the code is installed as an nmethod, memcpy it over (if it's static), or use the MacroAssembler if it's not. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19035#issuecomment-2089363560 From szaldana at openjdk.org Thu May 2 01:43:59 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Thu, 2 May 2024 01:43:59 GMT Subject: Integrated: 8331088: Incorrect TraceLoopPredicate output In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 18:11:51 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR addresses [8331088](https://bugs.openjdk.org/browse/JDK-8331088) fixing the incorrect print output. > > Thanks, > Sonia This pull request has now been integrated. Changeset: 19e46eed Author: Sonia Zaldana Calles Committer: Dean Long URL: https://git.openjdk.org/jdk/commit/19e46eed580339a61fd1309c2cc7040e8c83597d Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8331088: Incorrect TraceLoopPredicate output Reviewed-by: chagedorn, dlong ------------- PR: https://git.openjdk.org/jdk/pull/19004 From thartmann at openjdk.org Thu May 2 06:02:14 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 2 May 2024 06:02:14 GMT Subject: RFR: 8331518: Tests should not use the Classpath exception form of the legal header Message-ID: <_3VjI3abxvxKuqUcaQsEsEGQ1WB2MuJlk3yWn7boJxI=.c8012113-6517-434b-9dc3-ab39df449f75@github.com> Removed the Classpath exception from the copyright header of some compiler tests and benchmarks. 
Thanks, Tobias ------------- Commit messages: - 8331518: Tests should not use the Classpath exception form of the legal header Changes: https://git.openjdk.org/jdk/pull/19047/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19047&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331518 Stats: 15 lines in 5 files changed: 0 ins; 10 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/19047.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19047/head:pull/19047 PR: https://git.openjdk.org/jdk/pull/19047 From rcastanedalo at openjdk.org Thu May 2 06:19:54 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 2 May 2024 06:19:54 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic In-Reply-To: References: Message-ID: <6_WS0VtL8jPuB2U9R8rh8lccVcU_IXMU6AOzaIu48lA=.9934e5e8-0ac8-44c9-9505-3cee953515aa@github.com> On Tue, 30 Apr 2024 21:34:56 GMT, Martin Doerr wrote: >> This changeset generalizes the logic to analyze, declare, and communicate which registers are live at a C2 barrier stub so that it can be used by other collectors than ZGC adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). >> >> The main changes are: >> >> - Make it possible to compute register liveness information before (live-in) or after (live-out) each barrier, and let the collector choose by implementing `BarrierSetC2State::needs_livein_data()`. >> >> - Generalize the interface with which collectors declare which registers must be additionally preserved across barrier runtime calls, adding the methods `BarrierStubC2::preserve(Register r)` and `BarrierStubC2::dont_preserve(Register r)`. >> >> - Simplify the interface with which platform-specific logic computes which registers to preserve across barrier runtime calls, replacing the calls to `BarrierStubC2::result()` and `BarrierStubC2::live()` with a single call to `BarrierStubC2::preserve_set()`.
>> >> #### Testing >> >> - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). >> - tier1-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/4d4e743d8f4cddd5288cee1d69c70fe2b9bea066) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. >> - Build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug). @RealFYang, @TheRealMDoerr: could you please test and review the riscv and ppc changes? Thanks! > > src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 127: > >> 125: while (OptoReg::is_reg(reg)) { >> 126: const VMReg vm_reg = OptoReg::as_VMReg(reg); >> 127: if (!(vm_reg->is_Register()) || vm_reg->as_Register() != r) { > > This doesn't work on PPC64: We run into "assert(is_Register() && is_even(value())) failed: even-aligned GPR name" (vmreg_ppc.hpp:54). Calling `as_Register()` is only supported for the even ones. > Maybe add check `is_concrete()`? Thank you Martin for trying out the patch and for the suggestion, will test a solution based on `is_concrete()` and push the changes later, if it works. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19026#discussion_r1587114886 From roland at openjdk.org Thu May 2 06:58:12 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 2 May 2024 06:58:12 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v15] In-Reply-To: References: <3fcIOnZHYI7ebFLr6vUGnMCo7GDnQ-FTDNjKTeoXqNA=.99b678a0-d04c-4c0d-a269-d0fc41104bfc@github.com> Message-ID: On Thu, 18 Apr 2024 10:16:55 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 22 commits: >> >> - Merge branch 'master' into JDK-8320649 >> - review >> - test fix >> - test fix >> - Merge branch 'master' into JDK-8320649 >> - whitespaces >> - review >> - Merge branch 'master' into JDK-8320649 >> - review >> - 32 bit build fix >> - ... and 12 more: https://git.openjdk.org/jdk/compare/bfff02ee...a4ffc11e > > test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 2: > >> 1: /* >> 2: * Copyright (c) 2024, Red Hat, Inc. All rights reserved. > > I like the tests, there is a lot of material here. > > A few more ideas: > - have two scoped values, and then have a sequence of `get` and `getValue` calls on them, in some random mix. And check that everything gets commoned, and the result is correct. > - have a method that directly uses `get`, but also has inner scopes of `where`/`get`. Interleave these, maybe even with multiple different scoped values. And nest them with various depths. And then verify both the expected number of calls / loads, as well as the result. > > Also: is it possible to stuff ScopedValues into ScopedValues? That would be another interesting stress-test with lots of options. In the commit that I will push soon, I added more tests: a couple with 3 scoped values and a few with ScopedValues stored in ScopedValues. The ones you suggest with inner scopes of `where`/`get` can't work because `Cache.invalidate()` would then be called: when C2 sees a call to `Cache.invalidate()`, it doesn't perform any optimization.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1587151951 From thartmann at openjdk.org Thu May 2 07:07:54 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 2 May 2024 07:07:54 GMT Subject: RFR: 8331253: 16 bits is not enough for nmethod::_skipped_instructions_size field In-Reply-To: References: Message-ID: On Wed, 1 May 2024 03:31:41 GMT, Vladimir Kozlov wrote: > In [JDK-8329433](https://bugs.openjdk.org/browse/JDK-8329433) I changed `nmethod::_skipped_instructions_size` field type to `uint16_t` assuming that it only counts NOP instructions and GC barriers. I did not take into account that Generational ZGC also includes barrier stubs in this size (original ZGC missed that). It is correct to include them because these stubs are generated in the instructions section and not in the stubs section: > > > Statistics for 1330 bytecoded nmethods for C2: > ... > ZGC: > main code = 3237080 (75.567032%) > stubs code = 810577 (25.040375%) > skipped insts = 44432 (1.372595%) > > GenZGC: > main code = 4034704 (78.238518%) > stubs code = 1356703 (33.625839%) > skipped insts = 1074611 (26.634197%) > > > Note, GenZGC has bigger code because it has store barriers. It generates a separate stub for each barrier, no sharing. > > After looking at how `_skipped_instructions_size` is used (only in one place, when calculating the inlining size of compiled code) I decided to replace it with `int _inline_insts_size;`. It is calculated the same way as before. > > And instead of including instruction stubs in `_skipped_instructions_size` I recorded the size of instructions in the code section before stubs are generated. This gives a more accurate size of the main instructions and removes the need for `InlineSkippedInstructionsCounter` in GC barrier stubs. > > I also fixed code in C2 which estimates the size of the code and stubs sections. > > Tested tier1-4,tier8,stress,xcomp Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer).
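The overflow named in the bug title can be checked in isolation. The statistics quoted above appear to be totals over 1330 nmethods, so they do not by themselves show a single nmethod overflowing. The sketch below therefore uses a hypothetical per-nmethod size of 70000 bytes. It shows that a `uint16_t` field silently keeps only the low 16 bits once a value exceeds 65535, while an `int` field (the type proposed for `_inline_insts_size`) does not.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Sketch only: what happens when a size is stored in a 16-bit unsigned field.
uint16_t store_as_uint16(int size) {
  return static_cast<uint16_t>(size);  // conversion to unsigned is modulo 2^16
}

bool fits_in_uint16(int size) {
  return size >= 0 && size <= std::numeric_limits<uint16_t>::max();
}

// Hypothetical per-nmethod size, chosen only to exceed the 16-bit range.
constexpr int kHypotheticalSkippedSize = 70000;
```

Here 70000 is stored as 70000 - 65536 = 4464, so any estimate that relies on the field undercounts without warning.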
PR Review: https://git.openjdk.org/jdk/pull/19029#pullrequestreview-2034982296 From roland at openjdk.org Thu May 2 07:10:59 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 2 May 2024 07:10:59 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v15] In-Reply-To: <6-Yl6oBb-GdMyxY9DdqLcJbkZGxjItUvr2xHF3rFYk0=.2fd21903-990c-4d60-9ff8-0606506ba86d@github.com> References: <3fcIOnZHYI7ebFLr6vUGnMCo7GDnQ-FTDNjKTeoXqNA=.99b678a0-d04c-4c0d-a269-d0fc41104bfc@github.com> <6-Yl6oBb-GdMyxY9DdqLcJbkZGxjItUvr2xHF3rFYk0=.2fd21903-990c-4d60-9ff8-0606506ba86d@github.com> Message-ID: On Thu, 18 Apr 2024 12:22:27 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopopts.cpp line 3783: >> >>> 3781: // ScopedValueGetLoadFromCache and companion ScopedValueGetHitsInCacheNode must stay together >>> 3782: move_scoped_value_nodes_to_not_peel(peel, not_peel, peel_list, sink_list, i); >>> 3783: incr = false; >> >> Do we not have to increment the `cloned_for_outside_use`, which affects the `estimate`? > > Could we otherwise exhaust the node limit, by peeling a loop that is too large? No node is cloned here so there's no need to adjust the `estimate`. What happens is that a `ScopedValueGetHitsInCacheNode` is in the peeled region of the loop but not its `ScopedValueGetLoadFromCache` because peeling happens right above the `If` for the `ScopedValueGetHitsInCacheNode` . It's correct to simply move the `ScopedValueGetHitsInCacheNode` out of the peeled region into the non peeled region because it's only used there. There was no test case for this. I added one. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1587164252 From roland at openjdk.org Thu May 2 07:18:05 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 2 May 2024 07:18:05 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v15] In-Reply-To: <6-Yl6oBb-GdMyxY9DdqLcJbkZGxjItUvr2xHF3rFYk0=.2fd21903-990c-4d60-9ff8-0606506ba86d@github.com> References: <3fcIOnZHYI7ebFLr6vUGnMCo7GDnQ-FTDNjKTeoXqNA=.99b678a0-d04c-4c0d-a269-d0fc41104bfc@github.com> <6-Yl6oBb-GdMyxY9DdqLcJbkZGxjItUvr2xHF3rFYk0=.2fd21903-990c-4d60-9ff8-0606506ba86d@github.com> Message-ID: On Thu, 18 Apr 2024 11:45:07 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopopts.cpp line 4010: >> >>> 4008: peel.remove(hits_in_cache->_idx); >>> 4009: not_peel.set(hits_in_cache->_idx); >>> 4010: peel_list.remove(i); >> >> Looks like duplicated code from the call-site. A refactoring may help. > > I think you could combine the code with the case: > `if (n->in(0) == nullptr && !n->is_Load() && !n->is_CMove()) {` > And then you would have this code here, as well as the `TracePartialPeeling` code shared for both. I moved that code to a helper method so it's shared. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1587169416 From stuefe at openjdk.org Thu May 2 07:18:12 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 2 May 2024 07:18:12 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v7] In-Reply-To: References: Message-ID: > See [1] for previous discussions. > > We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. > > The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. 
> > Examples: > > This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` > > This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` > > > --- > > The patch: > > 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. > 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. > 3) Adapted and extended tests > > I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. > > > Tested: > > - manually on Mac m1 (debug and release) > - GHAs are running > - but Oracle will do more testing before this goes in > > [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Remove accidental change to TestDeadPhiMergeMemLoop.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18969/files - new: https://git.openjdk.org/jdk/pull/18969/files/5a460a1f..691a1467 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18969/head:pull/18969 PR: https://git.openjdk.org/jdk/pull/18969 From roland at openjdk.org Thu May 2 07:18:04 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 2 May 2024 07:18:04 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v15] In-Reply-To: References: <3fcIOnZHYI7ebFLr6vUGnMCo7GDnQ-FTDNjKTeoXqNA=.99b678a0-d04c-4c0d-a269-d0fc41104bfc@github.com> Message-ID: On Thu, 18 Apr 2024 
12:47:54 GMT, Emanuel Peter wrote: > I am wondering if it would make sense to have some `scoped_value.hpp/cpp`, where you can put all your new classes. This would also allow you to put documentation about the general approach at the top of the `scoped_value.hpp` file. Currently, the code is spread all over, and it would be hard to know where one could find a good summary of the whole optimization. I moved most of the scoped value specific code to `scoped_value.hpp/cpp` in the new commit. > src/hotspot/share/opto/loopnode.hpp line 703: > >> 701: bool policy_peeling(PhaseIdealLoop* phase, bool scoped_value_only); >> 702: >> 703: uint estimate_peeling(PhaseIdealLoop* phase, bool peel_only_if_has_scoped_value); > > Can we use the same name for `scoped_value_only` and `peel_only_if_has_scoped_value`? In `policy_peeling` you pass the value into `estimate_peeling`, so it seems to be the same. > > Somehow it does not sit well with me that we have such a special-case flag in such a high-level and general method. But I don't know a fix now. It just looks like not the best design. But that may not be your fault. Are there any alternatives? I added a `policy_peeling_for_scoped_value()` method. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2089774427 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1587170022 From roland at openjdk.org Thu May 2 07:20:59 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 2 May 2024 07:20:59 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v16] In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 13:09:22 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. 
That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens >> no optimization is performed for the method being compiled). Hoisting >> a `get()` call out of a loop for a loop invariant `scopedValue` should >> also be legal in most cases. >> >> `ScopedValue.get()` is implemented in java code as a 2 step process. A >> cache is attached to the current thread object. If the `ScopedValue` >> object is in the cache then the result from `get()` is read from >> there. Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache. On a >> cache probe, both indexes are checked. As a consequence, the process >> of probing the cache is a multi-step process (check if the cache is >> present, check first index, check second index if first index >> failed). If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, profile reports the slow path >> as never taken and only the read from the cache is compiled. >> >> To perform the optimizations, I added 3 new node types to C2: >> >> - the pair >> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for >> the cache probe >> >> - a cfg node ScopedValueGetResultNode to help locate the result of the >> `get()` call in the IR graph. >> >> In pseudo code, once the nodes are inserted, the code of a `get()` is: >> >> >> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) >> if (hits_in_the_cache) { >> res = ScopedValueGetLoadFromCache(hits_in_the_cache); >> } else { >> res = ..; //slow call possibly inlined. Subgraph can be arbitrarily complex >> } >> res = ScopedValueGetResult(res) >> >> >> In the snippet: >> >> >> v1 = scopedValue.get(); >> ...
>> >> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Emanuel Peter I also removed `Node::find_unique_out_with()` and replaced it with `Node* find_out_with(int opcode, bool want_unique = false)`. I'll look into the automatic casting, but I'd like to do it as a separate cleanup if possible. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2089779578 From roland at openjdk.org Thu May 2 07:29:44 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 2 May 2024 07:29:44 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v17] In-Reply-To: References: Message-ID: > This change implements C2 optimizations for calls to > ScopedValue.get(). Indeed, in: > > > v1 = scopedValue.get(); > ... > v2 = scopedValue.get(); > > > `v2` can be replaced by `v1` and the second call to `get()` can be > optimized out. That's true whatever is between the 2 calls unless a > new mapping for `scopedValue` is created in between (when that happens > no optimization is performed for the method being compiled). Hoisting > a `get()` call out of a loop for a loop invariant `scopedValue` should > also be legal in most cases. > > `ScopedValue.get()` is implemented in java code as a 2 step process. A > cache is attached to the current thread object. If the `ScopedValue` > object is in the cache then the result from `get()` is read from > there.
Otherwise a slow call is performed that also inserts the > mapping in the cache. The cache itself is lazily allocated. One > `ScopedValue` can be hashed to 2 different indexes in the cache. On a > cache probe, both indexes are checked. As a consequence, the process > of probing the cache is a multi-step process (check if the cache is > present, check first index, check second index if first index > failed). If the cache is populated early on, then when the method that > calls `ScopedValue.get()` is compiled, profile reports the slow path > as never taken and only the read from the cache is compiled. > > To perform the optimizations, I added 3 new node types to C2: > > - the pair > ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for > the cache probe > > - a cfg node ScopedValueGetResultNode to help locate the result of the > `get()` call in the IR graph. > > In pseudo code, once the nodes are inserted, the code of a `get()` is: > > > hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) > if (hits_in_the_cache) { > res = ScopedValueGetLoadFromCache(hits_in_the_cache); > } else { > res = ..; //slow call possibly inlined. Subgraph can be arbitrarily complex > } > res = ScopedValueGetResult(res) > > > In the snippet: > > > v1 = scopedValue.get(); > ... > v2 = scopedValue.get(); > > > Replacing `v2` by `v1` is then done by starting from the > `ScopedValueGetResult` node for the second `get()` and looking for a > dominating `ScopedValueGetResult` for the same `ScopedValue` > object. When one is found, it is used as a replacement. Eliminating > the second `get()` call is achieved by making > `ScopedValueGetHitsInCache` always successful if there's a dominating > `ScopedValueGetResult` and replacing its companion > `ScopedValueGetLoadFromCache` by the dominating > `ScopedValueGetResult`. > > Hoisting a `g...
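The cache probe described above can be made concrete with a heavily simplified C++ model of the per-thread lookup: each key has two candidate slots, a probe checks both, and a miss falls back to a slow path that also fills the cache. This is not the JDK's `ScopedValue` implementation. The table size, both hash functions, and the insertion policy are assumptions made for illustration. The model only shows why repeated lookups of an unchanged mapping return the same value without reaching the slow path. The C2 node-level transformation is not modeled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <optional>

// Simplified model: each key may live in one of two slots of a small table.
class TwoSlotCache {
 public:
  // Cache probe: check the primary slot, then the secondary slot.
  std::optional<int> probe(unsigned key) const {
    for (std::size_t slot : {primary(key), secondary(key)}) {
      const Entry& e = slots_[slot];
      if (e.used && e.key == key) return e.value;
    }
    return std::nullopt;
  }

  // get(): fast path on a hit, otherwise call the slow path and cache the result.
  int get(unsigned key, const std::function<int(unsigned)>& slow_path) {
    if (std::optional<int> hit = probe(key)) return *hit;
    int value = slow_path(key);
    // Assumed policy: use the primary slot if free, else the secondary slot.
    std::size_t slot = slots_[primary(key)].used ? secondary(key) : primary(key);
    slots_[slot] = Entry{key, value, true};
    return value;
  }

 private:
  struct Entry {
    unsigned key;
    int value;
    bool used;
  };
  static constexpr std::size_t kSize = 16;
  static std::size_t primary(unsigned key) { return key % kSize; }
  static std::size_t secondary(unsigned key) { return ((key >> 4) + 8) % kSize; }
  std::array<Entry, kSize> slots_{};
};
```

Keys 1 and 17 collide on their primary slot in this model, so the second key lands in its secondary slot and is still found by a later probe.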
Roland Westrelin has updated the pull request incrementally with four additional commits since the last revision: - more - more tests - scoped_value.[ch]pp - review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16966/files - new: https://git.openjdk.org/jdk/pull/16966/files/f63bf543..d38872fd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16966&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16966&range=15-16 Stats: 5196 lines in 28 files changed: 2735 ins; 2322 del; 139 mod Patch: https://git.openjdk.org/jdk/pull/16966.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16966/head:pull/16966 PR: https://git.openjdk.org/jdk/pull/16966 From roland at openjdk.org Thu May 2 07:31:58 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 2 May 2024 07:31:58 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v13] In-Reply-To: References: <9Eoh8hOSSVvAtf9iVQ6hflQyceUtt4dpZdqm61zg5XI=.358a4d79-70d9-4b54-85d5-37c6817f0fae@github.com> Message-ID: On Mon, 29 Apr 2024 23:02:33 GMT, Dean Long wrote: >> src/hotspot/share/c1/c1_GraphBuilder.cpp line 2030: >> >>> 2028: receiver = state()->stack_at(index); >>> 2029: ciType* type = receiver->exact_type(); >>> 2030: if (type != nullptr && type->is_loaded()) { >> >> Is it the case that we can't see an interface here? Or that we think it's ok if we see an interface here? > > We can't see an interface here because it will get rejected by `ciInstanceKlass::exact_klass`, so we could even assert for that here if we wanted. 
Then, I think we should add an assert that `!type->as_instance_klass()->is_interface()` and also that it's not an array of interfaces (using `base_element_klass()`) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1587185774 From tholenstein at openjdk.org Thu May 2 07:37:55 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 2 May 2024 07:37:55 GMT Subject: RFR: 8331404: IGV: Show line numbers for callees in properties In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 15:38:20 GMT, Christian Hagedorn wrote: > IGV shows the `bci` for a node in the callee, followed by the bci in the caller method and so on until we reach the root method. For the `line` property, we currently only show the line number found in the root method (`first()` is the root method being compiled and `second()` and `third()` are inlined): > > Example program: > ![image](https://github.com/openjdk/jdk/assets/17833009/579fe9eb-4bd8-42d8-9d03-875f25bd97ae) > > Properties of the store to `fFld`: > ![image](https://github.com/openjdk/jdk/assets/17833009/3763cccf-c1ba-4d7f-a986-eae8bf0654b0) > > One could read the line number from the `jvms` property above.
> > Testing: > - Manual testing in IGV > - Sanity testing by running `java -Xcomp -XX:+PrintIdealGraph -XX:PrintIdealGraphLevel=4 -XX:PrintIdealGraphFile=graph.xml HelloWorld.java`. > > Thanks, > Christian Looks good! ------------- Marked as reviewed by tholenstein (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19025#pullrequestreview-2035030650 From dholmes at openjdk.org Thu May 2 07:44:55 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 2 May 2024 07:44:55 GMT Subject: RFR: 8331518: Tests should not use the "Classpath" exception form of the legal header In-Reply-To: <_3VjI3abxvxKuqUcaQsEsEGQ1WB2MuJlk3yWn7boJxI=.c8012113-6517-434b-9dc3-ab39df449f75@github.com> References: <_3VjI3abxvxKuqUcaQsEsEGQ1WB2MuJlk3yWn7boJxI=.c8012113-6517-434b-9dc3-ab39df449f75@github.com> Message-ID: On Thu, 2 May 2024 05:57:50 GMT, Tobias Hartmann wrote: > Removed the Classpath exception from the copyright header of some compiler tests and benchmarks. > > Thanks, > Tobias LGTM. Thanks. I'd consider this a trivial fix too. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19047#pullrequestreview-2035042618 From thartmann at openjdk.org Thu May 2 07:51:55 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 2 May 2024 07:51:55 GMT Subject: RFR: 8331518: Tests should not use the "Classpath" exception form of the legal header In-Reply-To: <_3VjI3abxvxKuqUcaQsEsEGQ1WB2MuJlk3yWn7boJxI=.c8012113-6517-434b-9dc3-ab39df449f75@github.com> References: <_3VjI3abxvxKuqUcaQsEsEGQ1WB2MuJlk3yWn7boJxI=.c8012113-6517-434b-9dc3-ab39df449f75@github.com> Message-ID: On Thu, 2 May 2024 05:57:50 GMT, Tobias Hartmann wrote: > Removed the Classpath exception from the copyright header of some compiler tests and benchmarks. > > Thanks, > Tobias Thanks for the review David! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19047#issuecomment-2089828156 From thartmann at openjdk.org Thu May 2 07:51:55 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 2 May 2024 07:51:55 GMT Subject: Integrated: 8331518: Tests should not use the "Classpath" exception form of the legal header In-Reply-To: <_3VjI3abxvxKuqUcaQsEsEGQ1WB2MuJlk3yWn7boJxI=.c8012113-6517-434b-9dc3-ab39df449f75@github.com> References: <_3VjI3abxvxKuqUcaQsEsEGQ1WB2MuJlk3yWn7boJxI=.c8012113-6517-434b-9dc3-ab39df449f75@github.com> Message-ID: <5mrHBQlLsVmlnl8hMSirNOfnBy71QLlF7ajun-SCbFU=.ba0e51c2-4643-4cbc-bf13-cbd9d3a8a2e3@github.com> On Thu, 2 May 2024 05:57:50 GMT, Tobias Hartmann wrote: > Removed the Classpath exception from the copyright header of some compiler tests and benchmarks. > > Thanks, > Tobias This pull request has now been integrated. Changeset: d3bf5262 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/d3bf52628efb79e1b98749d628c4b6d035e1d511 Stats: 15 lines in 5 files changed: 0 ins; 10 del; 5 mod 8331518: Tests should not use the "Classpath" exception form of the legal header Reviewed-by: dholmes ------------- PR: https://git.openjdk.org/jdk/pull/19047 From rcastanedalo at openjdk.org Thu May 2 07:57:18 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 2 May 2024 07:57:18 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v2] In-Reply-To: References: Message-ID: > This changeset generalizes the logic to analyze, declare, and communicate which registers are live at a C2 barrier stub so that it can be used by other collectors than ZGC adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)).
> > The main changes are: > > - Make it possible to compute register liveness information before (live-in) or after (live-out) each barrier, and let the collector choose by implementing `BarrierSetC2State::needs_livein_data()`. > > - Generalize the interface with which collectors declare which registers must be additionally preserved across barrier runtime calls, adding the methods `BarrierStubC2::preserve(Register r)` and `BarrierStubC2::dont_preserve(Register r)`. > > - Simplify the interface with which platform-specific logic computes which registers to preserve across barrier runtime calls, replacing the calls to `BarrierStubC2::result()` and `BarrierStubC2::live()` with a single call to `BarrierStubC2::preserve_set()`. > > #### Testing > > - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). > - tier1-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/4d4e743d8f4cddd5288cee1d69c70fe2b9bea066) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. > - Build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug). @RealFYang, @TheRealMDoerr: could you please test and review the riscv and ppc changes? Thanks! 
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Use VMReg::is_concrete for testing sub-registers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19026/files - new: https://git.openjdk.org/jdk/pull/19026/files/31a19a48..c0fc66de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19026&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19026&range=00-01 Stats: 13 lines in 1 file changed: 0 ins; 6 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/19026.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19026/head:pull/19026 PR: https://git.openjdk.org/jdk/pull/19026 From rcastanedalo at openjdk.org Thu May 2 07:57:18 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 2 May 2024 07:57:18 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v2] In-Reply-To: References: Message-ID: <3mTCG75Z1f2KDjDAIhnkxKDwbEK2Q4LvF5T5tJ0vWBQ=.274b69df-ef47-4750-a916-0d25ec8b65fa@github.com> On Tue, 30 Apr 2024 21:34:56 GMT, Martin Doerr wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Use VMReg::is_concrete for testing sub-registers > > src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 127: > >> 125: while (OptoReg::is_reg(reg)) { >> 126: const VMReg vm_reg = OptoReg::as_VMReg(reg); >> 127: if (!(vm_reg->is_Register()) || vm_reg->as_Register() != r) { > > This doesn't work on PPC64: We run into "assert(is_Register() && is_even(value())) failed: even-aligned GPR name" (vmreg_ppc.hpp:54). Calling `as_Register()` is only supported for the even ones. > Maybe add check `is_concrete()`? Done (commit https://github.com/openjdk/jdk/pull/19026/commits/c0fc66deb654a9b930a7b7cf1a7e7fa093739027). @TheRealMDoerr please let me know if this works on PPC64.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19026#discussion_r1587216311 From aph at openjdk.org Thu May 2 08:14:59 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 2 May 2024 08:14:59 GMT Subject: RFR: 8329258: TailCall should not use frame pointer register for jump target [v4] In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 10:31:09 GMT, Tobias Hartmann wrote: >> Applying @danielogh's patch (see description of [JDK-8329258](https://bugs.openjdk.org/browse/JDK-8329258)) to enable `StressGCM` / `StressLCM` for stub compilations triggers a crash on AArch64. The problem is that the register allocator uses `R29` (`rfp`) which is usually used for the frame pointer to hold the `TailCall` `exc_target` when generating the `_slow_arraycopy_Java` stub: >> https://github.com/openjdk/jdk/blob/6dfb8120c270a76fcba5a5c3c9ad91da3282d5fa/src/hotspot/share/opto/generateOptoStub.cpp#L258-L264 >> >> With `StressGCM` / `StressLCM` the register initialization is scheduled early and `R29` is corrupted by the `MachEpilogNode` which is opaque to the register allocator and inserted right before the `TailCall`: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/share/opto/output.cpp#L320-L326 >> >> >> 028 mov R29, 0x0000ffff78cc0080 # ptr >> >> [...] >> >> 098 # pop frame 16 >> ldp lr, rfp, [sp,#0] <- Epilog kills rfp (and lr + sp) >> add sp, sp, #16 >> >> [...] >> >> 0a0 br R29 # R12 holds method >> >> >> As a result, we jump to a "garbage" location. See [bad code](https://bugs.openjdk.org/secure/attachment/108835/BAD_slow_arraycopy_Java.txt) vs. [good code](https://bugs.openjdk.org/secure/attachment/108834/GOOD_slow_arraycopy_Java.txt). 
>> >> On x86, we use `no_rbp_RegP` instead: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L2564-L2566 https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L12470 >> >> I implemented the same on AArch64. >> >> I think other platforms are affected as well but I don't have the hardware to test there. @offamitkumar (S390), @TheRealMDoerr (PPC), @RealFYang (RISC-V), @bulasevich (ARM32), could you please have a look? >> >> I also wondered if `R29` shouldn't be a callee-save (SOE) register in the C calling convention? >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/aarch64/aarch64.ad#L139-L140 On x86_64, `RBP` is SOE: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L86-L87 >> >> `TestTailCallInArrayCopyStub.java` will only work once [JDK-8330016](https://bugs.openjdk.org/browse/JDK-8330016) is integrated. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Comment adjustment Thanks. Sorry for my slow reply. ------------- Marked as reviewed by aph (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18716#pullrequestreview-2035101941 From chagedorn at openjdk.org Thu May 2 08:15:12 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 2 May 2024 08:15:12 GMT Subject: RFR: 8305638: Renaming and small clean-ups around predicates [v6] In-Reply-To: References: Message-ID: > **Update: April 22** > > After splitting off and integrating the following PRs from this PR: > https://github.com/openjdk/jdk/pull/18080 > https://github.com/openjdk/jdk/pull/18293 > https://github.com/openjdk/jdk/pull/18628 > https://github.com/openjdk/jdk/pull/18723 > > we are only left with a few renaming and clean-ups from this PR. 
Directly merging the master branch in was quite hard. I therefore reverted all commits to get back to a clean master and then applied all remaining code changes manually (required a force push). > >
>
> > _------------ Original PR description --------------_ > > This patch is intended for JDK 23. > > While preparing the patch for the full fix for Assertion Predicates [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981), I still noticed that some changes are not required for the actual fix and could be split off and reviewed separately in this PR. > > The patch applies the following cleanup changes: > - The complete fix had to add slightly different cloning cases in `PhaseIdealLoop::create_bool_from_template_assertion_predicate()` which already has quite some logic to switch between different cases. Additionally, the algorithm in the method itself was already hard to understand and difficult to adapt. I therefore re-implemented it in a separate class `CloneTemplateAssertionPredicateBool` together with some helper classes like `DFSNodeStack`. To use it, I've added a `TemplateAssertionPredicateBool` class that offers three cloning possibilities: > - `clone()`: Clone without modification > - `clone_and_replace_opaque_loop_nodes()`: Clone and replace the `OpaqueLoop*Nodes` with a new init and stride node. > - `clone_and_replace_init()`: Special case of `clone_and_replace_opaque_loop_nodes()` which only replaces `OpaqueLoopInitNode` and clones `OpaqueLoopStrideNode`. > > This refactoring could be extracted from the complete fix. > - The Split If code to detect (`subgraph_has_opaque()`) and clone Template Assertion Predicate Bools was extracted to a separate class `CloneTemplateAssertionPredicateBoolDown` and uses the new `TemplateAssertionPredicateBool` class to do the actual cloning. > - In the process of coding the complete fix, I've refactored the Loop Unswitching code quite a bit. This change could also be extracted into a separate RFE. Changes include: > - Renaming > - Extracting code to separate classes/methods > - Adding comments > - Some small refactoring including: > - Removing unused parameters > - Renaming variables/parameters/methods > > Th... 
Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - Merge branch 'refs/heads/master' into JDK-8305638 # Conflicts: # src/hotspot/share/opto/loopPredicate.cpp - Fix useful Template Assertion Predicate marking - Fix useful Parse Predicate marking - Remaining renaming and small clean-ups ------------- Changes: https://git.openjdk.org/jdk/pull/16877/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16877&range=05 Stats: 77 lines in 5 files changed: 17 ins; 7 del; 53 mod Patch: https://git.openjdk.org/jdk/pull/16877.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16877/head:pull/16877 PR: https://git.openjdk.org/jdk/pull/16877 From chagedorn at openjdk.org Thu May 2 08:16:53 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 2 May 2024 08:16:53 GMT Subject: RFR: 8331404: IGV: Show line numbers for callees in properties In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 15:38:20 GMT, Christian Hagedorn wrote: > IGV shows the `bci` for a node in the callee, followed by the bci in the caller method and so on until we reach the root method. For the `line` property, we currently only show the line number found in the root method (`first()` is the root method being compiled and `second()` and `third()` are inlined): > > Example program: > ![image](https://github.com/openjdk/jdk/assets/17833009/579fe9eb-4bd8-42d8-9d03-875f25bd97ae) > > Properties of the store to `fFld`: > ![image](https://github.com/openjdk/jdk/assets/17833009/3763cccf-c1ba-4d7f-a986-eae8bf0654b0) > > One could read the line number from the `jvms` property above. But you would need to expand that property with the button on the right side which opens a window. But then you cannot click anything else anymore in IGV until you close the window again. 
> > A simpler and easier-to-read solution is to add the line number information to match the bci numbers (they are printed in callee->root method order which I think is okay - especially if there are a lot of inlinees, it could be easier to have the really interesting numbers at the start on the left side). This would look something like this: > ![image](https://github.com/openjdk/jdk/assets/17833009/fcab3af6-69ac-43ae-89be-19fc4476d12f) > > If there is no line number information for a bci, I simply emit a `_`. > > Testing: > - Manual testing in IGV > - Sanity testing by running `java -Xcomp -XX:+PrintIdealGraph -XX:PrintIdealGraphLevel=4 -XX:PrintIdealGraphFile=graph.xml HelloWorld.java`. > > Thanks, > Christian Thanks Toby for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19025#issuecomment-2089867892 From bulasevich at openjdk.org Thu May 2 08:19:53 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 2 May 2024 08:19:53 GMT Subject: RFR: 8330806: test/hotspot/jtreg/compiler/c1/TestLargeMonitorOffset.java fails on ARM32 In-Reply-To: <0TiKLBlllAunug0vnrED5etz2Asg0faInPkxw2qebE8=.327bf508-f675-4b1a-8d65-866cae772234@github.com> References: <0TiKLBlllAunug0vnrED5etz2Asg0faInPkxw2qebE8=.327bf508-f675-4b1a-8d65-866cae772234@github.com> Message-ID: <7MFw7690WXwQ0vF53EPK04vMLVavhkIfTtdGHvk3gcI=.27b7b975-ce50-4c92-bcb7-7e4ae189e293@github.com> On Fri, 26 Apr 2024 15:22:25 GMT, Sergey Nazarkin wrote: >> TestLargeMonitorOffset was introduced by 8310844 with a fix for the AArch64 platform. The same issue needs to be fixed for ARM32. With this change, we add the large slot_offset handling to the ARM32 version of LIR_Assembler::osr_entry(). >> >> Testing: jtreg hotspot, jtreg jdk tier1-3.
> > src/hotspot/cpu/arm/c1_LIRAssembler_arm.cpp line 156: > >> 154: int slot_offset = monitor_offset - (i * 2 * BytesPerWord); >> 155: if (slot_offset >= 4096 - BytesPerWord) { >> 156: __ add_slow(R2, OSR_buf, slot_offset); > > Can't we check this once before the loop? Or does such an optimization make no sense? Hi Sergey. Thanks for looking at this. This is not performance-critical code, and the typical number_of_locks value is 0, so IF inside the FOR loop makes sense here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18891#discussion_r1587242241 From dlong at openjdk.org Thu May 2 08:26:59 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 2 May 2024 08:26:59 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v13] In-Reply-To: References: <9Eoh8hOSSVvAtf9iVQ6hflQyceUtt4dpZdqm61zg5XI=.358a4d79-70d9-4b54-85d5-37c6817f0fae@github.com> Message-ID: <_x-OSownzQQZ8fmlsbvQ42MLf9BGZskECTNncOE0s4E=.8381a076-0cc4-4339-924f-fa22ca780573@github.com> On Thu, 2 May 2024 07:29:04 GMT, Roland Westrelin wrote: >> We can't see an interface here because it will get rejected by `ciInstanceKlass::exact_klass`, so we could even assert for that here if we wanted. > > Then, I think we should add an assert that `!type->as_instance_klass()->is_interface()` and also that it's not an array of interfaces (using `base_element_klass()`) An array of interfaces can be exact: `new Interface[20].getClass();` and it seems like it would be safe to allow this, so I think we only need one assert for `!type->as_instance_klass()->is_interface()` if we don't trust the result of exact_type().
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1587252944 From dnsimon at openjdk.org Thu May 2 09:34:51 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 2 May 2024 09:34:51 GMT Subject: RFR: 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found [v2] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 01:06:48 GMT, Dean Long wrote: > Wouldn't it be useful for the JVMCI implementation to provide the nmethod entry barrier code? I could be wrong, but I think all the JIT compiler needs to know is how big it is, so it can reserve the space (NOPs would do), then when the code is installed as an nmethod, memcpy it over (if it's static), or use the MacroAssembler if it's not. That's an interesting idea and would be great if possible. However, given that Graal [puts the slow path out-of-line](https://github.com/oracle/graal/blob/c0b79318e2158a22bec5a9a991ee6ee226de6492/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/hotspot/amd64/AMD64HotSpotBackend.java#L195), we'd be stuck with the problem of patching in the jump target. Also, JVMCI would have to conservatively emit a long-form jump instruction to the slow path. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19035#issuecomment-2090014082 From lucy at openjdk.org Thu May 2 09:42:54 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 2 May 2024 09:42:54 GMT Subject: RFR: 8331421: ubsan: vmreg.cpp checking error member call on misaligned address In-Reply-To: <-0CT3e78TSiBvMrwImLSJDFJkQ7k7BwcMhfoW5tKklA=.aab953cb-9fb3-4b89-acf4-ae6967276c0b@github.com> References: <-0CT3e78TSiBvMrwImLSJDFJkQ7k7BwcMhfoW5tKklA=.aab953cb-9fb3-4b89-acf4-ae6967276c0b@github.com> Message-ID: On Tue, 30 Apr 2024 13:56:07 GMT, Martin Doerr wrote: > As shown in the JBS issue, the Undefined Behavior Sanitizer complains about `VMRegImpl::stack_0()->value()`. 
This can easily be avoided by skipping the more complicated way which includes addition and subtraction of `first()`. LGTM. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19022#pullrequestreview-2035282281 From mdoerr at openjdk.org Thu May 2 09:42:55 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 2 May 2024 09:42:55 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v2] In-Reply-To: <3mTCG75Z1f2KDjDAIhnkxKDwbEK2Q4LvF5T5tJ0vWBQ=.274b69df-ef47-4750-a916-0d25ec8b65fa@github.com> References: <3mTCG75Z1f2KDjDAIhnkxKDwbEK2Q4LvF5T5tJ0vWBQ=.274b69df-ef47-4750-a916-0d25ec8b65fa@github.com> Message-ID: On Thu, 2 May 2024 07:54:16 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 127: >> >>> 125: while (OptoReg::is_reg(reg)) { >>> 126: const VMReg vm_reg = OptoReg::as_VMReg(reg); >>> 127: if (!(vm_reg->is_Register()) || vm_reg->as_Register() != r) { >> >> This doesn't work on PPC64: We run into "assert(is_Register() && is_even(value())) failed: even-aligned GPR name" (vmreg_ppc.hpp:54). Calling `as_Register()` is only supported for the even ones. >> Maybe add check `is_concrete()`? > > Done (commit https://github.com/openjdk/jdk/pull/19026/commits/c0fc66deb654a9b930a7b7cf1a7e7fa093739027). @TheRealMDoerr please let me know if this works on PPC64. Yes, this works on PPC64. Thanks!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19026#discussion_r1587348967 From mdoerr at openjdk.org Thu May 2 10:23:57 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 2 May 2024 10:23:57 GMT Subject: RFR: 8331421: ubsan: vmreg.cpp checking error member call on misaligned address In-Reply-To: <-0CT3e78TSiBvMrwImLSJDFJkQ7k7BwcMhfoW5tKklA=.aab953cb-9fb3-4b89-acf4-ae6967276c0b@github.com> References: <-0CT3e78TSiBvMrwImLSJDFJkQ7k7BwcMhfoW5tKklA=.aab953cb-9fb3-4b89-acf4-ae6967276c0b@github.com> Message-ID: On Tue, 30 Apr 2024 13:56:07 GMT, Martin Doerr wrote: > As shown in the JBS issue, the Undefined Behavior Sanitizer complains about `VMRegImpl::stack_0()->value()`. This can easily be avoided by skipping the more complicated way which includes addition and subtraction of `first()`. Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19022#issuecomment-2090106095 From mdoerr at openjdk.org Thu May 2 10:23:58 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 2 May 2024 10:23:58 GMT Subject: Integrated: 8331421: ubsan: vmreg.cpp checking error member call on misaligned address In-Reply-To: <-0CT3e78TSiBvMrwImLSJDFJkQ7k7BwcMhfoW5tKklA=.aab953cb-9fb3-4b89-acf4-ae6967276c0b@github.com> References: <-0CT3e78TSiBvMrwImLSJDFJkQ7k7BwcMhfoW5tKklA=.aab953cb-9fb3-4b89-acf4-ae6967276c0b@github.com> Message-ID: On Tue, 30 Apr 2024 13:56:07 GMT, Martin Doerr wrote: > As shown in the JBS issue, the Undefined Behavior Sanitizer complains about `VMRegImpl::stack_0()->value()`. This can easily be avoided by skipping the more complicated way which includes addition and subtraction of `first()`. This pull request has now been integrated. 
Changeset: beebce04 Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/beebce044db97e50a7aea3f83d70e134b2128d0a Stats: 4 lines in 2 files changed: 1 ins; 0 del; 3 mod 8331421: ubsan: vmreg.cpp checking error member call on misaligned address Reviewed-by: mbaesken, lucy ------------- PR: https://git.openjdk.org/jdk/pull/19022 From chagedorn at openjdk.org Thu May 2 10:40:08 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 2 May 2024 10:40:08 GMT Subject: RFR: 8305638: Renaming and small clean-ups around predicates [v7] In-Reply-To: References: Message-ID: <-UU0jrN33Dxbp9EJ9u1FSJ2RDYC02JMK84gnzZLUhSg=.0e20361b-81a2-4ae9-a320-70f3cd9804c6@github.com> > **Update: April 22** > > After splitting off and integrating the following PRs from this PR: > https://github.com/openjdk/jdk/pull/18080 > https://github.com/openjdk/jdk/pull/18293 > https://github.com/openjdk/jdk/pull/18628 > https://github.com/openjdk/jdk/pull/18723 > > we are only left with a few renaming and clean-ups from this PR. Directly merging the master branch in was quite hard. I therefore reverted all commits to get back to a clean master and then applied all remaining code changes manually (required a force push). > >
>
> > _------------ Original PR description --------------_ > > This patch is intended for JDK 23. > > While preparing the patch for the full fix for Assertion Predicates [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981), I still noticed that some changes are not required for the actual fix and could be split off and reviewed separately in this PR. > > The patch applies the following cleanup changes: > - The complete fix had to add slightly different cloning cases in `PhaseIdealLoop::create_bool_from_template_assertion_predicate()` which already has quite some logic to switch between different cases. Additionally, the algorithm in the method itself was already hard to understand and difficult to adapt. I therefore re-implemented it in a separate class `CloneTemplateAssertionPredicateBool` together with some helper classes like `DFSNodeStack`. To use it, I've added a `TemplateAssertionPredicateBool` class that offers three cloning possibilities: > - `clone()`: Clone without modification > - `clone_and_replace_opaque_loop_nodes()`: Clone and replace the `OpaqueLoop*Nodes` with a new init and stride node. > - `clone_and_replace_init()`: Special case of `clone_and_replace_opaque_loop_nodes()` which only replaces `OpaqueLoopInitNode` and clones `OpaqueLoopStrideNode`. > > This refactoring could be extracted from the complete fix. > - The Split If code to detect (`subgraph_has_opaque()`) and clone Template Assertion Predicate Bools was extracted to a separate class `CloneTemplateAssertionPredicateBoolDown` and uses the new `TemplateAssertionPredicateBool` class to do the actual cloning. > - In the process of coding the complete fix, I've refactored the Loop Unswitching code quite a bit. This change could also be extracted into a separate RFE. Changes include: > - Renaming > - Extracting code to separate classes/methods > - Adding comments > - Some small refactoring including: > - Removing unused parameters > - Renaming variables/parameters/methods > > Th... 
Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge branch 'master' into JDK-8305638 - Merge branch 'refs/heads/master' into JDK-8305638 # Conflicts: # src/hotspot/share/opto/loopPredicate.cpp - Fix useful Template Assertion Predicate marking - Fix useful Parse Predicate marking - Remaining renaming and small clean-ups ------------- Changes: https://git.openjdk.org/jdk/pull/16877/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16877&range=06 Stats: 77 lines in 5 files changed: 17 ins; 7 del; 53 mod Patch: https://git.openjdk.org/jdk/pull/16877.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16877/head:pull/16877 PR: https://git.openjdk.org/jdk/pull/16877 From asotona at openjdk.org Thu May 2 11:08:16 2024 From: asotona at openjdk.org (Adam Sotona) Date: Thu, 2 May 2024 11:08:16 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v4] In-Reply-To: References: Message-ID: > Hi, > During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. > One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. > > I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. > > Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. > > Thank you, > Adam Adam Sotona has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Merge branch 'master' into JDK-8331291-attributes - changed order in allowed modules attributes check - added bug number - added impl comment - removed list of predefined attributes standard attributes mapping hard-coded and moved to BoundAttribute added AttributesTest::testAttributesMapping - move mappers implementations to AbstractAttributeMapper - 8331291: java.lang.classfile.Attributes class performs a lot of static initializations ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19006/files - new: https://git.openjdk.org/jdk/pull/19006/files/f0d9174e..fd8da774 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=02-03 Stats: 4061 lines in 236 files changed: 1910 ins; 657 del; 1494 mod Patch: https://git.openjdk.org/jdk/pull/19006.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19006/head:pull/19006 PR: https://git.openjdk.org/jdk/pull/19006 From thartmann at openjdk.org Thu May 2 11:41:00 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 2 May 2024 11:41:00 GMT Subject: RFR: 8329258: TailCall should not use frame pointer register for jump target [v4] In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 10:31:09 GMT, Tobias Hartmann wrote: >> Applying @danielogh's patch (see description of [JDK-8329258](https://bugs.openjdk.org/browse/JDK-8329258)) to enable `StressGCM` / `StressLCM` for stub compilations triggers a crash on AArch64. 
The problem is that the register allocator uses `R29` (`rfp`) which is usually used for the frame pointer to hold the `TailCall` `exc_target` when generating the `_slow_arraycopy_Java` stub: >> https://github.com/openjdk/jdk/blob/6dfb8120c270a76fcba5a5c3c9ad91da3282d5fa/src/hotspot/share/opto/generateOptoStub.cpp#L258-L264 >> >> With `StressGCM` / `StressLCM` the register initialization is scheduled early and `R29` is corrupted by the `MachEpilogNode` which is opaque to the register allocator and inserted right before the `TailCall`: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/share/opto/output.cpp#L320-L326 >> >> >> 028 mov R29, 0x0000ffff78cc0080 # ptr >> >> [...] >> >> 098 # pop frame 16 >> ldp lr, rfp, [sp,#0] <- Epilog kills rfp (and lr + sp) >> add sp, sp, #16 >> >> [...] >> >> 0a0 br R29 # R12 holds method >> >> >> As a result, we jump to a "garbage" location. See [bad code](https://bugs.openjdk.org/secure/attachment/108835/BAD_slow_arraycopy_Java.txt) vs. [good code](https://bugs.openjdk.org/secure/attachment/108834/GOOD_slow_arraycopy_Java.txt). >> >> On x86, we use `no_rbp_RegP` instead: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L2564-L2566 https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L12470 >> >> I implemented the same on AArch64. >> >> I think other platforms are affected as well but I don't have the hardware to test there. @offamitkumar (S390), @TheRealMDoerr (PPC), @RealFYang (RISC-V), @bulasevich (ARM32), could you please have a look? >> >> I also wondered if `R29` shouldn't be a callee-save (SOE) register in the C calling convention? 
>> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/aarch64/aarch64.ad#L139-L140 On x86_64, `RBP` is SOE: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L86-L87 >> >> `TestTailCallInArrayCopyStub.java` will only work once [JDK-8330016](https://bugs.openjdk.org/browse/JDK-8330016) is integrated. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Comment adjustment Thanks for the review, Andrew! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18716#issuecomment-2090281840 From thartmann at openjdk.org Thu May 2 11:41:02 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 2 May 2024 11:41:02 GMT Subject: Integrated: 8329258: TailCall should not use frame pointer register for jump target In-Reply-To: References: Message-ID: On Wed, 10 Apr 2024 12:34:11 GMT, Tobias Hartmann wrote: > Applying @danielogh's patch (see description of [JDK-8329258](https://bugs.openjdk.org/browse/JDK-8329258)) to enable `StressGCM` / `StressLCM` for stub compilations triggers a crash on AArch64. The problem is that the register allocator uses `R29` (`rfp`) which is usually used for the frame pointer to hold the `TailCall` `exc_target` when generating the `_slow_arraycopy_Java` stub: > https://github.com/openjdk/jdk/blob/6dfb8120c270a76fcba5a5c3c9ad91da3282d5fa/src/hotspot/share/opto/generateOptoStub.cpp#L258-L264 > > With `StressGCM` / `StressLCM` the register initialization is scheduled early and `R29` is corrupted by the `MachEpilogNode` which is opaque to the register allocator and inserted right before the `TailCall`: > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/share/opto/output.cpp#L320-L326 > > > 028 mov R29, 0x0000ffff78cc0080 # ptr > > [...] 
> > 098 # pop frame 16 > ldp lr, rfp, [sp,#0] <- Epilog kills rfp (and lr + sp) > add sp, sp, #16 > > [...] > > 0a0 br R29 # R12 holds method > > > As a result, we jump to a "garbage" location. See [bad code](https://bugs.openjdk.org/secure/attachment/108835/BAD_slow_arraycopy_Java.txt) vs. [good code](https://bugs.openjdk.org/secure/attachment/108834/GOOD_slow_arraycopy_Java.txt). > > On x86, we use `no_rbp_RegP` instead: > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L2564-L2566 https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L12470 > > I implemented the same on AArch64. > > I think other platforms are affected as well but I don't have the hardware to test there. @offamitkumar (S390), @TheRealMDoerr (PPC), @RealFYang (RISC-V), @bulasevich (ARM32), could you please have a look? > > I also wondered if `R29` shouldn't be a callee-save (SOE) register in the C calling convention? > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/aarch64/aarch64.ad#L139-L140 On x86_64, `RBP` is SOE: > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L86-L87 > > `TestTailCallInArrayCopyStub.java` will only work once [JDK-8330016](https://bugs.openjdk.org/browse/JDK-8330016) is integrated. > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: cccc9535 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/cccc95358d5c38cbcabc7f79abc53674deb1e6d8 Stats: 117 lines in 5 files changed: 113 ins; 0 del; 4 mod 8329258: TailCall should not use frame pointer register for jump target Co-authored-by: Fei Yang Reviewed-by: rcastanedalo, aph ------------- PR: https://git.openjdk.org/jdk/pull/18716 From thartmann at openjdk.org Thu May 2 12:28:52 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 2 May 2024 12:28:52 GMT Subject: RFR: 8331404: IGV: Show line numbers for callees in properties In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 15:38:20 GMT, Christian Hagedorn wrote: > IGV shows the `bci` for a node in the callee, followed by the bci in the caller method and so on until we reach the root method. For the `line` property, we currently only show the line number found in the root method (`first()` is the root method being compiled and `second()` and `third()` are inlined): > > Example program: > ![image](https://github.com/openjdk/jdk/assets/17833009/579fe9eb-4bd8-42d8-9d03-875f25bd97ae) > > Properties of the store to `fFld`: > ![image](https://github.com/openjdk/jdk/assets/17833009/3763cccf-c1ba-4d7f-a986-eae8bf0654b0) > > One could read the line number from the `jvms` property above. But you would need to expand that property with the button on the right side which opens a window. But then you cannot click anything else anymore in IGV until you close the window again. > > A simpler and easier to read solution is to add the line number information to match the bci numbers (they are printed in callee->root method order which I think is okay - especially if there are a lot of inlinees, it could be easier to have the really interesting numbers at the start on the left side). 
This would look something like that: > ![image](https://github.com/openjdk/jdk/assets/17833009/fcab3af6-69ac-43ae-89be-19fc4476d12f) > > If there is no line number information for a bci, I simply emit a `_`. > > Testing: > - Manual testing in IGV > - Sanity testing by running `java -Xcomp -XX:+PrintIdealGraph -XX:PrintIdealGraphLevel=4 -XX:PrintIdealGraphFile=graph.xml HelloWorld.java`. > > Thanks, > Christian Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19025#pullrequestreview-2035600862 From rcastanedalo at openjdk.org Thu May 2 12:37:53 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 2 May 2024 12:37:53 GMT Subject: RFR: 8331253: 16 bits is not enough for nmethod::_skipped_instructions_size field In-Reply-To: References: Message-ID: On Wed, 1 May 2024 03:31:41 GMT, Vladimir Kozlov wrote: > In [JDK-8329433](https://bugs.openjdk.org/browse/JDK-8329433) I changed `nmethod::_skipped_instructions_size` field type to `uint16_t` assuming that it only count NOP instructions and GC barriers. I did not take into account that Generational ZGC also incudes barrier stubs into this size (original ZGC missed that). It is correct to include them because these stubs are generated in instructions section and not in stubs section: > > > Statistics for 1330 bytecoded nmethods for C2: > ... > ZGC: > main code = 3237080 (75.567032%) > stubs code = 810577 (25.040375%) > skipped insts = 44432 (1.372595%) > > GenZGC: > main code = 4034704 (78.238518%) > stubs code = 1356703 (33.625839%) > skipped insts = 1074611 (26.634197%) > > > Note, GenZGC has bigger code because it has store barriers. It generates a separate stub for each barrier, no sharing. > > After looking on how `_skipped_instructions_size` is used (only in one place when calculated inlinining size of compiled code) I decided replace it with `int _inline_insts_size;`.
It is calculated the same way as before. > > And instead of including instructions stubs into `_skipped_instructions_size` I recorded size of instructions in code section before stubs are generated. This allow to get more accurate size of main instructions and no need for `InlineSkippedInstructionsCounter` in GC barriers stubs. > > I also fixed code in C2 which estimates size of code and stubs sections. > > Tested tier1-4,tier8,stress,xcomp Thanks for working on this, Vladimir! I tried out this changeset on a simple example ([example-and-instrumentation.zip](https://github.com/openjdk/jdk/files/15188249/example-and-instrumentation.zip)) using a JVM instrumented with the attached patch to observe the output of `ciMethod::inline_instructions_size()` and this seems to differ before and after the changeset: Before: caller: Test foo (LTest$MyObject;)Ljava/lang/Object; inline instructions size: 0 callee: Test bar (LTest$MyObject;)V inline instructions size: 219 after: caller: Test foo (LTest$MyObject;)Ljava/lang/Object; inline instructions size: 0 callee: Test bar (LTest$MyObject;)V inline instructions size: 183 Is this deviation expected? If so, I suggest splitting this changeset into a simple bug fix that only widens the type of `nmethod::_skipped_instructions_size` without affecting the inlining heuristic, and an RFE with the remaining changes. ------------- Changes requested by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19029#pullrequestreview-2035622983 From imyers at openjdk.org Thu May 2 12:50:16 2024 From: imyers at openjdk.org (Ian Myers) Date: Thu, 2 May 2024 12:50:16 GMT Subject: RFR: 8324756: Test vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize is too slow due to dependency verification [v2] In-Reply-To: References: Message-ID: > This change removes dependency verification by passing -XX:-VerifyDependencies in the test.
> > `vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java` takes 20min to run on linux-x86_64-server-fastdebug: > > time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java > CONF=linux-x86_64-server-fastdebug make test **1412.82s user 15.27s system 115% cpu 20:41.19 total** > > > Passing -XX:-VerifyDependencies flag speeds up the run time to 1min: > > time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java TEST_VM_OPTS="-XX:-VerifyDependencies" > CONF=linux-x86_64-server-fastdebug make test **287.27s user 16.19s system 496% cpu 1:01.10 total** > > > Adding -XX:-VerifyDependencies to the test file accomplishes the same run time of 1min: > > time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java > CONF=linux-x86_64-server-fastdebug make test **272.33s user 14.56s system 464% cpu 1:01.75 total** Ian Myers has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: [8324756] Remove dependency verification from vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19040/files - new: https://git.openjdk.org/jdk/pull/19040/files/b5944f4e..99314e02 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19040&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19040&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19040.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19040/head:pull/19040 PR: https://git.openjdk.org/jdk/pull/19040 From shade at openjdk.org Thu May 2 12:59:54 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 2 May 2024 12:59:54 GMT Subject: RFR: 8324756: Test vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize is too slow due to dependency verification [v2] In-Reply-To: References: Message-ID: <-ig7Zj830qvQ91e_kbIRRfOn_8Pm23qxFOxUdGsSSWk=.9a40c696-9c91-4729-916d-61965099e0ae@github.com> On Thu, 2 May 2024 12:50:16 GMT, Ian Myers wrote: >> This change removes dependency verification by passing -XX:-VerifyDependencies in the test. 
>> >> `vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java` takes 20min to run on linux-x86_64-server-fastdebug: >> >> time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java >> CONF=linux-x86_64-server-fastdebug make test **1412.82s user 15.27s system 115% cpu 20:41.19 total** >> >> >> Passing -XX:-VerifyDependencies flag speeds up the run time to 1min: >> >> time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java TEST_VM_OPTS="-XX:-VerifyDependencies" >> CONF=linux-x86_64-server-fastdebug make test **287.27s user 16.19s system 496% cpu 1:01.10 total** >> >> >> Adding -XX:-VerifyDependencies to the test file accomplishes the same run time of 1min: >> >> time CONF=linux-x86_64-server-fastdebug make test TEST=vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java >> CONF=linux-x86_64-server-fastdebug make test **272.33s user 14.56s system 464% cpu 1:01.75 total** > > Ian Myers has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > [8324756] Remove dependency verification from vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java I think you want to add the reversal of https://github.com/openjdk/jdk/commit/2564f0f99866c33d14947609c276a421ce8cc0a2 to this PR as well. I am not sure we want to run the test with disabled dependency verification, though. It is a compiler test, so we would like to have compiler checking code online as much as possible. Have you explored whether this is an issue with the Sweeper removal and, if so, whether adding GCs helps?
------------- PR Comment: https://git.openjdk.org/jdk/pull/19040#issuecomment-2090438866 From asmehra at openjdk.org Thu May 2 13:14:52 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Thu, 2 May 2024 13:14:52 GMT Subject: RFR: 8330813: Don't call methods from Compressed(Oops|Klass) if the associated mode is inactive In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 11:04:07 GMT, Thomas Stuefe wrote: > We should not call methods from CompressedOops if we run with -XX:-UseCompressedOops, and the same goes for CompressedKlass and -XX:-UseCompressedClassPointers. (the latter we do assert in Lilliput). Marked as reviewed by asmehra (Committer). lgtm ------------- PR Review: https://git.openjdk.org/jdk/pull/18883#pullrequestreview-2035709003 PR Comment: https://git.openjdk.org/jdk/pull/18883#issuecomment-2090466879 From asmehra at openjdk.org Thu May 2 13:38:52 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Thu, 2 May 2024 13:38:52 GMT Subject: RFR: 8331344: No compiler replay file with CompilerCommand MemLimit In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 18:39:33 GMT, Thomas Stuefe wrote: > When using the compiler memory limit with the crash suboption (e.g. `-XX:CompileCommand=MemLimit,*.*,1g~crash`), the JVM asserts but may fail to produce a replay file. We also may see partly corrupted hs-err files. > > This happens if the memory limit hit was caused by growing ResourceAreas, not the C2 node arena. We also use ResourceArea when producing the replay file. > > If those RA usages cause another Arena chunk to be allocated, we re-enter `CompilationMemoryStatistic::on_arena_change` recursively, possibly multiple times. This will at least prevent replay file generation, but also may abort error handling altogether if a stack overflow happens. > > The patch prevents that recursion. It would be better to prevent replay file generation from using RA altogether, but this would be a larger patch and difficult to keep from bitrotting. 
> > Also provided regression test. > > Tested: > > - manually on Linux x64 and MacOS m1, with and without an artificially inflated resource area usage that reliably triggers the error. With the patch, the error is gone. > - GHAs lgtm ------------- Marked as reviewed by asmehra (Committer). PR Review: https://git.openjdk.org/jdk/pull/19005#pullrequestreview-2035769598 From mdoerr at openjdk.org Thu May 2 13:39:55 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 2 May 2024 13:39:55 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v2] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 07:57:18 GMT, Roberto Casta?eda Lozano wrote: >> This changeset generalizes the logic to analyze, declare, and communicate which registers are live at a C2 barrier stub so that it can be used by other collectors than ZGC adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). >> >> The main changes are: >> >> - Make it possible to compute register liveness information before (live-in) or after (live-out) each barrier, and let the collector choose by implementing `BarrierSetC2State::needs_livein_data()`. >> >> - Generalize the interface with which collectors declare which registers must be additionally preserved across barrier runtime calls, adding the methods `BarrierStubC2::preserve(Register r)` and `BarrierStubC2::dont_preserve(Register r)`. >> >> - Simplify the interface with which platform-specific logic computes which registers to preserve across barrier runtime calls, replacing the calls to `BarrierStubC2::result()` and `BarrierStubC2::live()` with a single call to `BarrierStubC2::preserve_set()`. >> >> #### Testing >> >> - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). 
- tier1-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/4d4e743d8f4cddd5288cee1d69c70fe2b9bea066) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. >> - Build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug). @RealFYang, @TheRealMDoerr: could you please test and review the riscv and ppc changes? Thanks! > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Use VMReg::is_concrete for testing sub-registers Can we change `_barrier_set_state` (https://github.com/openjdk/jdk/blob/a024eed7384828643e302f021a253717f53e3778/src/hotspot/share/opto/compile.hpp#L364) from `void*` to `BarrierSetC2State*` and remove the casts? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19026#issuecomment-2090523487 From stuefe at openjdk.org Thu May 2 13:43:58 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 2 May 2024 13:43:58 GMT Subject: RFR: 8331344: No compiler replay file with CompilerCommand MemLimit In-Reply-To: References: Message-ID: <5ajLmP6-ILt_OQ86cCztSfwQY6OFQoLHgP6vxpPIfLc=.acb454b6-0b9c-4470-b814-4ae8f0b43d3a@github.com> On Thu, 2 May 2024 13:36:11 GMT, Ashutosh Mehra wrote: >> When using the compiler memory limit with the crash suboption (e.g. `-XX:CompileCommand=MemLimit,*.*,1g~crash`), the JVM asserts but may fail to produce a replay file. We also may see partly corrupted hs-err files. >> >> This happens if the memory limit hit was caused by growing ResourceAreas, not the C2 node arena. We also use ResourceArea when producing the replay file. >> >> If those RA usages cause another Arena chunk to be allocated, we re-enter `CompilationMemoryStatistic::on_arena_change` recursively, possibly multiple times.
This will at least prevent replay file generation, but also may abort error handling altogether if a stack overflow happens. >> >> The patch prevents that recursion. It would be better to prevent replay file generation from using RA altogether, but this would be a larger patch and difficult to keep from bitrotting. >> >> Also provided regression test. >> >> Tested: >> >> - manually on Linux x64 and MacOS m1, with and without an artificially inflated resource area usage that reliably triggers the error. With the patch, the error is gone. >> - GHAs > > lgtm Thank you, @ashu-mehra and @vnkozlov ------------- PR Comment: https://git.openjdk.org/jdk/pull/19005#issuecomment-2090529484 From stuefe at openjdk.org Thu May 2 13:43:59 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 2 May 2024 13:43:59 GMT Subject: Integrated: 8331344: No compiler replay file with CompilerCommand MemLimit In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 18:39:33 GMT, Thomas Stuefe wrote: > When using the compiler memory limit with the crash suboption (e.g. `-XX:CompileCommand=MemLimit,*.*,1g~crash`), the JVM asserts but may fail to produce a replay file. We also may see partly corrupted hs-err files. > > This happens if the memory limit hit was caused by growing ResourceAreas, not the C2 node arena. We also use ResourceArea when producing the replay file. > > If those RA usages cause another Arena chunk to be allocated, we re-enter `CompilationMemoryStatistic::on_arena_change` recursively, possibly multiple times. This will at least prevent replay file generation, but also may abort error handling altogether if a stack overflow happens. > > The patch prevents that recursion. It would be better to prevent replay file generation from using RA altogether, but this would be a larger patch and difficult to keep from bitrotting. > > Also provided regression test. 
> > Tested: > > - manually on Linux x64 and MacOS m1, with and without an artificially inflated resource area usage that reliably triggers the error. With the patch, the error is gone. > - GHAs This pull request has now been integrated. Changeset: 389f6fe9 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/389f6fe97c348e28d8573fe4754138d2a0bd6c0d Stats: 29 lines in 3 files changed: 27 ins; 1 del; 1 mod 8331344: No compiler replay file with CompilerCommand MemLimit Reviewed-by: kvn, asmehra ------------- PR: https://git.openjdk.org/jdk/pull/19005 From stuefe at openjdk.org Thu May 2 13:50:58 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 2 May 2024 13:50:58 GMT Subject: RFR: 8330813: Don't call methods from Compressed(Oops|Klass) if the associated mode is inactive In-Reply-To: References: Message-ID: On Thu, 2 May 2024 13:11:35 GMT, Ashutosh Mehra wrote: >> We should not call methods from CompressedOops if we run with -XX:-UseCompressedOops, and the same goes for CompressedKlass and -XX:-UseCompressedClassPointers. (the latter we do assert in Lilliput). > > lgtm Thanks @ashu-mehra ------------- PR Comment: https://git.openjdk.org/jdk/pull/18883#issuecomment-2090547341 From stuefe at openjdk.org Thu May 2 13:50:59 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 2 May 2024 13:50:59 GMT Subject: Integrated: 8330813: Don't call methods from Compressed(Oops|Klass) if the associated mode is inactive In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 11:04:07 GMT, Thomas Stuefe wrote: > We should not call methods from CompressedOops if we run with -XX:-UseCompressedOops, and the same goes for CompressedKlass and -XX:-UseCompressedClassPointers. (the latter we do assert in Lilliput). This pull request has now been integrated. 
Changeset: dd0b6418 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/dd0b6418191c765a92bfd03ec4d4206e0da7ee45 Stats: 14 lines in 1 file changed: 10 ins; 0 del; 4 mod 8330813: Don't call methods from Compressed(Oops|Klass) if the associated mode is inactive Reviewed-by: stefank, asmehra ------------- PR: https://git.openjdk.org/jdk/pull/18883 From stuefe at openjdk.org Thu May 2 13:54:08 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 2 May 2024 13:54:08 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v8] In-Reply-To: References: Message-ID: > See [1] for previous discussions. > > We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. > > The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. > > Examples: > > This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` > > This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` > > > --- > > The patch: > > 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. > 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. > 3) Adapted and extended tests > > I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. 
> > > Tested: > > - manually on Mac m1 (debug and release) > - GHAs are running > - but Oracle will do more testing before this goes in > > [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Remove unused variable ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18969/files - new: https://git.openjdk.org/jdk/pull/18969/files/691a1467..e2aacaed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18969/head:pull/18969 PR: https://git.openjdk.org/jdk/pull/18969 From liach at openjdk.org Thu May 2 14:43:00 2024 From: liach at openjdk.org (Chen Liang) Date: Thu, 2 May 2024 14:43:00 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v4] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 11:08:16 GMT, Adam Sotona wrote: >> Hi, >> During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. >> One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. >> >> I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. >> >> Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. 
>> >> Thank you, >> Adam > > Adam Sotona has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into JDK-8331291-attributes > - changed order in allowed modules attributes check > - added bug number > - added impl comment > - removed list of predefined attributes > standard attributes mapping hard-coded and moved to BoundAttribute > added AttributesTest::testAttributesMapping > - move mappers implementations to AbstractAttributeMapper > - 8331291: java.lang.classfile.Attributes class performs a lot of static initializations On a side note, will we update JEP 466 to include this patch? ------------- Marked as reviewed by liach (Author). PR Review: https://git.openjdk.org/jdk/pull/19006#pullrequestreview-2035945054 From kvn at openjdk.org Thu May 2 14:44:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 2 May 2024 14:44:02 GMT Subject: RFR: 8331253: 16 bits is not enough for nmethod::_skipped_instructions_size field In-Reply-To: References: Message-ID: On Thu, 2 May 2024 12:35:45 GMT, Roberto Castañeda Lozano wrote: >> In [JDK-8329433](https://bugs.openjdk.org/browse/JDK-8329433) I changed `nmethod::_skipped_instructions_size` field type to `uint16_t` assuming that it only count NOP instructions and GC barriers. I did not take into account that Generational ZGC also incudes barrier stubs into this size (original ZGC missed that). It is correct to include them because these stubs are generated in instructions section and not in stubs section: >> >> >> Statistics for 1330 bytecoded nmethods for C2: >> ...
>> ZGC: >> main code = 3237080 (75.567032%) >> stubs code = 810577 (25.040375%) >> skipped insts = 44432 (1.372595%) >> >> GenZGC: >> main code = 4034704 (78.238518%) >> stubs code = 1356703 (33.625839%) >> skipped insts = 1074611 (26.634197%) >> >> >> Note, GenZGC has bigger code because it has store barriers. It generates a separate stub for each barrier, no sharing. >> >> After looking on how `_skipped_instructions_size` is used (only in one place when calculated inlinining size of compiled code) I decided replace it with `int _inline_insts_size;`. It is calculated the same way as before. >> >> And instead of including instructions stubs into `_skipped_instructions_size` I recorded size of instructions in code section before stubs are generated. This allow to get more accurate size of main instructions and no need for `InlineSkippedInstructionsCounter` in GC barriers stubs. >> >> I also fixed code in C2 which estimates size of code and stubs sections. >> >> Tested tier1-4,tier8,stress,xcomp > > Thanks for working on this, Vladimir! I tried out this changeset on a simple example ([example-and-instrumentation.zip](https://github.com/openjdk/jdk/files/15188249/example-and-instrumentation.zip)) using a JVM instrumented with the attached patch to observe the output of `ciMethod::inline_instructions_size()` and this seems to differ before and after the changeset: > > Before: > > > caller: Test foo (LTest$MyObject;)Ljava/lang/Object; inline instructions size: 0 > callee: Test bar (LTest$MyObject;)V inline instructions size: 219 > > > after: > > > caller: Test foo (LTest$MyObject;)Ljava/lang/Object; inline instructions size: 0 > callee: Test bar (LTest$MyObject;)V inline instructions size: 183 > > Is this deviation expected? If so, I suggest to split this changeset into a simple bug fix that only widens the type of `nmethod::_skipped_instructions_size` without affecting the inlining heuristic, and a RFE with the remaining changes. 
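The subject line of this thread concerns a field type. A `uint16_t` can hold at most 65535. Assigning a larger value keeps only the low 16 bits, so the value wraps modulo 65536 rather than failing. The Java sketch below mimics that behavior with `char`, Java's only unsigned 16-bit type. The class name, the method name, and the 70000 input are hypothetical and are not HotSpot code. The sketch assumes only that a single nmethod's skipped-instruction total, with GenZGC barrier stubs included, can exceed 65535 bytes.

```java
public class Uint16Truncation {
    // Hypothetical helper: store a size into a 16-bit unsigned "field".
    // Java's char is unsigned 16-bit, so the cast keeps only the low
    // 16 bits, like assigning an int to a C++ uint16_t.
    static int storeAsUint16(int size) {
        char field = (char) size;
        return field; // widened back to int without sign extension
    }

    public static void main(String[] args) {
        System.out.println(storeAsUint16(60_000)); // fits in 16 bits
        System.out.println(storeAsUint16(70_000)); // wraps: 70000 - 65536
    }
}
```

Running it prints 60000 and then 4464. The second value is silently wrong, and nothing signals that the field overflowed. Widening the field to `int` avoids the wrap.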
@robcasloz, thank you for looking at the PR. Yes, 183 is the more accurate number. I don't think I need to split it: splitting is only needed if you need to backport, which is not my case. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19029#issuecomment-2090670068 From kvn at openjdk.org Thu May 2 14:44:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 2 May 2024 14:44:02 GMT Subject: RFR: 8331253: 16 bits is not enough for nmethod::_skipped_instructions_size field In-Reply-To: References: Message-ID: On Wed, 1 May 2024 03:31:41 GMT, Vladimir Kozlov wrote: > In [JDK-8329433](https://bugs.openjdk.org/browse/JDK-8329433) I changed `nmethod::_skipped_instructions_size` field type to `uint16_t` assuming that it only count NOP instructions and GC barriers. I did not take into account that Generational ZGC also incudes barrier stubs into this size (original ZGC missed that). It is correct to include them because these stubs are generated in instructions section and not in stubs section: > > > Statistics for 1330 bytecoded nmethods for C2: > ... > ZGC: > main code = 3237080 (75.567032%) > stubs code = 810577 (25.040375%) > skipped insts = 44432 (1.372595%) > > GenZGC: > main code = 4034704 (78.238518%) > stubs code = 1356703 (33.625839%) > skipped insts = 1074611 (26.634197%) > > > Note, GenZGC has bigger code because it has store barriers. It generates a separate stub for each barrier, no sharing. > > After looking on how `_skipped_instructions_size` is used (only in one place when calculated inlinining size of compiled code) I decided replace it with `int _inline_insts_size;`. It is calculated the same way as before. > > And instead of including instructions stubs into `_skipped_instructions_size` I recorded size of instructions in code section before stubs are generated. This allow to get more accurate size of main instructions and no need for `InlineSkippedInstructionsCounter` in GC barriers stubs.
> > I also fixed code in C2 which estimates size of code and stubs sections. > > Tested tier1-4,tier8,stress,xcomp Thank you, Tobias, for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19029#issuecomment-2090671546 From kvn at openjdk.org Thu May 2 14:44:03 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 2 May 2024 14:44:03 GMT Subject: Integrated: 8331253: 16 bits is not enough for nmethod::_skipped_instructions_size field In-Reply-To: References: Message-ID: On Wed, 1 May 2024 03:31:41 GMT, Vladimir Kozlov wrote: > In [JDK-8329433](https://bugs.openjdk.org/browse/JDK-8329433) I changed `nmethod::_skipped_instructions_size` field type to `uint16_t` assuming that it only count NOP instructions and GC barriers. I did not take into account that Generational ZGC also incudes barrier stubs into this size (original ZGC missed that). It is correct to include them because these stubs are generated in instructions section and not in stubs section: > > > Statistics for 1330 bytecoded nmethods for C2: > ... > ZGC: > main code = 3237080 (75.567032%) > stubs code = 810577 (25.040375%) > skipped insts = 44432 (1.372595%) > > GenZGC: > main code = 4034704 (78.238518%) > stubs code = 1356703 (33.625839%) > skipped insts = 1074611 (26.634197%) > > > Note, GenZGC has bigger code because it has store barriers. It generates a separate stub for each barrier, no sharing. > > After looking on how `_skipped_instructions_size` is used (only in one place when calculated inlinining size of compiled code) I decided replace it with `int _inline_insts_size;`. It is calculated the same way as before. > > And instead of including instructions stubs into `_skipped_instructions_size` I recorded size of instructions in code section before stubs are generated. This allow to get more accurate size of main instructions and no need for `InlineSkippedInstructionsCounter` in GC barriers stubs. > > I also fixed code in C2 which estimates size of code and stubs sections. 
> > Tested tier1-4,tier8,stress,xcomp This pull request has now been integrated. Changeset: 3383ad63 Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/3383ad6397d5a2d8fb232ffd3e29a54e0b37b686 Stats: 46 lines in 9 files changed: 27 ins; 7 del; 12 mod 8331253: 16 bits is not enough for nmethod::_skipped_instructions_size field Reviewed-by: dlong, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/19029 From roland at openjdk.org Thu May 2 14:54:17 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 2 May 2024 14:54:17 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v18] In-Reply-To: References: Message-ID: > This change implements C2 optimizations for calls to > ScopedValue.get(). Indeed, in: > > > v1 = scopedValue.get(); > ... > v2 = scopedValue.get(); > > > `v2` can be replaced by `v1` and the second call to `get()` can be > optimized out. That's true whatever is between the 2 calls unless a > new mapping for `scopedValue` is created in between (when that happens > no optimizations is performed for the method being compiled). Hoisting > a `get()` call out of loop for a loop invariant `scopedValue` should > also be legal in most cases. > > `ScopedValue.get()` is implemented in java code as a 2 step process. A > cache is attached to the current thread object. If the `ScopedValue` > object is in the cache then the result from `get()` is read from > there. Otherwise a slow call is performed that also inserts the > mapping in the cache. The cache itself is lazily allocated. One > `ScopedValue` can be hashed to 2 different indexes in the cache. On a > cache probe, both indexes are checked. As a consequence, the process > of probing the cache is a multi step process (check if the cache is > present, check first index, check second index if first index > failed). 
If the cache is populated early on, then when the method that > calls `ScopedValue.get()` is compiled, profile reports the slow path > as never taken and only the read from the cache is compiled. > > To perform the optimizations, I added 3 new node types to C2: > > - the pair > ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for > the cache probe > > - a cfg node ScopedValueGetResultNode to help locate the result of the > `get()` call in the IR graph. > > In pseudo code, once the nodes are inserted, the code of a `get()` is: > > > hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) > if (hits_in_the_cache) { > res = ScopedValueGetLoadFromCache(hits_in_the_cache); > } else { > res = ..; //slow call possibly inlined. Subgraph can be arbitray complex > } > res = ScopedValueGetResult(res) > > > In the snippet: > > > v1 = scopedValue.get(); > ... > v2 = scopedValue.get(); > > > Replacing `v2` by `v1` is then done by starting from the > `ScopedValueGetResult` node for the second `get()` and looking for a > dominating `ScopedValueGetResult` for the same `ScopedValue` > object. When one is found, it is used as a replacement. Eliminating > the second `get()` call is achieved by making > `ScopedValueGetHitsInCache` always successful if there's a dominating > `ScopedValueGetResult` and replacing its companion > `ScopedValueGetLoadFromCache` by the dominating > `ScopedValueGetResult`. > > Hoisting a `g... 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: whitespaces ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16966/files - new: https://git.openjdk.org/jdk/pull/16966/files/d38872fd..7723c9c7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16966&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16966&range=16-17 Stats: 25 lines in 1 file changed: 0 ins; 1 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/16966.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16966/head:pull/16966 PR: https://git.openjdk.org/jdk/pull/16966 From roland at openjdk.org Thu May 2 15:15:59 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 2 May 2024 15:15:59 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v13] In-Reply-To: <_x-OSownzQQZ8fmlsbvQ42MLf9BGZskECTNncOE0s4E=.8381a076-0cc4-4339-924f-fa22ca780573@github.com> References: <9Eoh8hOSSVvAtf9iVQ6hflQyceUtt4dpZdqm61zg5XI=.358a4d79-70d9-4b54-85d5-37c6817f0fae@github.com> <_x-OSownzQQZ8fmlsbvQ42MLf9BGZskECTNncOE0s4E=.8381a076-0cc4-4339-924f-fa22ca780573@github.com> Message-ID: On Thu, 2 May 2024 08:24:34 GMT, Dean Long wrote: >> Then, I think we should add an assert that `!type->as_instance_klass()->is_interface()` and also that it's not an array of interfaces (using `base_element_klass()`) > > An array of interfaces can be exact: > > new Interface[20].getClass(); > > and it seems like it would be safe to allow this, so I think we only need one assert for `!type->as_instance_klass()->is_interface()` if we don't trust the result of exact_type(). Right. Then I think it would be safer to add an assert for `!type->as_instance_klass()->is_interface()`.
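Dean's point can be observed from plain Java: an array whose element type is an interface still has an exact, concrete runtime class. A small illustrative check (the names `InterfaceArrayCheck` and `Shape` are made up for this sketch):

```java
final class InterfaceArrayCheck {
    interface Shape { double area(); }   // any interface works here

    public static void main(String[] args) {
        Shape[] original = new Shape[20];
        Class<?> k = original.getClass();             // the exact runtime class, Shape[]

        System.out.println(k.isInterface());                    // false: the array class is concrete
        System.out.println(k.getComponentType().isInterface()); // true: its element type is an interface
        Shape[] copy = original.clone();                        // cloning such an array is legal
        System.out.println(copy.getClass() == k);               // true: the clone has exactly that class
    }
}
```

This is consistent with the thread's conclusion that only the non-array interface case needs an assert: the element type of an exact array class may legitimately be an interface.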
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1587817018 From rcastanedalo at openjdk.org Thu May 2 15:28:54 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 2 May 2024 15:28:54 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v2] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 13:37:14 GMT, Martin Doerr wrote: > Can we change `_barrier_set_state` ( > > https://github.com/openjdk/jdk/blob/a024eed7384828643e302f021a253717f53e3778/src/hotspot/share/opto/compile.hpp#L364 > > ) from `void*` to `BarrierSetC2State*` and remove the casts? Thanks for the suggestion. This would be a nice improvement; however, it would be fairly pervasive (I sketched it in https://github.com/openjdk/jdk/commit/cf5c1587e0ea90a8b3de4c70e0a2bf6ba4158f15), so I think it would be better to apply it as a separate RFE, perhaps after [non-generational ZGC is removed](https://openjdk.org/jeps/474) for simplicity. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19026#issuecomment-2090805965 From mdoerr at openjdk.org Thu May 2 15:49:53 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 2 May 2024 15:49:53 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v2] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 07:57:18 GMT, Roberto Castañeda Lozano wrote: >> This changeset generalizes the logic to analyze, declare, and communicate which registers are live at a C2 barrier stub so that it can be used by other collectors than ZGC adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). >> >> The main changes are: >> >> - Make it possible to compute register liveness information before (live-in) or after (live-out) each barrier, and let the collector choose by implementing `BarrierSetC2State::needs_livein_data()`.
>> >> - Generalize the interface with which collectors declare which registers must be additionally preserved across barrier runtime calls, adding the methods `BarrierStubC2::preserve(Register r)` and `BarrierStubC2::dont_preserve(Register r)`. >> >> - Simplify the interface with which platform-specific logic computes which registers to preserve across barrier runtime calls, replacing the calls to `BarrierStubC2::result()` and `BarrierStubC2::live()` with a single call to `BarrierStubC2::preserve_set()`. >> >> #### Testing >> >> - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). >> - tier1-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/4d4e743d8f4cddd5288cee1d69c70fe2b9bea066) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. >> - Build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug). @RealFYang, @TheRealMDoerr: could you please test and review the riscv and ppc changes? Thanks! > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Use VMReg::is_concrete for testing sub-registers I haven't thought about future usages of `BarrierSetC2State::needs_livein_data()`. I guess it's intended for G1. Otherwise, LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19026#pullrequestreview-2036126334 From sviswanathan at openjdk.org Thu May 2 16:18:54 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 2 May 2024 16:18:54 GMT Subject: RFR: 8326421: Add jtreg test for large arrayCopy disjoint case.
[v2] In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 07:11:44 GMT, Swati Sharma wrote: >> Hi All, >> >> Added a new jtreg test case for large arrayCopy disjoint case. >> This will test byte array copy operation for aligned and non aligned cases with array length greater than 2.5MB. >> >> Please review and provide your feedback. >> >> Thanks, >> Swati >> Intel > > Swati Sharma has updated the pull request incrementally with one additional commit since the last revision: > > 8326421: Resolved review comments. Looks good to me as well. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17962#pullrequestreview-2036222099 From sviswanathan at openjdk.org Thu May 2 17:03:56 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 2 May 2024 17:03:56 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v12] In-Reply-To: <1g7DGTS-7SUhuXFL8NniTGAQSgskv-CdrwtOGHymZqk=.f2ea7538-1ef4-4f94-af4d-972d64e7f699@github.com> References: <1g7DGTS-7SUhuXFL8NniTGAQSgskv-CdrwtOGHymZqk=.f2ea7538-1ef4-4f94-af4d-972d64e7f699@github.com> Message-ID: On Thu, 2 May 2024 00:05:20 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
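As a rough illustration of the REX2 case: a register number in 0..31 has five bits. The low three still go in ModRM/SIB, and REX supplies only the fourth bit, so REX2 adds a fifth. The following Java model is hypothetical (`Rex2Sketch`, `needsRex2` and `payload` are not HotSpot's `Assembler` API). The payload layout M0 R4 X4 B4 W R3 X3 B3 after the 0xD5 prefix byte reflects our reading of Intel's APX specification and should be verified there before relying on it.

```java
/**
 * Hypothetical model of REX2 payload packing for operands drawn from the
 * 32 APX general-purpose registers (0..31). Assumed payload layout
 * (bit 7 .. bit 0): M0 R4 X4 B4 W R3 X3 B3, following a 0xD5 prefix byte.
 */
final class Rex2Sketch {
    static final int REX2_PREFIX = 0xD5;

    /** Only operands in r16..r31 (extended GPRs) require the new encodings. */
    static boolean needsRex2(int... regs) {
        for (int r : regs) {
            if (r < 0 || r > 31) throw new IllegalArgumentException("not a GPR number: " + r);
            if (r >= 16) return true;
        }
        return false;
    }

    /** Packs bits 3 and 4 of each register number; bits 0..2 stay in ModRM/SIB. */
    static int payload(boolean map1, boolean w, int reg, int index, int base) {
        return (map1 ? 1 : 0) << 7          // M0: opcode map 1 (0F) instead of map 0
             | ((reg   >> 4) & 1) << 6      // R4
             | ((index >> 4) & 1) << 5      // X4
             | ((base  >> 4) & 1) << 4      // B4
             | (w ? 1 : 0) << 3             // W: 64-bit operand size
             | ((reg   >> 3) & 1) << 2      // R3 (same role as REX.R)
             | ((index >> 3) & 1) << 1      // X3 (same role as REX.X)
             | ((base  >> 3) & 1);          // B3 (same role as REX.B)
    }

    public static void main(String[] args) {
        System.out.println(needsRex2(0, 3, 15));   // false: encoding unchanged for the lower 16 GPRs
        System.out.println(needsRex2(17, 3));      // true: r17 is an extended GPR
        // reg = r17, base = rbx (3), 64-bit operand, map 0
        System.out.printf("%02X %02X%n", REX2_PREFIX, payload(false, true, 17, 0, 3)); // D5 48
    }
}
```

The sketch covers only the REX2 prefix; the extended-EVEX path mentioned in the description is not modeled.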
> > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > simplification and fix asserts in ldmxcsr, stmxcsr, and emit_prefix_and_int8 Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18476#pullrequestreview-2036328881 From sviswanathan at openjdk.org Thu May 2 17:03:57 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 2 May 2024 17:03:57 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v9] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 00:02:28 GMT, Steve Dohrmann wrote: >>> It looks to me that the source and dest are reversed in the following instruction in call to simd_prefix_and_encode, perhaps that should be a separate PR: // Do we have this wrong src and dst reversed in simd_prefix_and_encode? void Assembler::pextrw(Register dst, XMMRegister src, int imm8) { assert(VM_Version::supports_sse2(), ""); InstructionAttr attributes(AVX_128bit, /* rex_w */ false, /* legacy_mode */ _legacy_mode_bw, /* no_mask_reg */ true, /* uses_vl */ false); int encode = simd_prefix_and_encode(as_XMMRegister(dst->encoding()), xnoreg, src, VEX_SIMD_66, VEX_OPCODE_0F, &attributes); emit_int24((unsigned char)0xC5, (0xC0 | encode), imm8); } Once that PR is fixed, is_src_gpr should be set to true for this one as well. >> >> Verified that the pextrw has the operands reversed per the SDM, so please ignore this comment. > > @sviswa7 Thank you for your review comments. Very helpful! @steveatgh Please also do a merge with master. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2091070895 From duke at openjdk.org Thu May 2 17:11:56 2024 From: duke at openjdk.org (Swati Sharma) Date: Thu, 2 May 2024 17:11:56 GMT Subject: RFR: 8326421: Add jtreg test for large arrayCopy disjoint case.
[v2] In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 07:11:44 GMT, Swati Sharma wrote: >> Hi All, >> >> Added a new jtreg test case for large arrayCopy disjoint case. >> This will test byte array copy operation for aligned and non aligned cases with array length greater than 2.5MB. >> >> Please review and provide your feedback. >> >> Thanks, >> Swati >> Intel > > Swati Sharma has updated the pull request incrementally with one additional commit since the last revision: > > 8326421: Resolved review comments. add /contributor @steveatgh ------------- PR Comment: https://git.openjdk.org/jdk/pull/17962#issuecomment-2091089198 From never at openjdk.org Thu May 2 17:49:54 2024 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 2 May 2024 17:49:54 GMT Subject: RFR: 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found [v2] In-Reply-To: References: Message-ID: <76ydFdG47VvNGmaDZ-FhC_t5LGaCD-8Fjre-6l5f2YE=.289127d7-543b-4ddd-9b77-32f909610264@github.com> On Wed, 1 May 2024 20:57:14 GMT, Doug Simon wrote: >> This PR adds the missing nmethod entry barriers to JVMCI hand assembled tests. >> It also closes the escape hatch in jvmciCodeInstaller.cpp that allowed JVMCI code to be installed without nmethod entry barriers. > > Doug Simon has updated the pull request incrementally with two additional commits since the last revision: > > - remove vestiges of optional JVMCI nmethod support for entry barriers > - fixed failing tests and removed tests that install no longer valid code It would be super nice if we could figure out a clean way to share canned snippets of assembly from HotSpot back through JVMCI. There are lots of potential complexities though: register usage, the jcc erratum, relocations, fast/slow splits. The emit function could be called from the Graal assembler so that the sizing and alignment can be properly handled. 
HotSpot relocations could be translated in some fashion and maybe labels could be handled as well. The nmethod entry barrier fast path emission could probably be handled fairly cleanly since it's mostly a straightline snippet with a conditional branch at the end. It's just unclear if building that machinery is more complicated than maintaining and checking a clone of a small piece of assembly. The TestAssembler is a dubious piece of code given the complexity of emitting real nmethods. It doesn't even support the complex return sequence being used these days. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19035#issuecomment-2091158907 From duke at openjdk.org Thu May 2 18:32:02 2024 From: duke at openjdk.org (Swati Sharma) Date: Thu, 2 May 2024 18:32:02 GMT Subject: Integrated: 8326421: Add jtreg test for large arrayCopy disjoint case. In-Reply-To: References: Message-ID: On Thu, 22 Feb 2024 13:01:50 GMT, Swati Sharma wrote: > Hi All, > > Added a new jtreg test case for large arrayCopy disjoint case. > This will test byte array copy operation for aligned and non aligned cases with array length greater than 2.5MB. > > Please review and provide your feedback. > > Thanks, > Swati > Intel This pull request has now been integrated. Changeset: 73cdc9a0 Author: Swati Sharma Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/73cdc9a070249791f7d228a93fe5b9335c5f72bd Stats: 87 lines in 1 file changed: 87 ins; 0 del; 0 mod 8326421: Add jtreg test for large arrayCopy disjoint case. 
Co-authored-by: Steve Dohrmann Reviewed-by: kvn, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/17962 From roland at openjdk.org Thu May 2 18:47:00 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 2 May 2024 18:47:00 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v9] In-Reply-To: References: <5H6XV7Agl6ZNfGWT-bCbIPsimFTYM0pyIGiAHDQUUyA=.168e21cc-6cd8-42d8-ab59-d5e02e241ea2@github.com> <0RKnLUgc6UBtyxSyezCMWsSbP50hu6fQ6UJPHpGlgSU=.9fafa10f-62ee-4ec8-9093-4e204fcbe504@github.com> <5QbsVmYi0tYGlOvDL4LjJb1SjChIZtaWSMthFM9grMI=.0900e1c3-90b3-4726-a7c6-c2aff49d07ce@github.com> Message-ID: On Mon, 29 Apr 2024 07:12:55 GMT, Emanuel Peter wrote: >> @eme64 can you go over my replies above and let me know if they sound good to you? Thanks. > > I'm waiting for @rwestrel to respond to my last list of comments/questions. @eme64 change is ready for another review ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2091260926 From asmehra at openjdk.org Thu May 2 19:52:53 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Thu, 2 May 2024 19:52:53 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v8] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 13:54:08 GMT, Thomas Stuefe wrote: >> See [1] for previous discussions. >> >> We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. >> >> The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. >> >> Examples: >> >> This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. 
Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` >> >> This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` >> >> >> --- >> >> The patch: >> >> 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. >> 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. >> 3) Adapted and extended tests >> >> I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. >> >> >> Tested: >> >> - manually on Mac m1 (debug and release) >> - GHAs are running >> - but Oracle will do more testing before this goes in >> >> [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Remove unused variable Marked as reviewed by asmehra (Committer). Just one suggestion which you may pick or ignore. Otherwise looks good. test/hotspot/jtreg/compiler/print/CompileCommandMemLimit.java line 143: > 141: // total NA RA result #nodes limit time type #rc thread method > 142: // 32728 0 32728 ok - 1024M 0.045 c1 1 0x000000011b019c10 compiler/print/CompileCommandMemLimit$TestMain::method1(()J) > 143: oa.shouldMatch("\\d+ +\\d+ +\\d+ +ok +" + numberNodesRegex + " +" + implicitMemoryLimit + " +.* +" + method1regex); A minor suggestion regarding the regex. I find "\s+" more readable than " +" to match multiple spaces. 
------------- PR Review: https://git.openjdk.org/jdk/pull/18969#pullrequestreview-2036760008 PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2091437900 PR Review Comment: https://git.openjdk.org/jdk/pull/18969#discussion_r1588272234 From duke at openjdk.org Thu May 2 20:31:17 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Thu, 2 May 2024 20:31:17 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v13] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. Steve Dohrmann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 20 additional commits since the last revision: - update for egpr use: bzhil(R,R,R), btq(R,R), btq(R,imm) - Merge branch 'master' into apx-encoding-pr - Update full name - simplification and fix asserts in ldmxcsr, stmxcsr, and emit_prefix_and_int8 - remove is_map1 comment for addb, andb, movb, orb, testb, xchgb, xorb - fix stmxcrs REX2 branch, add asserts to SHA instructions - fixes: pp bits in crc32, REX2 branch in ldmxcsr - add egpr support for popcntq(R,A), cvttsd2siq(R,A), popq(R) - fix 4 more src_is_gpr = true cases, add asserts to check for UseAPX - fix is_gpr arg on two functions with reversed src / dst operands - ... and 10 more: https://git.openjdk.org/jdk/compare/27262415...7b3e8ec7 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/46eb6b42..7b3e8ec7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=11-12 Stats: 117386 lines in 3057 files changed: 52969 ins; 48551 del; 15866 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From dnsimon at openjdk.org Thu May 2 21:35:08 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 2 May 2024 21:35:08 GMT Subject: RFR: 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found [v3] In-Reply-To: References: Message-ID: > This PR adds the missing nmethod entry barriers to JVMCI hand assembled tests. > It also closes the escape hatch in jvmciCodeInstaller.cpp that allowed JVMCI code to be installed without nmethod entry barriers. 
Doug Simon has updated the pull request incrementally with two additional commits since the last revision: - fix NativeCallTest on x64 - remove more vestiges of optional JVMCI nmethod support for entry barriers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19035/files - new: https://git.openjdk.org/jdk/pull/19035/files/be4bf630..1b30b67e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19035&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19035&range=01-02 Stats: 8 lines in 2 files changed: 0 ins; 2 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19035.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19035/head:pull/19035 PR: https://git.openjdk.org/jdk/pull/19035 From sviswanathan at openjdk.org Thu May 2 21:48:00 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 2 May 2024 21:48:00 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v13] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 20:31:17 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 20 additional commits since the last revision: > > - update for egpr use: bzhil(R,R,R), btq(R,R), btq(R,imm) > - Merge branch 'master' into apx-encoding-pr > - Update full name > - simplification and fix asserts in ldmxcsr, stmxcsr, and emit_prefix_and_int8 > - remove is_map1 comment for addb, andb, movb, orb, testb, xchgb, xorb > - fix stmxcrs REX2 branch, add asserts to SHA instructions > - fixes: pp bits in crc32, REX2 branch in ldmxcsr > - add egpr support for popcntq(R,A), cvttsd2siq(R,A), popq(R) > - fix 4 more src_is_gpr = true cases, add asserts to check for UseAPX > - fix is_gpr arg on two functions with reversed src / dst operands > - ... and 10 more: https://git.openjdk.org/jdk/compare/335b7c9e...7b3e8ec7 The recent changes post merge with master look good. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18476#pullrequestreview-2037002903 From duke at openjdk.org Thu May 2 23:33:59 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Thu, 2 May 2024 23:33:59 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v13] In-Reply-To: References: Message-ID: On Mon, 15 Apr 2024 17:43:45 GMT, Vladimir Kozlov wrote: >> Steve Dohrmann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 20 additional commits since the last revision: >> >> - update for egpr use: bzhil(R,R,R), btq(R,R), btq(R,imm) >> - Merge branch 'master' into apx-encoding-pr >> - Update full name >> - simplification and fix asserts in ldmxcsr, stmxcsr, and emit_prefix_and_int8 >> - remove is_map1 comment for addb, andb, movb, orb, testb, xchgb, xorb >> - fix stmxcrs REX2 branch, add asserts to SHA instructions >> - fixes: pp bits in crc32, REX2 branch in ldmxcsr >> - add egpr support for popcntq(R,A), cvttsd2siq(R,A), popq(R) >> - fix 4 more src_is_gpr = true cases, add asserts to check for UseAPX >> - fix is_gpr arg on two functions with reversed src / dst operands >> - ... and 10 more: https://git.openjdk.org/jdk/compare/9abb31e9...7b3e8ec7 > > I have few comments. @vnkozlov Are there other things you would like to see for this pull request? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2091905594 From stuefe at openjdk.org Fri May 3 05:33:17 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 3 May 2024 05:33:17 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v9] In-Reply-To: References: Message-ID: > See [1] for previous discussions. > > We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. > > The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. > > Examples: > > This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. 
Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` > > This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` > > > --- > > The patch: > > 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. > 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. > 3) Adapted and extended tests > > I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. > > > Tested: > > - manually on Mac m1 (debug and release) > - GHAs are running > - but Oracle will do more testing before this goes in > > [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: - merge master and fix conflicts - Remove unused variable - Remove accidental change to TestDeadPhiMergeMemLoop.java - fix copyrights - fix copyrights - another fix - fix accidental slip in of another test name - fix jdk note number in test comment - Disable memory limit for compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java until JDK-8331295 is fixed - Merge branch 'master' into compiler-default-limit - ... 
and 6 more: https://git.openjdk.org/jdk/compare/6bef0474...f6396010 ------------- Changes: https://git.openjdk.org/jdk/pull/18969/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=08 Stats: 165 lines in 7 files changed: 114 ins; 12 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/18969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18969/head:pull/18969 PR: https://git.openjdk.org/jdk/pull/18969 From stuefe at openjdk.org Fri May 3 05:33:17 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 3 May 2024 05:33:17 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v8] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 19:50:17 GMT, Ashutosh Mehra wrote: > Just one suggestion which you may pick or ignore. Otherwise looks good. Many thanks, @ashu-mehra ! I will actually ignore your suggestion, because I want the expression to only match spaces precisely, not whitespaces. But for any-whitespace, I usually do as you suggest. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2092322060 From chagedorn at openjdk.org Fri May 3 05:53:01 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 3 May 2024 05:53:01 GMT Subject: RFR: 8331404: IGV: Show line numbers for callees in properties In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 15:38:20 GMT, Christian Hagedorn wrote: > IGV shows the `bci` for a node in the callee, followed by the bci in the caller method and so on until we reach the root method. 
For the `line` property, we currently only show the line number found in the root method (`first()` is the root method being compiled and `second()` and `third()` are inlined): > > Example program: > ![image](https://github.com/openjdk/jdk/assets/17833009/579fe9eb-4bd8-42d8-9d03-875f25bd97ae) > > Properties of the store to `fFld`: > ![image](https://github.com/openjdk/jdk/assets/17833009/3763cccf-c1ba-4d7f-a986-eae8bf0654b0) > > One could read the line number from the `jvms` property above. But you would need to expand that property with the button on the right side which opens a window. But then you cannot click anything else anymore in IGV until you close the window again. > > A simpler and easier to read solution is to add the line number information to match the bci numbers (they are printed in callee->root method order which I think is okay - especially if there are a lot of inlinees, it could be easier to have the really interesting numbers at the start on the left side). This would look something like that: > ![image](https://github.com/openjdk/jdk/assets/17833009/fcab3af6-69ac-43ae-89be-19fc4476d12f) > > If there is no line number information for a bci, I simply emit a `_`. > > Testing: > - Manual testing in IGV > - Sanity testing by running `java -Xcomp -XX:+PrintIdealGraph -XX:PrintIdealGraphLevel=4 -XX:PrintIdealGraphFile=graph.xml HelloWorld.java`. > > Thanks, > Christian Thanks Tobias for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19025#issuecomment-2092348356 From chagedorn at openjdk.org Fri May 3 05:53:01 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 3 May 2024 05:53:01 GMT Subject: Integrated: 8331404: IGV: Show line numbers for callees in properties In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 15:38:20 GMT, Christian Hagedorn wrote: > IGV shows the `bci` for a node in the callee, followed by the bci in the caller method and so on until we reach the root method. 
For the `line` property, we currently only show the line number found in the root method (`first()` is the root method being compiled and `second()` and `third()` are inlined): > > Example program: > ![image](https://github.com/openjdk/jdk/assets/17833009/579fe9eb-4bd8-42d8-9d03-875f25bd97ae) > > Properties of the store to `fFld`: > ![image](https://github.com/openjdk/jdk/assets/17833009/3763cccf-c1ba-4d7f-a986-eae8bf0654b0) > > One could read the line number from the `jvms` property above. But you would need to expand that property with the button on the right side which opens a window. But then you cannot click anything else anymore in IGV until you close the window again. > > A simpler and easier to read solution is to add the line number information to match the bci numbers (they are printed in callee->root method order which I think is okay - especially if there are a lot of inlinees, it could be easier to have the really interesting numbers at the start on the left side). This would look something like that: > ![image](https://github.com/openjdk/jdk/assets/17833009/fcab3af6-69ac-43ae-89be-19fc4476d12f) > > If there is no line number information for a bci, I simply emit a `_`. > > Testing: > - Manual testing in IGV > - Sanity testing by running `java -Xcomp -XX:+PrintIdealGraph -XX:PrintIdealGraphLevel=4 -XX:PrintIdealGraphFile=graph.xml HelloWorld.java`. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 8bc641eb Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/8bc641ebe75ba4c975a99a8646b89ed10a7029f5 Stats: 51 lines in 2 files changed: 31 ins; 16 del; 4 mod 8331404: IGV: Show line numbers for callees in properties Reviewed-by: tholenstein, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/19025 From aboldtch at openjdk.org Fri May 3 06:42:57 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 3 May 2024 06:42:57 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v2] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 07:57:18 GMT, Roberto Castañeda Lozano wrote: >> This changeset generalizes the logic to analyze, declare, and communicate which registers are live at a C2 barrier stub so that it can be used by other collectors than ZGC adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). >> >> The main changes are: >> >> - Make it possible to compute register liveness information before (live-in) or after (live-out) each barrier, and let the collector choose by implementing `BarrierSetC2State::needs_livein_data()`.
>> - tier1-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/4d4e743d8f4cddd5288cee1d69c70fe2b9bea066) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. >> - Build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug). @RealFYang, @TheRealMDoerr: could you please test and review the riscv and ppc changes? Thanks! > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Use VMReg::is_concrete for testing sub-registers lgtm. A few nits. src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 98: > 96: _entry(), > 97: _continuation(), > 98: _preserve(live()){} Suggestion: _preserve(live()) {} src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 879: > 877: if (!bs_state->needs_livein_data()) { > 878: RegMask* const regs = bs_state->live(node); > 879: if (regs != NULL) { Suggestion: if (regs != nullptr) { src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 910: > 908: if (bs_state->needs_livein_data()) { > 909: RegMask* const regs = bs_state->live(node); > 910: if (regs != NULL) { Suggestion: if (regs != nullptr) { ------------- Marked as reviewed by aboldtch (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19026#pullrequestreview-2037480805 PR Review Comment: https://git.openjdk.org/jdk/pull/19026#discussion_r1588795914 PR Review Comment: https://git.openjdk.org/jdk/pull/19026#discussion_r1588796061 PR Review Comment: https://git.openjdk.org/jdk/pull/19026#discussion_r1588796181 From rcastanedalo at openjdk.org Fri May 3 06:42:57 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 3 May 2024 06:42:57 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v2] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 07:57:18 GMT, Roberto Casta?eda Lozano wrote: >> This changeset generalizes the logic to analyze, declare, and communicate which registers are live at a C2 barrier stub so that it can be used by other collectors than ZGC adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). >> >> The main changes are: >> >> - Make it possible to compute register liveness information before (live-in) or after (live-out) each barrier, and let the collector choose by implementing `BarrierSetC2State::needs_livein_data()`. >> >> - Generalize the interface with which collectors declare which registers must be additionally preserved across barrier runtime calls, adding the methods `BarrierStubC2::preserve(Register r)` and `BarrierStubC2::dont_preserve(Register r)`. >> >> - Simplify the interface with which platform-specific logic computes which registers to preserve across barrier runtime calls, replacing the calls to `BarrierStubC2::result()` and `BarrierStubC2::live()` with a single call to `BarrierStubC2::preserve_set()`. >> >> #### Testing >> >> - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). 
>> - tier1-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/4d4e743d8f4cddd5288cee1d69c70fe2b9bea066) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. >> - Build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug). @RealFYang, @TheRealMDoerr: could you please test and review the riscv and ppc changes? Thanks! > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Use VMReg::is_concrete for testing sub-registers Thanks for reviewing, Martin! I reported your suggested refactoring here: https://bugs.openjdk.org/browse/JDK-8331623. > I haven't thought about future usages of `BarrierSetC2State::needs_livein_data()`. I guess it's intended for G1. That's correct, it is primarily intended for G1. But ZGC could also benefit, in the future, from using live-out instead of live-in data in the spilling logic. The current solution is slightly over-conservative in that it might spill some registers unnecessarily. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19026#issuecomment-2092395247 From rcastanedalo at openjdk.org Fri May 3 06:47:10 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 3 May 2024 06:47:10 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v3] In-Reply-To: References: Message-ID: <74Np8LGxo8PiyoLAUI7tUlAq7ySVgmGzblZio5Tlhx8=.c0fdf457-bc2f-4f88-a070-325521b469f9@github.com> > This changeset generalizes the logic to analyze, declare, and communicate which registers are live at a C2 barrier stub so that it can be used by other collectors than ZGC adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)).
> > The main changes are: > > - Make it possible to compute register liveness information before (live-in) or after (live-out) each barrier, and let the collector choose by implementing `BarrierSetC2State::needs_livein_data()`. > > - Generalize the interface with which collectors declare which registers must be additionally preserved across barrier runtime calls, adding the methods `BarrierStubC2::preserve(Register r)` and `BarrierStubC2::dont_preserve(Register r)`. > > - Simplify the interface with which platform-specific logic computes which registers to preserve across barrier runtime calls, replacing the calls to `BarrierStubC2::result()` and `BarrierStubC2::live()` with a single call to `BarrierStubC2::preserve_set()`. > > #### Testing > > - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). > - tier1-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/4d4e743d8f4cddd5288cee1d69c70fe2b9bea066) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. > - Build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug). @RealFYang, @TheRealMDoerr: could you please test and review the riscv and ppc changes? Thanks! 
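A rough model of the preserve-set interface described above, assuming registers are plain integers in a `BitSet`. `BarrierStubModel`, `preserve`, `dontPreserve`, and `preserveSet` are hypothetical Java stand-ins for `BarrierStubC2::preserve`, `dont_preserve`, and `preserve_set`, not HotSpot code. It follows the `_preserve(live())` initialization seen in review: the set starts from the liveness data (live-in or live-out, depending on the collector's choice), and collectors then add or remove individual registers.

```java
import java.util.BitSet;

// Hypothetical model: register numbers are ints, not HotSpot Register objects.
final class BarrierStubModel {
    private final BitSet preserve;

    BarrierStubModel(BitSet liveAtBarrier) {
        // Start from the registers reported live at the barrier; copy so the
        // caller's liveness data is not mutated.
        this.preserve = (BitSet) liveAtBarrier.clone();
    }

    // Collector asks for an extra register to survive the runtime call.
    void preserve(int reg) {
        preserve.set(reg);
    }

    // Collector declares a register need not be saved (e.g. it is overwritten anyway).
    void dontPreserve(int reg) {
        preserve.clear(reg);
    }

    // Single query used by the platform-specific spill/restore code.
    BitSet preserveSet() {
        return (BitSet) preserve.clone();
    }
}
```

Returning copies keeps the liveness input and the stub's set independent, so adjustments made for one stub cannot leak into another.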
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Apply code style suggestions from Axel Co-authored-by: Axel Boldt-Christmas ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19026/files - new: https://git.openjdk.org/jdk/pull/19026/files/c0fc66de..254c8849 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19026&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19026&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19026.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19026/head:pull/19026 PR: https://git.openjdk.org/jdk/pull/19026 From rcastanedalo at openjdk.org Fri May 3 06:47:10 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 3 May 2024 06:47:10 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v2] In-Reply-To: References: Message-ID: On Fri, 3 May 2024 06:40:00 GMT, Axel Boldt-Christmas wrote: > lgtm. > > A few nits. Thanks for reviewing and for the style suggestions, Axel! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19026#issuecomment-2092401560 From dnsimon at openjdk.org Fri May 3 08:27:55 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 3 May 2024 08:27:55 GMT Subject: RFR: 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found [v2] In-Reply-To: <76ydFdG47VvNGmaDZ-FhC_t5LGaCD-8Fjre-6l5f2YE=.289127d7-543b-4ddd-9b77-32f909610264@github.com> References: <76ydFdG47VvNGmaDZ-FhC_t5LGaCD-8Fjre-6l5f2YE=.289127d7-543b-4ddd-9b77-32f909610264@github.com> Message-ID: On Thu, 2 May 2024 17:46:54 GMT, Tom Rodriguez wrote: > The TestAssembler is a dubious piece of code given the complexity of emitting real nmethods. It doesn't even support the complex return sequence being used these days.
The next time there's a TestAssembler failure due to a change in nmethod invariants, I will remove it completely. There is sufficient coverage now in Graal tests that it no longer offers sufficient value. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19035#issuecomment-2092543435 From aph at openjdk.org Fri May 3 08:52:57 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 3 May 2024 08:52:57 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers In-Reply-To: References: Message-ID: On Tue, 9 Apr 2024 19:44:26 GMT, Dean Long wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > How can we be confident that the encoding is correct? Would it be possible to write tests for this? Maybe one that disassembles it and compares the result to a 3rd party disassembler offline or in-process hsdis? > Thank you @dean-long for the comment. I agree, tests are needed. Up to this point we have not had a separate formal tool to test encoding of x86. I did a lot of manual testing by adding loops that used r0-r31 in different addressing patterns. I put these in a stub file that would be compiled by hotspot but not executed.
I manually compared the disassembly of that against the output of similar assembly included in a small C program and run on the SDE. This worked pretty well for debugging but the manual aspect of it makes it error-prone and it takes a lot of time, too much time when iterating on an implementation. > > Subsequent pull requests will add encoding support for additional APX instructions (e.g. those using New Data Destination). Maybe one of these PRs can include a tool for testing instruction encoding for APX features. What do you think? When we wrote the AArch64 port, there was no available hardware to test it on. So, we wrote a simulator to test it. However, we ran the risk that if our understanding of instruction encoding was wrong, our assembler and our simulator might appear to work correctly when used together, but the result would not run on real AArch64 hardware once it arrived. So, as well as a simulator for the architecture, we verified the internal HotSpot assembler by checking its encoding against GNU `as`. See /test/hotspot/gtest/aarch64, where a Python program generates source for both the HotSpot internal assembler and GNU `as`. I strongly suggest you do something similar. (As a matter of historical record, this did work. The test found several encoding bugs. Once we got the first real AArch64 hardware, the port worked almost immediately.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2092579988 From bkilambi at openjdk.org Fri May 3 09:14:05 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 3 May 2024 09:14:05 GMT Subject: RFR: 8331400: AArch64: Sync aarch64_vector.ad with aarch64_vector_ad.m4 Message-ID: This commit - [1] modified the aarch64_vector.ad directly. This patch includes that change in the aarch64_vector_ad.m4 file as well and generates the aarch64_vector.ad file from it.
[1] https://github.com/openjdk/jdk/commit/185e711bfe4c4d013b56e867f85cfb4177b3a2cf ------------- Commit messages: - 8331400: AArch64: Sync aarch64_vector.ad with aarch64_vector_ad.m4 Changes: https://git.openjdk.org/jdk/pull/19077/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19077&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331400 Stats: 10 lines in 2 files changed: 6 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19077.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19077/head:pull/19077 PR: https://git.openjdk.org/jdk/pull/19077 From aph at openjdk.org Fri May 3 09:38:52 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 3 May 2024 09:38:52 GMT Subject: RFR: 8331400: AArch64: Sync aarch64_vector.ad with aarch64_vector_ad.m4 In-Reply-To: References: Message-ID: <-eH1_cLhL2ADd9kuizMnuMev2nq4lVxdSl7wjVWr030=.b83eb321-4b41-4e8f-818b-b3c57a99ecb4@github.com> On Fri, 3 May 2024 09:07:25 GMT, Bhavana Kilambi wrote: > This commit - [1] modified the aarch64_vector.ad directly. This patch includes that change in the aarch64_vector_ad.m4 file as well and generates the aarch64_vector.ad file from it. > > [1] https://github.com/openjdk/jdk/commit/185e711bfe4c4d013b56e867f85cfb4177b3a2cf OK. Obvious/trivial. Daaamn, that shouldn't have happened. It did happen, though, because the patch wasn't AArch64-specific so none of the AArch64 noticed it. I'm a bit reluctant to splatter aarch64_vector.ad with ` // DO NOT EDIT ANYTHING IN THIS SECTION OF THE FILE` ------------- Marked as reviewed by aph (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19077#pullrequestreview-2037763354 From roland at openjdk.org Fri May 3 10:00:17 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 3 May 2024 10:00:17 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v3] In-Reply-To: References: Message-ID: > Range check `CastII` nodes are removed once loop opts are over. 
The > test case for this change includes 3 cases where elimination of a > range check `CastII` causes a crash in compiled code because either an > out-of-bounds array load or a division by zero happens. > > In `test1`: > > - the range checks for the `array[otherArray.length]` loads constant > fold: `otherArray.length` is a `CastII` of i at the `otherArray` > allocation. `i` is less than 9. The `CastII` at the allocation > narrows the type down further to `[0-9]`. > > - the `array[otherArray.length]` loads are control dependent on the > unrelated: > > > if (flag == 0) { > > > test. There's an identical dominating test which replaces that one. As > a consequence, the `array[otherArray.length]` loads become control > dependent on the dominating test. > > - The `CastII` nodes at the `otherArray` allocations are replaced by a > dominating range check `CastII` node for: > > > newArray[i] = 42; > > > - After loop opts, the range check `CastII` nodes are removed and the > 2 `array[otherArray.length]` loads common at the first: > > > if (flag == 0) { > > > test before the: > > > float[] otherArray = new float[i]; > > > and > > > newArray[i] = 42; > > > that guarantee `i` is positive. > > - When `test1` is called with `i = -1`, the array load proceeds with an out-of-bounds > index and the crash occurs. > > > `test2` and `test3` are mostly identical except for the check that's > eliminated (a zero-divisor check) and the instruction that causes a > fault (an integer division). > > The fix I propose is to not eliminate range check `CastII` nodes after > loop opts. When range check `CastII` nodes were introduced, performance > was observed to regress. Removing them after loop opts was found to > preserve both correctness and performance. Today, the performance > regression still exists when `CastII` nodes are left in.
So I propose > we keep them until the end of optimizations (so the 2 array loads > above don't lose a dependency and wrongly common) but remove them at > the end of all optimizations. > > In the case of the array loads, they are dependent on a range check > for another array through a range check `CastII` and we must not lose > that dependency otherwise the array loads could float above the range > check at gcm time. I propose we deal with that problem the way it's > handled for `CastPP` nodes: add the dependency to the load (or > division) nodes as a precedence edge when the cast is removed. > > @TobiHartmann ran performance testing for that patch (Thanks!) and reported > no regression. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - review - Merge branch 'master' into JDK-8324517 - Merge branch 'master' into JDK-8324517 - review - Merge branch 'master' into JDK-8324517 - test and fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18377/files - new: https://git.openjdk.org/jdk/pull/18377/files/0de61cbc..ceb30c19 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18377&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18377&range=01-02 Stats: 115362 lines in 3036 files changed: 52226 ins; 47924 del; 15212 mod Patch: https://git.openjdk.org/jdk/pull/18377.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18377/head:pull/18377 PR: https://git.openjdk.org/jdk/pull/18377 From roland at openjdk.org Fri May 3 10:11:25 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 3 May 2024 10:11:25 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v4] In-Reply-To: References: Message-ID: > Range check `CastII` nodes are removed once loop opts are over.
The > test case for this change includes 3 cases where elimination of a > range check `CastII` causes a crash in compiled code because either a > out of bounds array load or a division by zero happen. > > In `test1`: > > - the range checks for the `array[otherArray.length]` loads constant > fold: `otherArray.length` is a `CastII` of i at the `otherArray` > allocation. `i` is less than 9. The `CastII` at the allocation > narrows the type down further to `[0-9]`. > > - the `array[otherArray.length]` loads are control dependent on the > unrelated: > > > if (flag == 0) { > > > test. There's an identical dominating test which replaces that one. As > a consequence, the `array[otherArray.length]` loads become control > dependent on the dominating test. > > - The `CastII` nodes at the `otherArray` allocations are replaced by a > dominating range check `CastII` nodes for: > > > newArray[i] = 42; > > > - After loop opts, the range check `CastII` nodes are removed and the > 2 `array[otherArray.length]` loads common at the first: > > > if (flag == 0) { > > > test before the: > > > float[] otherArray = new float[i]; > > > and > > > newArray[i] = 42; > > > that guarantee `i` is positive. > > - `test1` is called with `i = -1`, the array load proceeds with an out > of bounds index and the crash occurs. > > > `test2` and `test3` are mostly identical except for the check that's > eliminated (a null divisor check) and the instruction that causes a > fault (an integer division). > > The fix I propose is to not eliminate range check `CastII` nodes after > loop opts. When range check`CastII` nodes were introduced, performance > was observed to regress. Removing them after loop opts was found to > preserve both correctness and performance. Today, the performance > regression still exists when `CastII` nodes are left in. 
So I propose > we keep them until the end of optimizations (so the 2 array loads > above don't lose a dependency and wrongly common) but remove them at > the end of all optimizations. > > In the case of the array loads, they are dependent on a range check > for another array through a range check `CastII` and we must not lose > that dependency otherwise the array loads could float above the range > check at gcm time. I propose we deal with that problem the way it's > handled for `CastPP` nodes: add the dependency to the load (or > division)nodes as a precedence edge when the cast is removed. > > @TobiHartmann ran performance testing for that patch (Thanks!) and reported > no regression. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: test fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18377/files - new: https://git.openjdk.org/jdk/pull/18377/files/ceb30c19..5cc658b6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18377&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18377&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18377.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18377/head:pull/18377 PR: https://git.openjdk.org/jdk/pull/18377 From roland at openjdk.org Fri May 3 10:15:54 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 3 May 2024 10:15:54 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v4] In-Reply-To: References: Message-ID: On Fri, 3 May 2024 10:11:25 GMT, Roland Westrelin wrote: >> Range check `CastII` nodes are removed once loop opts are over. The >> test case for this change includes 3 cases where elimination of a >> range check `CastII` causes a crash in compiled code because either a >> out of bounds array load or a division by zero happen. 
>> >> In `test1`: >> >> - the range checks for the `array[otherArray.length]` loads constant >> fold: `otherArray.length` is a `CastII` of i at the `otherArray` >> allocation. `i` is less than 9. The `CastII` at the allocation >> narrows the type down further to `[0-9]`. >> >> - the `array[otherArray.length]` loads are control dependent on the >> unrelated: >> >> >> if (flag == 0) { >> >> >> test. There's an identical dominating test which replaces that one. As >> a consequence, the `array[otherArray.length]` loads become control >> dependent on the dominating test. >> >> - The `CastII` nodes at the `otherArray` allocations are replaced by a >> dominating range check `CastII` nodes for: >> >> >> newArray[i] = 42; >> >> >> - After loop opts, the range check `CastII` nodes are removed and the >> 2 `array[otherArray.length]` loads common at the first: >> >> >> if (flag == 0) { >> >> >> test before the: >> >> >> float[] otherArray = new float[i]; >> >> >> and >> >> >> newArray[i] = 42; >> >> >> that guarantee `i` is positive. >> >> - `test1` is called with `i = -1`, the array load proceeds with an out >> of bounds index and the crash occurs. >> >> >> `test2` and `test3` are mostly identical except for the check that's >> eliminated (a null divisor check) and the instruction that causes a >> fault (an integer division). >> >> The fix I propose is to not eliminate range check `CastII` nodes after >> loop opts. When range check`CastII` nodes were introduced, performance >> was observed to regress. Removing them after loop opts was found to >> preserve both correctness and performance. Today, the performance >> regression still exists when `CastII` nodes are left in. So I propose >> we keep them until the end of optimizations (so the 2 array loads >> above don't lose a dependency and wrongly common) but remove them at >> the end of all optimizations. 
>> >> In the case of the array loads, they are dependent on a range check >> for another array through a range check `CastII` and we must not lose >> that dependency otherwise the array loads could float above the range >> check at gcm time. I propose we deal with that problem the way it's >> handled for `CastPP` nodes: add the dependency to the load (or >> division) nodes ... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > test fix Thanks for reviewing this. > Did you check if the other usages of `_range_check_dependency` via `CastIINode::has_range_check` are still needed? Seems to me as if at least the checks in `PhaseIdealLoop::match_fill_loop` can be removed. I did but was fairly conservative. In the case of `PhaseIdealLoop::match_fill_loop`, I don't think this change makes a difference: if we don't need the check for `CastIINode::has_range_check` there then it's true with or without that change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18377#issuecomment-2092708764 From roland at openjdk.org Fri May 3 10:20:54 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 3 May 2024 10:20:54 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 11:44:27 GMT, Tobias Hartmann wrote: > `Op_ModI` and `Op_ModL` are missing here. Good catch! I added test cases for `Op_ModI` and `Op_ModL`, the unsigned variants, and also the DivMod variants. I also fixed the patch so it handles all of them. > And isn't this too strong in cases where we can prove that the operand is non-zero? I don't think it's too strong. The operand can be non-zero because of a range check `CastII` somewhere along the subgraph that starts at the node's second input.
In that case, `PhaseIterGVN::no_dependent_zero_check` would return true but removing the range `CastII` would cause the bugs that are triggered by the test case. > Looking at `PhaseIterGVN::no_dependent_zero_check`, I noticed that `UDiv[I/L]Node` and `UMod[I/L]Node` are not handled but I think they should. I think this was missed when these nodes were added by [JDK-8282221](https://bugs.openjdk.org/browse/JDK-8282221). One can probably extend @chhagedorn's test from [JDK-8259227](https://bugs.openjdk.org/browse/JDK-8259227) to trigger the same issue. That seems like a different problem that is out of the scope of this particular issue. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18377#discussion_r1589017668 From bkilambi at openjdk.org Fri May 3 11:27:53 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 3 May 2024 11:27:53 GMT Subject: RFR: 8331400: AArch64: Sync aarch64_vector.ad with aarch64_vector_ad.m4 In-Reply-To: References: Message-ID: On Fri, 3 May 2024 09:07:25 GMT, Bhavana Kilambi wrote:
------------- PR Comment: https://git.openjdk.org/jdk/pull/19077#issuecomment-2092812049 From roland at openjdk.org Fri May 3 12:47:56 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 3 May 2024 12:47:56 GMT Subject: RFR: 8305638: Renaming and small clean-ups around predicates [v7] In-Reply-To: <-UU0jrN33Dxbp9EJ9u1FSJ2RDYC02JMK84gnzZLUhSg=.0e20361b-81a2-4ae9-a320-70f3cd9804c6@github.com> References: <-UU0jrN33Dxbp9EJ9u1FSJ2RDYC02JMK84gnzZLUhSg=.0e20361b-81a2-4ae9-a320-70f3cd9804c6@github.com> Message-ID: <2KKv46jcuYFUBM7b-zaZsL_KTEa77P94D5A5fwKAWtY=.56141dde-4f1d-4d09-a7ce-a220a6a699eb@github.com> On Thu, 2 May 2024 10:40:08 GMT, Christian Hagedorn wrote: >> **Update: April 22** >> >> After splitting off and integrating the following PRs from this PR: >> https://github.com/openjdk/jdk/pull/18080 >> https://github.com/openjdk/jdk/pull/18293 >> https://github.com/openjdk/jdk/pull/18628 >> https://github.com/openjdk/jdk/pull/18723 >> >> we are only left with a few renaming and clean-ups from this PR. Directly merging the master branch in was quite hard. I therefore reverted all commits to get back to a clean master and then applied all remaining code changes manually (required a force push). >> >>
>>
>> >> _------------ Original PR description --------------_ >> >> This patch is intended for JDK 23. >> >> While preparing the patch for the full fix for Assertion Predicates [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981), I still noticed that some changes are not required for the actual fix and could be split off and reviewed separately in this PR. >> >> The patch applies the following cleanup changes: >> - The complete fix had to add slightly different cloning cases in `PhaseIdealLoop::create_bool_from_template_assertion_predicate()` which already has quite some logic to switch between different cases. Additionally, the algorithm in the method itself was already hard to understand and difficult to adapt. I therefore re-implemented it in a separate class `CloneTemplateAssertionPredicateBool` together with some helper classes like `DFSNodeStack`. To use it, I've added a `TemplateAssertionPredicateBool` class that offers three cloning possibilities: >> - `clone()`: Clone without modification >> - `clone_and_replace_opaque_loop_nodes()`: Clone and replace the `OpaqueLoop*Nodes` with a new init and stride node. >> - `clone_and_replace_init()`: Special case of `clone_and_replace_opaque_loop_nodes()` which only replaces `OpaqueLoopInitNode` and clones `OpaqueLoopStrideNode`. >> >> This refactoring could be extracted from the complete fix. >> - The Split If code to detect (`subgraph_has_opaque()`) and clone Template Assertion Predicate Bools was extracted to a separate class `CloneTemplateAssertionPredicateBoolDown` and uses the new `TemplateAssertionPredicateBool` class to do the actual cloning. >> - In the process of coding the complete fix, I've refactored the Loop Unswitching code quite a bit. This change could also be extracted into a separate RFE. Changes include: >> - Renaming >> - Extracting code to separate classes/methods >> - Adding comments >> - Some small refactoring including: >> - Removi... 
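The Template Assertion Predicate cloning described above (clone the Bool subgraph and replace the `OpaqueLoop*` nodes with new init and stride nodes) can be sketched roughly as follows. This is a hypothetical Java model, not the HotSpot classes: `Node`, `Kind`, and `cloneAndReplaceOpaqueLoopNodes` are illustrative names. It uses recursion with memoization instead of the explicit DFS stack (`DFSNodeStack`) the PR mentions. The sketch clones only nodes lying on a path to an `OpaqueLoop*` node and shares every other input.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class PredicateCloneSketch {
    enum Kind { BOOL, CMP, ADD, CONST, OPAQUE_LOOP_INIT, OPAQUE_LOOP_STRIDE }

    static final class Node {
        final Kind kind;
        final List<Node> inputs = new ArrayList<>();

        Node(Kind kind, Node... ins) {
            this.kind = kind;
            for (Node in : ins) {
                inputs.add(in);
            }
        }

        boolean isOpaqueLoopNode() {
            return kind == Kind.OPAQUE_LOOP_INIT || kind == Kind.OPAQUE_LOOP_STRIDE;
        }
    }

    // True if an OpaqueLoop* node is reachable from n through its inputs.
    // Assumes an acyclic expression graph (no phis in this model).
    private static boolean reachesOpaque(Node n, Map<Node, Boolean> memo) {
        Boolean cached = memo.get(n);
        if (cached != null) {
            return cached;
        }
        boolean result = n.isOpaqueLoopNode();
        for (Node in : n.inputs) {
            if (reachesOpaque(in, memo)) {
                result = true;
            }
        }
        memo.put(n, result);
        return result;
    }

    private static Node cloneAndReplace(Node n, Node newInit, Node newStride,
                                        Map<Node, Boolean> reaches, Map<Node, Node> clones) {
        if (n.kind == Kind.OPAQUE_LOOP_INIT) return newInit;
        if (n.kind == Kind.OPAQUE_LOOP_STRIDE) return newStride;
        if (!reachesOpaque(n, reaches)) return n;   // off the opaque path: share it
        Node done = clones.get(n);
        if (done != null) return done;              // already cloned via another path
        Node copy = new Node(n.kind);
        for (Node in : n.inputs) {
            copy.inputs.add(cloneAndReplace(in, newInit, newStride, reaches, clones));
        }
        clones.put(n, copy);
        return copy;
    }

    // Clones the nodes between bool and the OpaqueLoop* nodes, swapping in newInit/newStride.
    static Node cloneAndReplaceOpaqueLoopNodes(Node bool, Node newInit, Node newStride) {
        return cloneAndReplace(bool, newInit, newStride, new IdentityHashMap<>(), new IdentityHashMap<>());
    }
}
```

Memoizing both the reachability check and the clones ensures that a node reachable through several paths is cloned exactly once, so the copy preserves the sharing structure of the original subgraph.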
> > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into JDK-8305638 > - Merge branch 'refs/heads/master' into JDK-8305638 > > # Conflicts: > # src/hotspot/share/opto/loopPredicate.cpp > - Fix useful Template Assertion Predicate marking > - Fix useful Parse Predicate marking > - Remaining renaming and small clean-ups Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16877#pullrequestreview-2038064110 From roland at openjdk.org Fri May 3 12:51:19 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 3 May 2024 12:51:19 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop Message-ID: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> In the test case: long i; for (; i > 0; i--) { res += 42 / ((int) i); The long counted loop phi has type `[1..100]`. As a consequence, the `ConvL2I` also has type `[1..100]`. The `DivI` node that follows can't fault: it is not guarded by a zero check and has no control set. The `ConvL2I` is split through phi and so is the `DivI` node: `PhaseIdealLoop::cannot_split_division()` returns true because the value coming from the backedge into the `DivI` (when it is about to be split thru phi) is the result of the `ConvL2I` which has type `[1..100]` so is not zero as far as the compiler can tell. On the last iteration of the loop, i is 1. Because the DivI was split thru Phi, it computes the value for the following iteration, so for i = 0. This causes a crash when the compiled code runs. The same problem can't happen with an int counted loop because logic in `PhaseIdealLoop::split_thru_phi()` prevents a `ConvI2L` from being split thru phi. I propose to fix this the same way: in the test case, it's not true that once the `ConvL2I` is split thru phi it keeps type `[1..100]`.
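To make the failure mode concrete, here is a hand-written Java rendering of the quoted loop and of what splitting the division through the loop phi amounts to. It is an illustration only, not C2 IR and not the regression test from the PR; the class, the method names, and the `start` parameter (standing in for the uninitialized `long i` in the quoted fragment) are assumptions. In Java, the eagerly evaluated division throws `ArithmeticException`. In the compiled code, where the `DivI` was believed unable to fault and carries no zero check, the same early evaluation is what crashes.

```java
final class ConvL2ISplitSketch {
    // Source-level shape of the quoted loop: the divisor is always in [1..start].
    static int original(long start) {
        int res = 0;
        for (long i = start; i > 0; i--) {
            res += 42 / ((int) i);
        }
        return res;
    }

    // Rough rendering after splitting DivI(ConvL2I(phi)) through the phi: the
    // backedge input computes the division for the *next* iteration before the
    // exit test, so on the last iteration it divides by (int) 0.
    static int splitThroughPhi(long start) {
        int res = 0;
        long i = start;
        if (i > 0) {
            int div = 42 / ((int) i);       // phi input from the loop entry
            while (true) {
                res += div;
                i--;
                div = 42 / ((int) i);       // phi input from the backedge: runs even when i == 0
                if (!(i > 0)) {
                    break;
                }
            }
        }
        return res;
    }
}
```

For example, `original(100)` returns 168, while `splitThroughPhi(100)` throws `ArithmeticException` once `i` reaches 0.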
The fix is fairly conservative because it's based on the existing logic for `ConvI2L`: we would only want to avoid splitting a `ConvL2I` at a counted loop, but I suppose the same is true for the `ConvI2L`, and I thought it would be best to revisit both together. ------------- Commit messages: - test and fix Changes: https://git.openjdk.org/jdk/pull/19086/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19086&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331575 Stats: 68 lines in 2 files changed: 66 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19086.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19086/head:pull/19086 PR: https://git.openjdk.org/jdk/pull/19086 From jbhateja at openjdk.org Fri May 3 14:07:02 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 3 May 2024 14:07:02 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v13] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 20:31:17 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 20 additional commits since the last revision: > > - update for egpr use: bzhil(R,R,R), btq(R,R), btq(R,imm) > - Merge branch 'master' into apx-encoding-pr > - Update full name > - simplification and fix asserts in ldmxcsr, stmxcsr, and emit_prefix_and_int8 > - remove is_map1 comment for addb, andb, movb, orb, testb, xchgb, xorb > - fix stmxcrs REX2 branch, add asserts to SHA instructions > - fixes: pp bits in crc32, REX2 branch in ldmxcsr > - add egpr support for popcntq(R,A), cvttsd2siq(R,A), popq(R) > - fix 4 more src_is_gpr = true cases, add asserts to check for UseAPX > - fix is_gpr arg on two functions with reversed src / dst operands > - ... and 10 more: https://git.openjdk.org/jdk/compare/6985920c...7b3e8ec7 src/hotspot/cpu/x86/assembler_x86.cpp line 2839: > 2837: void Assembler::kmovwl(KRegister dst, KRegister src) { > 2838: assert(VM_Version::supports_evex(), ""); > 2839: InstructionAttr attributes(AVX_128bit, /* rex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); Suggestion: InstructionAttr attributes(AVX_128bit, /* rex_w */ false, /* legacy_mode */ true, /* no_mask_reg */ true, /* uses_vl */ false); No GPR operand here. src/hotspot/cpu/x86/assembler_x86.cpp line 2846: > 2844: void Assembler::kmovdl(KRegister dst, Register src) { > 2845: assert(VM_Version::supports_avx512bw(), ""); > 2846: InstructionAttr attributes(AVX_128bit, /* rex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); FTR, we are doing legacy demotions in downstream code after checking the actual register encoding.
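The demotion mentioned above amounts to a check on the actual register encodings. What follows is a hypothetical sketch that assumes GPR encodings 0-31, where 16-31 are the APX extended GPRs (r16-r31). Per the PR description, only instructions with at least one such operand need REX2 or the extended EVEX form, so an instruction such as `kmovdl` with a lower-16 source could keep its existing encoding. The function names are illustrative and do not come from HotSpot.

```cpp
#include <cassert>
#include <initializer_list>

// Assumption: r16 is the first APX extended GPR; encodings 16..31 are EGPRs.
constexpr int kFirstEgprEncoding = 16;

constexpr bool is_egpr(int encoding) { return encoding >= kFirstEgprEncoding; }

// True if any GPR operand is an extended GPR, i.e. the instruction needs a
// REX2 or extended-EVEX encoding; false means the existing form can be kept.
bool needs_apx_prefix(std::initializer_list<int> gpr_encodings) {
  for (int enc : gpr_encodings) {
    if (is_egpr(enc)) {
      return true;
    }
  }
  return false;
}
```

Keeping the decision on the concrete encodings, rather than on the instruction kind, is what allows the same emitter to produce the unchanged legacy bytes whenever only r0-r15 are involved.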
src/hotspot/cpu/x86/assembler_x86.cpp line 2860: > 2858: void Assembler::kmovql(KRegister dst, KRegister src) { > 2859: assert(VM_Version::supports_avx512bw(), ""); > 2860: InstructionAttr attributes(AVX_128bit, /* rex_w */ true, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); Suggestion: InstructionAttr attributes(AVX_128bit, /* rex_w */ true, /* legacy_mode */ true, /* no_mask_reg */ true, /* uses_vl */ false); src/hotspot/cpu/x86/assembler_x86.cpp line 6556: > 6554: assert(VM_Version::supports_bmi1(), "tzcnt instruction not supported"); > 6555: emit_int8((unsigned char)0xF3); > 6556: int encode = prefixq_and_encode(dst->encoding(), src->encoding(), true /* is_map1 */); FTR, quoting the relevant excerpt from section 3.1.2.1 of the APX specification: "REX2 must be the last prefix. The byte following it is interpreted as the main opcode byte in the opcode map indicated by M0. The 0x0F escape byte is neither needed nor allowed." src/hotspot/cpu/x86/assembler_x86.hpp line 536: > 534: REXBIT_X = 0x02, > 535: REXBIT_R = 0x04, > 536: REXBIT_W = 0x08, Suggestion: REX2BIT_B = 0x01, REX2BIT_X = 0x02, REX2BIT_R = 0x04, REX2BIT_W = 0x08, Name change suggestion since these bits are part of the REX2 prefix.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1589185356 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1589195694 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1589193598 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1589207336 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1588761975 From jbhateja at openjdk.org Fri May 3 14:17:02 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 3 May 2024 14:17:02 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v13] In-Reply-To: References: Message-ID: <8DYTq-UlK3eJ0rZIqZODihapcSTUgO0ExgAeN9tGQ8A=.140f1bfc-5fe8-4d7b-9162-c5c332fbd292@github.com> On Thu, 2 May 2024 20:31:17 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 20 additional commits since the last revision: > > - update for egpr use: bzhil(R,R,R), btq(R,R), btq(R,imm) > - Merge branch 'master' into apx-encoding-pr > - Update full name > - simplification and fix asserts in ldmxcsr, stmxcsr, and emit_prefix_and_int8 > - remove is_map1 comment for addb, andb, movb, orb, testb, xchgb, xorb > - fix stmxcrs REX2 branch, add asserts to SHA instructions > - fixes: pp bits in crc32, REX2 branch in ldmxcsr > - add egpr support for popcntq(R,A), cvttsd2siq(R,A), popq(R) > - fix 4 more src_is_gpr = true cases, add asserts to check for UseAPX > - fix is_gpr arg on two functions with reversed src / dst operands > - ... and 10 more: https://git.openjdk.org/jdk/compare/89f2678c...7b3e8ec7 src/hotspot/cpu/x86/assembler_x86.cpp line 1726: > 1724: > 1725: void Assembler::blsrl(Register dst, Register src) { > 1726: assert(VM_Version::supports_bmi1(), "bit manipulation instructions not supported"); We should extend assertion checks based on register encodings and feature detection upfront using ` VM_Version::supports_apx_f() ` part of [PR#18562](https://github.com/openjdk/jdk/pull/18562) once it lands OR you can merge that pull request with this patch if you find appropriate. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1589269902 From kvn at openjdk.org Fri May 3 16:11:52 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 3 May 2024 16:11:52 GMT Subject: RFR: 8331400: AArch64: Sync aarch64_vector.ad with aarch64_vector_ad.m4 In-Reply-To: References: Message-ID: On Fri, 3 May 2024 09:07:25 GMT, Bhavana Kilambi wrote: > This commit - [1] modified the aarch64_vector.ad directly. This patch includes that change in the aarch64_vector_ad.m4 file as well and generates the aarch64_vector.ad file from it. > > [1] https://github.com/openjdk/jdk/commit/185e711bfe4c4d013b56e867f85cfb4177b3a2cf Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19077#pullrequestreview-2038495372 From never at openjdk.org Fri May 3 17:23:53 2024 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 3 May 2024 17:23:53 GMT Subject: RFR: 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found [v3] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 21:35:08 GMT, Doug Simon wrote: >> This PR adds the missing nmethod entry barriers to JVMCI hand assembled tests. >> It also closes the escape hatch in jvmciCodeInstaller.cpp that allowed JVMCI code to be installed without nmethod entry barriers. > > Doug Simon has updated the pull request incrementally with two additional commits since the last revision: > > - fix NativeCallTest on x64 > - remove more vestiges of optional JVMCI nmethod support for entry barriers Sounds and looks good. ------------- Marked as reviewed by never (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19035#pullrequestreview-2038677680 From duke at openjdk.org Fri May 3 19:14:07 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Fri, 3 May 2024 19:14:07 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v14] In-Reply-To: References: Message-ID: <727FyZHyBbtRilYRtbP2E4dbZYqj9a-QgXAuicQ2iZQ=.01035706-6591-4df5-bf7d-d7a2f6209015@github.com> > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. 
For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: revert unneeded legacy flag change for kmovwl(K,K) and kmovql(K,K) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/7b3e8ec7..d93e9893 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=12-13 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From duke at openjdk.org Fri May 3 19:14:11 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Fri, 3 May 2024 19:14:11 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v13] In-Reply-To: <8DYTq-UlK3eJ0rZIqZODihapcSTUgO0ExgAeN9tGQ8A=.140f1bfc-5fe8-4d7b-9162-c5c332fbd292@github.com> References: <8DYTq-UlK3eJ0rZIqZODihapcSTUgO0ExgAeN9tGQ8A=.140f1bfc-5fe8-4d7b-9162-c5c332fbd292@github.com> Message-ID: On Fri, 3 May 2024 14:14:28 GMT, Jatin Bhateja wrote: >> Steve Dohrmann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 20 additional commits since the last revision: >> >> - update for egpr use: bzhil(R,R,R), btq(R,R), btq(R,imm) >> - Merge branch 'master' into apx-encoding-pr >> - Update full name >> - simplification and fix asserts in ldmxcsr, stmxcsr, and emit_prefix_and_int8 >> - remove is_map1 comment for addb, andb, movb, orb, testb, xchgb, xorb >> - fix stmxcrs REX2 branch, add asserts to SHA instructions >> - fixes: pp bits in crc32, REX2 branch in ldmxcsr >> - add egpr support for popcntq(R,A), cvttsd2siq(R,A), popq(R) >> - fix 4 more src_is_gpr = true cases, add asserts to check for UseAPX >> - fix is_gpr arg on two functions with reversed src / dst operands >> - ... and 10 more: https://git.openjdk.org/jdk/compare/dc7f6595...7b3e8ec7 > > src/hotspot/cpu/x86/assembler_x86.cpp line 1726: > >> 1724: >> 1725: void Assembler::blsrl(Register dst, Register src) { >> 1726: assert(VM_Version::supports_bmi1(), "bit manipulation instructions not supported"); > > We should extend assertion checks based on register encodings and feature detection upfront using ` VM_Version::supports_apx_f() ` part of [PR#18562](https://github.com/openjdk/jdk/pull/18562) once it lands OR you can merge that pull request with this patch if you find appropriate. Agree that asserts should be extended. Maybe it would be better to do so in a subsequent PR with feature detection in place. > src/hotspot/cpu/x86/assembler_x86.cpp line 2839: > >> 2837: void Assembler::kmovwl(KRegister dst, KRegister src) { >> 2838: assert(VM_Version::supports_evex(), ""); >> 2839: InstructionAttr attributes(AVX_128bit, /* rex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > > Suggestion: > > InstructionAttr attributes(AVX_128bit, /* rex_w */ false, /* legacy_mode */ true, /* no_mask_reg */ true, /* uses_vl */ false); > > > No GPR operand here. Thanks, made the change back. 
> src/hotspot/cpu/x86/assembler_x86.cpp line 2860: > >> 2858: void Assembler::kmovql(KRegister dst, KRegister src) { >> 2859: assert(VM_Version::supports_avx512bw(), ""); >> 2860: InstructionAttr attributes(AVX_128bit, /* rex_w */ true, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > > Suggestion: > > InstructionAttr attributes(AVX_128bit, /* rex_w */ true, /* legacy_mode */ true, /* no_mask_reg */ true, /* uses_vl */ false); Thanks, made the change back. > src/hotspot/cpu/x86/assembler_x86.cpp line 6556: > >> 6554: assert(VM_Version::supports_bmi1(), "tzcnt instruction not supported"); >> 6555: emit_int8((unsigned char)0xF3); >> 6556: int encode = prefixq_and_encode(dst->encoding(), src->encoding(), true /* is_map1 */); > > FTR, quoting the relevant excerpt from section 3.1.2.1 of the APX specification: > "REX2 must be the last prefix. The byte following it is interpreted as the main opcode byte in the opcode map indicated by M0. The 0x0F escape byte is neither needed nor allowed." Thanks, understood. The prefixq_and_encode function used above does not emit the 0x0F opcode prefix for map1 instructions encoded with the REX2 scheme. > src/hotspot/cpu/x86/assembler_x86.hpp line 536: > >> 534: REXBIT_X = 0x02, >> 535: REXBIT_R = 0x04, >> 536: REXBIT_W = 0x08, > > Suggestion: > > REX2BIT_B = 0x01, > REX2BIT_X = 0x02, > REX2BIT_R = 0x04, > REX2BIT_W = 0x08, > > Name change suggestion since these bits are part of the REX2 prefix. It's true that the REXBIT constants are currently only used in REX2 encoding code. The reason for choosing the REXBIT name for those four values was that they do refer to REX encoding bits and, if bit-wise refactoring of the existing REX encoding code was to be done later, the REXBIT names would make more sense there.
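For reference, here is a hypothetical sketch of how a REX2 prefix could be assembled. It assumes the payload layout we read from the APX specification: 0xD5 followed by one byte holding M0, R4, X4, B4, W, R3, X3, B3 from bit 7 down to bit 0. The low four bits line up with the `REXBIT_B/X/R/W` values discussed above. The `REX2BIT_*4`/`M0` names and the functions are illustrative, not the HotSpot API. The example encodes a register-register 64-bit `tzcnt` (map 1, opcode 0xBC): the mandatory 0xF3 prefix comes first, REX2 is last, and no 0x0F escape byte is emitted. The bytes have not been cross-checked against SDE or an external assembler.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Low nibble: same bit positions as the legacy REX prefix.
constexpr uint8_t REXBIT_B = 0x01;
constexpr uint8_t REXBIT_X = 0x02;
constexpr uint8_t REXBIT_R = 0x04;
constexpr uint8_t REXBIT_W = 0x08;
// High nibble: REX2-only bits (illustrative names).
constexpr uint8_t REX2BIT_B4 = 0x10;
constexpr uint8_t REX2BIT_X4 = 0x20;
constexpr uint8_t REX2BIT_R4 = 0x40;
constexpr uint8_t REX2BIT_M0 = 0x80;  // 1 selects opcode map 1 (the legacy 0x0F map)

// Payload byte for a register-register form: 'reg' goes in ModRM.reg, 'rm' in ModRM.rm.
// No index register is involved, so the X bits stay clear.
uint8_t rex2_payload(int reg, int rm, bool w, bool map1) {
  uint8_t p = 0;
  if (reg & 0x08) p |= REXBIT_R;
  if (reg & 0x10) p |= REX2BIT_R4;
  if (rm & 0x08)  p |= REXBIT_B;
  if (rm & 0x10)  p |= REX2BIT_B4;
  if (w)          p |= REXBIT_W;
  if (map1)       p |= REX2BIT_M0;
  return p;
}

// tzcnt dst, src (64-bit): F3 precedes REX2, and opcode 0xBC follows REX2 directly.
std::vector<uint8_t> encode_tzcntq(int dst, int src) {
  uint8_t modrm = static_cast<uint8_t>(0xC0 | ((dst & 7) << 3) | (src & 7));
  return {0xF3, 0xD5, rex2_payload(dst, src, /*w=*/true, /*map1=*/true), 0xBC, modrm};
}
```

Under these assumptions, `tzcnt r16, r17` becomes F3 D5 D8 BC C1: the payload 0xD8 is M0 | R4 | B4 | W, and the low three bits of each register number go into ModRM as usual.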
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1589636409 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1589635515 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1589635298 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1589635856 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1589635146 From stuefe at openjdk.org Fri May 3 19:16:54 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 3 May 2024 19:16:54 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v2] In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 18:57:16 GMT, Vladimir Kozlov wrote: >>> Thank you, @tstuefe, for filing these bugs. >>> >>> One additional thing I noticed is that we don't produce compilation replay file (its size is 0) for such failures. Can you look why is that? >> >> Yes, its https://bugs.openjdk.org/browse/JDK-8331344 . I'll post a PR shortly. >> >> The problem behind this is more generic, namely that producing replay files needs resource area, and it shouldn't. We should not allocate resource area or heap in fatal error handling. But for now, I'll fix this locally by avoiding the recursion. > >> > Thank you, @tstuefe, for filing these bugs. >> > One additional thing I noticed is that we don't produce compilation replay file (its size is 0) for such failures. Can you look why is that? >> >> Yes, its https://bugs.openjdk.org/browse/JDK-8331344 . I'll post a PR shortly. >> >> The problem behind this is more generic, namely that producing replay files needs resource area, and it shouldn't. We should not allocate resource area or heap in fatal error handling. But for now, I'll fix this locally by avoiding the recursion. > > Good. I think we need to push it before this PR. 
@vnkozlov SAP did a test series and did not find any issues in their CI ------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2093618707 From duke at openjdk.org Fri May 3 19:40:55 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Fri, 3 May 2024 19:40:55 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers In-Reply-To: References: Message-ID: <2ix8fZdbyXTav2FBERlzl7U6JkI3i9hPFGSNKbrDlpo=.a219b3de-7035-44d0-9bdc-3ea599800eb3@github.com> On Tue, 9 Apr 2024 19:44:26 GMT, Dean Long wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > How can we be confident that the encoding is correct? Would it be possible to write tests for this? Maybe one that disassembles it and compares the result to a 3rd party disassembler offline or in-process hsdis? In response to @dean-long, @theRealAph wrote: > When we wrote the AArch64 port, there was no available hardware to test it on. So, we wrote a simulator to test it. However, we ran the risk that if our understanding of instruction encoding was wrong, our assembler and our simulator might appear to work correctly when used together, but the result would not run on real AArch64 hardware once it arrived. 
So, as well as a simulator for the architecture, we verified the internal HotSpot assembler by checking its encoding against GNU `as`. See /test/hotspot/gtest/aarch64, where a Python program generates source for both the HotSpot internal assembler and GNU `as`. I strongly suggest you do something similar. (As a matter for the historical record, this did work. The test found several encoding bugs. Once we got the first real AArch64 hardware, the port worked almost immediately.) Thanks for the description. It would be great to create a similar tool for x86. I tested the encoding manually using the SDE as the authoritative source. It is tedious though and very time consuming. A subsequent PR in [JDK-8329030](https://bugs.openjdk.org/browse/JDK-8329030), perhaps the one that adds encoding support for New Data Destination variants, should include such a tool. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2093653696 From dnsimon at openjdk.org Fri May 3 19:55:02 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 3 May 2024 19:55:02 GMT Subject: RFR: 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found [v3] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 21:35:08 GMT, Doug Simon wrote: >> This PR adds the missing nmethod entry barriers to JVMCI hand assembled tests. >> It also closes the escape hatch in jvmciCodeInstaller.cpp that allowed JVMCI code to be installed without nmethod entry barriers. > > Doug Simon has updated the pull request incrementally with two additional commits since the last revision: > > - fix NativeCallTest on x64 > - remove more vestiges of optional JVMCI nmethod support for entry barriers Thanks for the feedback and reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19035#issuecomment-2093668944 From dnsimon at openjdk.org Fri May 3 19:55:04 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 3 May 2024 19:55:04 GMT Subject: Integrated: 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found In-Reply-To: References: Message-ID: On Wed, 1 May 2024 15:03:08 GMT, Doug Simon wrote: > This PR adds the missing nmethod entry barriers to JVMCI hand assembled tests. > It also closes the escape hatch in jvmciCodeInstaller.cpp that allowed JVMCI code to be installed without nmethod entry barriers. This pull request has now been integrated. Changeset: b20fa7b4 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/b20fa7b48b0f0a64c0760f26188d4c11c3233b61 Stats: 731 lines in 14 files changed: 404 ins; 309 del; 18 mod 8329982: compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/SimpleDebugInfoTest.java failed assert(oopDesc::is_oop_or_null(val)) failed: bad oop found Reviewed-by: never ------------- PR: https://git.openjdk.org/jdk/pull/19035 From kvn at openjdk.org Fri May 3 20:00:55 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 3 May 2024 20:00:55 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v9] In-Reply-To: References: Message-ID: On Fri, 3 May 2024 05:33:17 GMT, Thomas Stuefe wrote: >> See [1] for previous discussions. >> >> We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. >> >> The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. 
>> >> Examples: >> >> This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` >> >> This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` >> >> >> --- >> >> The patch: >> >> 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. >> 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. >> 3) Adapted and extended tests >> >> I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. >> >> >> Tested: >> >> - manually on Mac m1 (debug and release) >> - GHAs are running >> - but Oracle will do more testing before this goes in >> >> [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: > > - merge master and fix conflicts > - Remove unused variable > - Remove accidental change to TestDeadPhiMergeMemLoop.java > - fix copyrights > - fix copyrights > - another fix > - fix accidental slip in of another test name > - fix jdk note number in test comment > - Disable memory limit for compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java until JDK-8331295 is fixed > - Merge branch 'master' into compiler-default-limit > - ... and 6 more: https://git.openjdk.org/jdk/compare/6bef0474...f6396010 Okay, I will run our testing too. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2093678690 From sgibbons at openjdk.org Fri May 3 23:22:31 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 3 May 2024 23:22:31 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v18] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers:
>
> Benchmark                               Score       Latest      Latest/Score
> StringIndexOf.advancedWithMediumSub     343.573     317.934     0.925375393x
> StringIndexOf.advancedWithShortSub1     1039.081    1053.96     1.014319384x
> StringIndexOf.advancedWithShortSub2     55.828      110.541     1.980027943x
> StringIndexOf.constantPattern           9.361       11.906      1.271872663x
> StringIndexOf.searchCharLongSuccess     4.216       4.218       1.000474383x
> StringIndexOf.searchCharMediumSuccess   3.133       3.216       1.02649218x
> StringIndexOf.searchCharShortSuccess    3.76        3.761       1.000265957x
> StringIndexOf.success                   9.186       9.713       1.057369911x
> StringIndexOf.successBig                14.341      46.343      3.231504079x
> StringIndexOfChar.latin1_AVX2_String    6220.918    12154.52    1.953814533x
> StringIndexOfChar.latin1_AVX2_char      5503.556    5540.044    1.006629895x
> StringIndexOfChar.latin1_SSE4_String    6978.854    6818.689    0.977049957x
> StringIndexOfChar.latin1_SSE4_char      5657.499    5474.624    0.967675646x
> StringIndexOfChar.latin1_Short_String   7132.541    6863.359    0.962260014x
> StringIndexOfChar.latin1_Short_char     16013.389   16162.437   1.009307711x
> StringIndexOfChar.latin1_mixed_String   7386.123    14771.622   1.999915517x
> StringIndexOfChar.latin1_mixed_char     9901.671    9782.245    0.987938803x

Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 50 commits: - Merge remote-tracking branch 'origin/master' into indexof - Move arrays_equals back to c2_MacroAssembler - Merge branch 'openjdk:master' into indexof - Remove infinite loop (used for debugging) - Merge branch 'openjdk:master' into indexof - Cleaned up, ready for review - Pre-cleanup code - Add JMH. Add 16-byte compares to arrays_equals - Better method for mask creation - Merge branch 'openjdk:master' into indexof - ... and 40 more: https://git.openjdk.org/jdk/compare/b20fa7b4...f52d281d ------------- Changes: https://git.openjdk.org/jdk/pull/16753/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=17 Stats: 4345 lines in 17 files changed: 4183 ins; 26 del; 136 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From cslucas at openjdk.org Fri May 3 23:43:56 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 3 May 2024 23:43:56 GMT Subject: Integrated: 8330247: C2: CTW fail with assert(adr_t->is_known_instance_field()) failed: instance required In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 00:35:16 GMT, Cesar Soares Lucas wrote: > The logic in reduce allocation merges (RAM) makes use of `PhaseMacroExpand::can_eliminate_allocation` to check whether an allocation can be scalar replaced. However, we can only SR allocations of exact types - due to rematerialization logic. > > The scalar replacement logic not related to RAM has this check in `split_unique_types` so there is no performance regression by adding this check here. > > Tested on Linux x64 tiers1-3. This pull request has now been integrated.
Changeset: 9347bb7d Author: Cesar Soares Lucas Committer: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/9347bb7df845ee465c378c6f511ef8a6caea18ea Stats: 76 lines in 2 files changed: 76 ins; 0 del; 0 mod 8330247: C2: CTW fail with assert(adr_t->is_known_instance_field()) failed: instance required Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/18851 From kvn at openjdk.org Sat May 4 05:32:00 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 4 May 2024 05:32:00 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v9] In-Reply-To: References: Message-ID: <-vdGyJLNkw9M33NtEHJo_YGHfWldStOLI23Dk36Yi8w=.92a6b81b-ed1a-4e77-b657-eab04e219a3e@github.com> On Fri, 3 May 2024 05:33:17 GMT, Thomas Stuefe wrote: >> See [1] for previous discussions. >> >> We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. >> >> The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. >> >> Examples: >> >> This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` >> >> This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` >> >> >> --- >> >> The patch: >> >> 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. >> 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. >> 3) Adapted and extended tests >> >> I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. 
>> >> >> Tested: >> >> - manually on Mac m1 (debug and release) >> - GHAs are running >> - but Oracle will do more testing before this goes in >> >> [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: > > - merge master and fix conflicts > - Remove unused variable > - Remove accidental change to TestDeadPhiMergeMemLoop.java > - fix copyrights > - fix copyrights > - another fix > - fix accidental slip in of another test name > - fix jdk note number in test comment > - Disable memory limit for compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java until JDK-8331295 is fixed > - Merge branch 'master' into compiler-default-limit > - ... and 6 more: https://git.openjdk.org/jdk/compare/6bef0474...f6396010 Looks like `memlimit,TestFindNode::test,0` does not work. The test failed with stress flags [JDK-8331283](https://bugs.openjdk.org/browse/JDK-8331283) on linux-aarch64 (Ampere). With the same call stack. I see `-XX:CompileCommand=memlimit,TestFindNode::test,0` in flags passed to test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2094029663 From kvn at openjdk.org Sat May 4 05:34:58 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 4 May 2024 05:34:58 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v9] In-Reply-To: References: Message-ID: On Fri, 3 May 2024 05:33:17 GMT, Thomas Stuefe wrote: >> See [1] for previous discussions. >> >> We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. >> >> The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. 
>> >> Examples: >> >> This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` >> >> This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` >> >> >> --- >> >> The patch: >> >> 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. >> 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. >> 3) Adapted and extended tests >> >> I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. >> >> >> Tested: >> >> - manually on Mac m1 (debug and release) >> - GHAs are running >> - but Oracle will do more testing before this goes in >> >> [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: > > - merge master and fix conflicts > - Remove unused variable > - Remove accidental change to TestDeadPhiMergeMemLoop.java > - fix copyrights > - fix copyrights > - another fix > - fix accidental slip in of another test name > - fix jdk note number in test comment > - Disable memory limit for compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java until JDK-8331295 is fixed > - Merge branch 'master' into compiler-default-limit > - ... 
and 6 more: https://git.openjdk.org/jdk/compare/6bef0474...f6396010 I attached hs_err file to [JDK-8331283](https://bugs.openjdk.org/browse/JDK-8331283) ------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2094032227 From stuefe at openjdk.org Sat May 4 08:25:16 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 4 May 2024 08:25:16 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v10] In-Reply-To: References: Message-ID: > See [1] for previous discussions. > > We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. > > The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. > > Examples: > > This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` > > This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` > > > --- > > The patch: > > 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. > 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. > 3) Adapted and extended tests > > I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. 
> > > Tested: > > - manually on Mac m1 (debug and release) > - GHAs are running > - but Oracle will do more testing before this goes in > > [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: fix compiler.c2.TestFindNode again ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18969/files - new: https://git.openjdk.org/jdk/pull/18969/files/f6396010..695a0096 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=08-09 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18969/head:pull/18969 PR: https://git.openjdk.org/jdk/pull/18969 From stuefe at openjdk.org Sat May 4 08:28:02 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 4 May 2024 08:28:02 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v9] In-Reply-To: <-vdGyJLNkw9M33NtEHJo_YGHfWldStOLI23Dk36Yi8w=.92a6b81b-ed1a-4e77-b657-eab04e219a3e@github.com> References: <-vdGyJLNkw9M33NtEHJo_YGHfWldStOLI23Dk36Yi8w=.92a6b81b-ed1a-4e77-b657-eab04e219a3e@github.com> Message-ID: On Sat, 4 May 2024 05:29:01 GMT, Vladimir Kozlov wrote: > Looks like `memlimit,TestFindNode::test,0` does not work. The test failed with stress flags [JDK-8331283](https://bugs.openjdk.org/browse/JDK-8331283) on linux-aarch64 (Ampere). With the same call stack. I see `-XX:CompileCommand=memlimit,TestFindNode::test,0` in flags passed to test. I fixed the error, a simple typo (forgot to properly name the class in the option). Retested locally on Mac m1, confirmed that the test passes with this commit, fails without it. I am not sure what went wrong, since I did these tests beforehand. Maybe I pushed the wrong version. 
That is slightly concerning, however, since the error should have come up at SAP too. I guess they don't test with all these stress options. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2094076704 From aph at openjdk.org Sat May 4 08:59:51 2024 From: aph at openjdk.org (Andrew Haley) Date: Sat, 4 May 2024 08:59:51 GMT Subject: RFR: 8331400: AArch64: Sync aarch64_vector.ad with aarch64_vector_ad.m4 In-Reply-To: References: Message-ID: <3xMwZQNT3DeAV1usUi-YyhRNoF8oxWNtUIHoB3eSEPw=.6f4fcb8a-731a-49bd-bc2a-571b4fd90ec9@github.com> On Fri, 3 May 2024 11:24:56 GMT, Bhavana Kilambi wrote: > Thanks for the review. This rarely happens though. I shouldn't have missed this. Can I integrate it or shall I wait for another review (as we need two reviews these days but this one is trivial)? Will still wait for all the tests on macos to pass. Just push it. @vnkozlov has acked it now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19077#issuecomment-2094086457 From kvn at openjdk.org Sat May 4 18:31:53 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 4 May 2024 18:31:53 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v10] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 08:25:16 GMT, Thomas Stuefe wrote: >> See [1] for previous discussions. >> >> We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. >> >> The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. >> >> Examples: >> >> This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. 
Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` >> >> This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` >> >> >> --- >> >> The patch: >> >> 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. >> 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. >> 3) Adapted and extended tests >> >> I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. >> >> >> Tested: >> >> - manually on Mac m1 (debug and release) >> - GHAs are running >> - but Oracle will do more testing before this goes in >> >> [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > fix compiler.c2.TestFindNode again `-XX:CompileCommand=memstat,compiler.c2.TestFindNode::*,print` - leftover from debugging? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2094340138 From sgibbons at openjdk.org Sat May 4 19:35:21 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Sat, 4 May 2024 19:35:21 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers:
>
> Benchmark                              Score      Latest     Latest/Score
> StringIndexOf.advancedWithMediumSub    343.573    317.934    0.925375393x
> StringIndexOf.advancedWithShortSub1    1039.081   1053.96    1.014319384x
> StringIndexOf.advancedWithShortSub2    55.828     110.541    1.980027943x
> StringIndexOf.constantPattern          9.361      11.906     1.271872663x
> StringIndexOf.searchCharLongSuccess    4.216      4.218      1.000474383x
> StringIndexOf.searchCharMediumSuccess  3.133      3.216      1.02649218x
> StringIndexOf.searchCharShortSuccess   3.76       3.761      1.000265957x
> StringIndexOf.success                  9.186      9.713      1.057369911x
> StringIndexOf.successBig               14.341     46.343     3.231504079x
> StringIndexOfChar.latin1_AVX2_String   6220.918   12154.52   1.953814533x
> StringIndexOfChar.latin1_AVX2_char     5503.556   5540.044   1.006629895x
> StringIndexOfChar.latin1_SSE4_String   6978.854   6818.689   0.977049957x
> StringIndexOfChar.latin1_SSE4_char     5657.499   5474.624   0.967675646x
> StringIndexOfChar.latin1_Short_String  7132.541   6863.359   0.962260014x
> StringIndexOfChar.latin1_Short_char    16013.389  16162.437  1.009307711x
> StringIndexOfChar.latin1_mixed_String  7386.123   14771.622  1.999915517x
> StringIndexOfChar.latin1_mixed_char    9901.671   9782.245   0.987938803x

Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Rearrange; add lambdas for clarity ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/f52d281d..fb4da92a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=17-18 Stats: 2561 lines in 1 file changed: 804 ins; 954 del; 803 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From eliu at openjdk.org Sun May 5 00:46:53 2024 From: eliu at openjdk.org (Eric Liu) Date: Sun, 5 May 2024 00:46:53 GMT Subject: RFR: 8331400: AArch64: Sync
aarch64_vector.ad with aarch64_vector_ad.m4 In-Reply-To: References: Message-ID: On Fri, 3 May 2024 09:07:25 GMT, Bhavana Kilambi wrote: > This commit - [1] modified the aarch64_vector.ad directly. This patch includes that change in the aarch64_vector_ad.m4 file as well and generates the aarch64_vector.ad file from it. > > [1] https://github.com/openjdk/jdk/commit/185e711bfe4c4d013b56e867f85cfb4177b3a2cf Marked as reviewed by eliu (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19077#pullrequestreview-2039566564 From fyang at openjdk.org Mon May 6 04:53:58 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 6 May 2024 04:53:58 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v3] In-Reply-To: <74Np8LGxo8PiyoLAUI7tUlAq7ySVgmGzblZio5Tlhx8=.c0fdf457-bc2f-4f88-a070-325521b469f9@github.com> References: <74Np8LGxo8PiyoLAUI7tUlAq7ySVgmGzblZio5Tlhx8=.c0fdf457-bc2f-4f88-a070-325521b469f9@github.com> Message-ID: On Fri, 3 May 2024 06:47:10 GMT, Roberto Castañeda Lozano wrote: >> This changeset generalizes the logic to analyze, declare, and communicate which registers are live at a C2 barrier stub so that it can be used by other collectors than ZGC adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). >> >> The main changes are: >> >> - Make it possible to compute register liveness information before (live-in) or after (live-out) each barrier, and let the collector choose by implementing `BarrierSetC2State::needs_livein_data()`. >> >> - Generalize the interface with which collectors declare which registers must be additionally preserved across barrier runtime calls, adding the methods `BarrierStubC2::preserve(Register r)` and `BarrierStubC2::dont_preserve(Register r)`.
>> >> - Simplify the interface with which platform-specific logic computes which registers to preserve across barrier runtime calls, replacing the calls to `BarrierStubC2::result()` and `BarrierStubC2::live()` with a single call to `BarrierStubC2::preserve_set()`. >> >> #### Testing >> >> - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). >> - tier1-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/4d4e743d8f4cddd5288cee1d69c70fe2b9bea066) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. >> - Build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug). @RealFYang, @TheRealMDoerr: could you please test and review the riscv and ppc changes? Thanks! > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Apply code style suggestions from Axel > > Co-authored-by: Axel Boldt-Christmas @robcasloz : This also tests good on linux-riscv64 platform. LGTM. Thanks for the ping! ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19026#pullrequestreview-2039978152 From epeter at openjdk.org Mon May 6 06:30:07 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 6 May 2024 06:30:07 GMT Subject: RFR: 8329273: C2 SuperWord: Some basic MemorySegment IR tests Message-ID: I could not find any IR vectorization tests for `MemorySegment` yet.
I make sure to exercise different backing types: - arrays - buffers - native memory I filed a follow-up RFE, to eventually make all cases where I have "FAILS" vectorize: [JDK-8331659](https://bugs.openjdk.org/browse/JDK-8331659): C2 SuperWord: investicate failed vectorization in compiler/loopopts/superword/TestMemorySegment.java ------------- Commit messages: - fix tabs - speed up test - small cosmetic fix - make things static - long loop tests - handle AlignVector - int cases - int-index case - disable mixed tests - mixed - ... and 14 more: https://git.openjdk.org/jdk/compare/ea3909ac...b6f16a58 Changes: https://git.openjdk.org/jdk/pull/18535/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18535&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8329273 Stats: 860 lines in 1 file changed: 860 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18535.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18535/head:pull/18535 PR: https://git.openjdk.org/jdk/pull/18535 From duke at openjdk.org Mon May 6 06:36:05 2024 From: duke at openjdk.org (Daniel Skantz) Date: Mon, 6 May 2024 06:36:05 GMT Subject: RFR: 8330016: Stress seed should be initialized for runtime stub compilation Message-ID: <-Pj7vK6Wpgv3FnP2KZXB2so2wWraG3LWEKds9XDI3pY=.d6595586-aa83-47bc-84ed-ff8c4a1a5550@github.com> We can initialize the stress seed for runtime stub compilation as we already do for method compilation. This found the bug described in JDK-8329258. It would apply if StressGCM or StressLCM vm flags are set. Testing: T1-5 default options. T1-5 with -XX:+StressLCM and -XX:+StressGCM. Manually tested that the stress seed is set and printed to compilation log if either stress option is set. 
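To illustrate what "initialize the stress seed for runtime stub compilation as we already do for method compilation" amounts to, here is a minimal C++ sketch of a single shared seed-initialization helper. Everything in it (`StressOptions`, `init_stress_seed`, `first_stress_decision`) is hypothetical and only models the idea; it is not HotSpot's `Compile` code or its flag API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <random>

// Hypothetical model of the relevant VM options (not HotSpot's flag API).
struct StressOptions {
  bool stress_gcm = false;                 // models -XX:+StressGCM
  bool stress_lcm = false;                 // models -XX:+StressLCM
  std::optional<std::uint32_t> user_seed;  // models an explicitly supplied seed
};

// Shared helper: in this sketch both method and stub compilations call it,
// so a stub compilation gets a seed whenever any stress option is enabled.
// `fresh_seed` stands in for a newly drawn random value.
std::optional<std::uint32_t> init_stress_seed(const StressOptions& opts,
                                              std::uint32_t fresh_seed) {
  if (!opts.stress_gcm && !opts.stress_lcm) {
    return std::nullopt;  // no stress option enabled: no seed needed
  }
  return opts.user_seed.value_or(fresh_seed);
}

// With the same seed, the pseudo-random stress decisions are replayed identically.
std::uint32_t first_stress_decision(std::uint32_t seed) {
  std::mt19937 rng(seed);
  return static_cast<std::uint32_t>(rng());
}
```

Because a given seed replays the same sequence of decisions, printing it to the compilation log (as the PR does) is what makes a stub-compilation failure reproducible.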
------------- Commit messages: - move lines again - move lines - factor out streed seed initialization - add stress seed in runtime stub Changes: https://git.openjdk.org/jdk/pull/19095/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19095&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330016 Stats: 31 lines in 2 files changed: 20 ins; 10 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19095.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19095/head:pull/19095 PR: https://git.openjdk.org/jdk/pull/19095 From rcastanedalo at openjdk.org Mon May 6 07:33:52 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 6 May 2024 07:33:52 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v2] In-Reply-To: References: Message-ID: On Fri, 3 May 2024 06:44:42 GMT, Roberto Castañeda Lozano wrote: >> lgtm. >> >> A few nits. > >> lgtm. >> >> A few nits. > > Thanks for reviewing and for the style suggestions, Axel! > @robcasloz : This also tests good on linux-riscv64 platform. LGTM. Thanks for the ping! Thanks for testing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19026#issuecomment-2095358512 From chagedorn at openjdk.org Mon May 6 07:38:53 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 6 May 2024 07:38:53 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop In-Reply-To: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> Message-ID: On Fri, 3 May 2024 12:33:43 GMT, Roland Westrelin wrote: > In the test case: > > > long i; > for (; i > 0; i--) { > res += 42 / ((int) i); > > > The long counted loop phi has type `[1..100]`. As a consequence, the > `ConvL2I` also has type `[1..100]`. The `DivI` node that follows can't > fault: it is not guarded by a zero check and has no control set.
> > The `ConvL2I` is split through phi and so is the `DivI` node: > `PhaseIdealLoop::cannot_split_division()` returns false (the split is considered safe) because the > value coming from the backedge into the `DivI` (when it is about to be > split thru phi) is the result of the `ConvL2I`, which has type > `[1..100]` and so is not zero as far as the compiler can tell. > > On the last iteration of the loop, i is 1. Because the DivI was split > thru Phi, it computes the value for the following iteration, i.e. for i = 0. This causes a crash when the compiled code runs. > > The same problem can't happen with an int counted loop because logic > in `PhaseIdealLoop::split_thru_phi()` prevents a `ConvI2L` from being > split thru phi. I propose to fix this the same way: in the test case, > it's not true that once the `ConvL2I` is split thru phi it keeps type > `[1..100]`. The fix is fairly conservative because it's based on the > existing logic for `ConvI2L`: ideally we would only avoid splitting a `ConvL2I` > through the phi of a counted loop, but I suppose the same is true for the `ConvI2L` > and I thought it would be best to revisit both together. You could also add the regression tests from the duplicated issue [JDK-8298851](https://bugs.openjdk.org/browse/JDK-8298851). Marked as reviewed by chagedorn (Reviewer). src/hotspot/share/opto/loopopts.cpp line 54: > 52: if ((n->Opcode() == Op_ConvI2L && n->bottom_type() != TypeLong::LONG) || > 53: (n->Opcode() == Op_ConvL2I && n->bottom_type() != TypeInt::INT)) { > 54: // ConvI2L/ConvL2I may have type information on it which is unsafe to push up The fix looks good and we should probably move forward with that. But I'm still wondering, though, whether these bailouts are really needed in the general case. It seems like this problem is mainly for loop phis. Couldn't we check the types of loop phi inputs and bail out if one includes zero? IIUC, the backedge should be an `AddL` with type `[0..99]`, i.e. post-decremented.
So, pushing through seems wrong in this case since the backedge type includes zero. But it could be detected and prevented. However, if the phi has type `[5..100]`, for example, then it should be safe. We probably then just need to update the type of the pushed-through `ConvL2I` to whatever the type of the input is. This type checking approach could work in the general case. But I'm not sure though, if it's beneficial to split these `Conv` nodes through phis in general. But it seems the bailouts have only been introduced due to correctness bugs and not due to performance reasons. Anyway, this should be investigated separately, including benchmarking. ------------- PR Review: https://git.openjdk.org/jdk/pull/19086#pullrequestreview-2040163524 PR Review: https://git.openjdk.org/jdk/pull/19086#pullrequestreview-2040170877 PR Review Comment: https://git.openjdk.org/jdk/pull/19086#discussion_r1590639677 From chagedorn at openjdk.org Mon May 6 07:51:00 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 6 May 2024 07:51:00 GMT Subject: RFR: 8305638: Renaming and small clean-ups around predicates [v7] In-Reply-To: <-UU0jrN33Dxbp9EJ9u1FSJ2RDYC02JMK84gnzZLUhSg=.0e20361b-81a2-4ae9-a320-70f3cd9804c6@github.com> References: <-UU0jrN33Dxbp9EJ9u1FSJ2RDYC02JMK84gnzZLUhSg=.0e20361b-81a2-4ae9-a320-70f3cd9804c6@github.com> Message-ID: On Thu, 2 May 2024 10:40:08 GMT, Christian Hagedorn wrote: >> **Update: April 22** >> >> After splitting off and integrating the following PRs from this PR: >> https://github.com/openjdk/jdk/pull/18080 >> https://github.com/openjdk/jdk/pull/18293 >> https://github.com/openjdk/jdk/pull/18628 >> https://github.com/openjdk/jdk/pull/18723 >> >> we are only left with a few renaming and clean-ups from this PR. Directly merging the master branch in was quite hard. I therefore reverted all commits to get back to a clean master and then applied all remaining code changes manually (required a force push). >> >>
>>
>> >> _------------ Original PR description --------------_ >> >> This patch is intended for JDK 23. >> >> While preparing the patch for the full fix for Assertion Predicates [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981), I still noticed that some changes are not required for the actual fix and could be split off and reviewed separately in this PR. >> >> The patch applies the following cleanup changes: >> - The complete fix had to add slightly different cloning cases in `PhaseIdealLoop::create_bool_from_template_assertion_predicate()` which already has quite some logic to switch between different cases. Additionally, the algorithm in the method itself was already hard to understand and difficult to adapt. I therefore re-implemented it in a separate class `CloneTemplateAssertionPredicateBool` together with some helper classes like `DFSNodeStack`. To use it, I've added a `TemplateAssertionPredicateBool` class that offers three cloning possibilities: >> - `clone()`: Clone without modification >> - `clone_and_replace_opaque_loop_nodes()`: Clone and replace the `OpaqueLoop*Nodes` with a new init and stride node. >> - `clone_and_replace_init()`: Special case of `clone_and_replace_opaque_loop_nodes()` which only replaces `OpaqueLoopInitNode` and clones `OpaqueLoopStrideNode`. >> >> This refactoring could be extracted from the complete fix. >> - The Split If code to detect (`subgraph_has_opaque()`) and clone Template Assertion Predicate Bools was extracted to a separate class `CloneTemplateAssertionPredicateBoolDown` and uses the new `TemplateAssertionPredicateBool` class to do the actual cloning. >> - In the process of coding the complete fix, I've refactored the Loop Unswitching code quite a bit. This change could also be extracted into a separate RFE. Changes include: >> - Renaming >> - Extracting code to separate classes/methods >> - Adding comments >> - Some small refactoring including: >> - Removi... 
> > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into JDK-8305638 > - Merge branch 'refs/heads/master' into JDK-8305638 > > # Conflicts: > # src/hotspot/share/opto/loopPredicate.cpp > - Fix useful Template Assertion Predicate marking > - Fix useful Parse Predicate marking > - Remaining renaming and small clean-ups Thanks Roland for re-reviewing it! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16877#issuecomment-2095381632 From chagedorn at openjdk.org Mon May 6 07:51:00 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 6 May 2024 07:51:00 GMT Subject: Integrated: 8305638: Renaming and small clean-ups around predicates In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 08:42:41 GMT, Christian Hagedorn wrote: > **Update: April 22** > > After splitting off and integrating the following PRs from this PR: > https://github.com/openjdk/jdk/pull/18080 > https://github.com/openjdk/jdk/pull/18293 > https://github.com/openjdk/jdk/pull/18628 > https://github.com/openjdk/jdk/pull/18723 > > we are only left with a few renaming and clean-ups from this PR. Directly merging the master branch in was quite hard. I therefore reverted all commits to get back to a clean master and then applied all remaining code changes manually (required a force push). > >
>
> > _------------ Original PR description --------------_ > > This patch is intended for JDK 23. > > While preparing the patch for the full fix for Assertion Predicates [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981), I still noticed that some changes are not required for the actual fix and could be split off and reviewed separately in this PR. > > The patch applies the following cleanup changes: > - The complete fix had to add slightly different cloning cases in `PhaseIdealLoop::create_bool_from_template_assertion_predicate()` which already has quite some logic to switch between different cases. Additionally, the algorithm in the method itself was already hard to understand and difficult to adapt. I therefore re-implemented it in a separate class `CloneTemplateAssertionPredicateBool` together with some helper classes like `DFSNodeStack`. To use it, I've added a `TemplateAssertionPredicateBool` class that offers three cloning possibilities: > - `clone()`: Clone without modification > - `clone_and_replace_opaque_loop_nodes()`: Clone and replace the `OpaqueLoop*Nodes` with a new init and stride node. > - `clone_and_replace_init()`: Special case of `clone_and_replace_opaque_loop_nodes()` which only replaces `OpaqueLoopInitNode` and clones `OpaqueLoopStrideNode`. > > This refactoring could be extracted from the complete fix. > - The Split If code to detect (`subgraph_has_opaque()`) and clone Template Assertion Predicate Bools was extracted to a separate class `CloneTemplateAssertionPredicateBoolDown` and uses the new `TemplateAssertionPredicateBool` class to do the actual cloning. > - In the process of coding the complete fix, I've refactored the Loop Unswitching code quite a bit. This change could also be extracted into a separate RFE. Changes include: > - Renaming > - Extracting code to separate classes/methods > - Adding comments > - Some small refactoring including: > - Removing unused parameters > - Renaming variables/parameters/methods > > Th... 
This pull request has now been integrated. Changeset: 4bbd972c Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/4bbd972cbb114b99e856aa7904c0240049052b6a Stats: 77 lines in 5 files changed: 17 ins; 7 del; 53 mod 8305638: Renaming and small clean-ups around predicates Reviewed-by: roland, epeter ------------- PR: https://git.openjdk.org/jdk/pull/16877 From eosterlund at openjdk.org Mon May 6 07:54:52 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 6 May 2024 07:54:52 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v3] In-Reply-To: <74Np8LGxo8PiyoLAUI7tUlAq7ySVgmGzblZio5Tlhx8=.c0fdf457-bc2f-4f88-a070-325521b469f9@github.com> References: <74Np8LGxo8PiyoLAUI7tUlAq7ySVgmGzblZio5Tlhx8=.c0fdf457-bc2f-4f88-a070-325521b469f9@github.com> Message-ID: On Fri, 3 May 2024 06:47:10 GMT, Roberto Castañeda Lozano wrote: >> This changeset generalizes the logic to analyze, declare, and communicate which registers are live at a C2 barrier stub so that it can be used by other collectors than ZGC adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). >> >> The main changes are: >> >> - Make it possible to compute register liveness information before (live-in) or after (live-out) each barrier, and let the collector choose by implementing `BarrierSetC2State::needs_livein_data()`. >> >> - Generalize the interface with which collectors declare which registers must be additionally preserved across barrier runtime calls, adding the methods `BarrierStubC2::preserve(Register r)` and `BarrierStubC2::dont_preserve(Register r)`. >> >> - Simplify the interface with which platform-specific logic computes which registers to preserve across barrier runtime calls, replacing the calls to `BarrierStubC2::result()` and `BarrierStubC2::live()` with a single call to `BarrierStubC2::preserve_set()`.
>> >> #### Testing >> >> - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). >> - tier1-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/4d4e743d8f4cddd5288cee1d69c70fe2b9bea066) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. >> - Build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug). @RealFYang, @TheRealMDoerr: could you please test and review the riscv and ppc changes? Thanks! > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Apply code style suggestions from Axel > > Co-authored-by: Axel Boldt-Christmas Looks good! ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19026#pullrequestreview-2040195709 From rcastanedalo at openjdk.org Mon May 6 08:07:57 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 6 May 2024 08:07:57 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic [v3] In-Reply-To: References: <74Np8LGxo8PiyoLAUI7tUlAq7ySVgmGzblZio5Tlhx8=.c0fdf457-bc2f-4f88-a070-325521b469f9@github.com> Message-ID: On Mon, 6 May 2024 07:52:31 GMT, Erik Österlund wrote: > Looks good! Thanks for reviewing, Erik!
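The register bookkeeping described in the 8331418 PR can be approximated with simple set algebra: liveness is computed per barrier (live-in or live-out), the collector adjusts it through `preserve`/`dont_preserve`, and platform code makes a single `preserve_set()` query. The class below is a hypothetical model. It assumes the preserved set is the live set plus explicit additions minus explicit exclusions, and it is not the actual `BarrierStubC2` implementation.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Registers are modelled as plain indices into a fixed-size bit set.
constexpr std::size_t kNumRegs = 32;
using RegSet = std::bitset<kNumRegs>;

// Hypothetical model of a barrier stub's register bookkeeping; the method
// names mirror the PR description, but this is not HotSpot's BarrierStubC2.
class BarrierStubModel {
 public:
  // `live` is the liveness result the collector asked for (live-in or live-out).
  explicit BarrierStubModel(RegSet live) : live_(live) {}

  // Collector: additionally save `r` across the runtime call.
  void preserve(std::size_t r) { preserve_.set(r); }

  // Collector: never save `r` (e.g. the register the barrier writes its result to).
  // In this sketch, dont_preserve() takes precedence over liveness and preserve().
  void dont_preserve(std::size_t r) { dont_preserve_.set(r); }

  // Platform code: the single set to spill before and restore after the call.
  RegSet preserve_set() const { return (live_ | preserve_) & ~dont_preserve_; }

 private:
  RegSet live_;
  RegSet preserve_;
  RegSet dont_preserve_;
};
```

Folding `result()` and `live()` into one query means platform code no longer has to repeat the "live but not the result" subtraction itself.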
------------- PR Comment: https://git.openjdk.org/jdk/pull/19026#issuecomment-2095408712 From dfenacci at openjdk.org Mon May 6 08:11:06 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 6 May 2024 08:11:06 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v5] In-Reply-To: References: Message-ID: <2JMRO8HiRuX9_LGSUTOFeg71lygs5EHrw9AkCmC4zsg=.fad94b04-a8cd-464f-8404-005f45deb173@github.com> > # Issue > When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0)` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that causes different vectors to be treated as equal even though they are not. > > # Causes > On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. > The same is true for `StoreVector`s. > When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store of the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`). Here we call > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 > > where we do an extra type check when dealing with vectors, but we don't check whether masks or offsets interfere. > Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > but we don't check that there are no masks or offsets. > A few lines below, we check if there are 2 stores for the same value in a row.
We need to check for masks and offsets here too but in this case we can include these cases if the masks and offsets of the vector stores are equivalent. > > # Solution > To avoid folding `Load`- and `StoreVector`s with masks and offsets we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns its own (to avoid ever being the same as a store node). In this way, the checks in `MemNode::can_see_stored_value` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 > > and `StoreNode::Identity` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > will fail if masks or offsets are used. > For 2 stores of the same value we instead check for mask and offset equality. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been add...
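The Solution section relies on one comparison: roughly, a load may reuse a store's value only if the store's opcode equals the load's `store_Opcode()`. The simplified Java model below shows why returning an opcode that no store can have blocks folding for gather and masked loads. The v5 commits switch that value to `-1`, which the review describes as what `MemNode::store_Opcode()` returns. Everything here (class, record, opcode numbers, address strings) is hypothetical, and the real checks in `memnode.cpp` do more than this; the description, for instance, mentions an extra type check for vectors.

```java
// Hypothetical, simplified model of the folding check; not HotSpot code.
final class StoreOpcodeModel {
    // Sentinel meaning "no matching store opcode" (the review says MemNode::store_Opcode() returns -1).
    static final int UNKNOWN = -1;

    // Made-up opcode numbers; in this model every real opcode is non-negative,
    // so UNKNOWN can never equal a store's opcode.
    static final int LOAD_VECTOR = 10;
    static final int STORE_VECTOR = 11;
    static final int LOAD_VECTOR_GATHER = 12;
    static final int STORE_VECTOR_SCATTER = 13;

    record Node(int opcode, int storeOpcode, String address) {}

    static Node load(int opcode, int pairedStoreOpcode, String address) {
        return new Node(opcode, pairedStoreOpcode, address);
    }

    static Node store(int opcode, String address) {
        return new Node(opcode, UNKNOWN, address); // a store's own storeOpcode is unused here
    }

    // A load may reuse the value of a preceding store only if the store's opcode
    // equals the load's storeOpcode and both access the same address.
    static boolean canFold(Node load, Node store) {
        return store.opcode() == load.storeOpcode() && load.address().equals(store.address());
    }
}
```

Since no store in the model has opcode `-1`, two `-1` values never meet in this comparison, which matches the reasoning given when the sentinel was changed.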
Damon Fenacci has updated the pull request incrementally with three additional commits since the last revision: - JDK-8325520: check for same vector type - JDK-8325520: remove useless checks for second store type - JDK-8325520: use -1 as unknown opcode in store_Opcode ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18347/files - new: https://git.openjdk.org/jdk/pull/18347/files/d25bcacf..524ff888 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=03-04 Stats: 43 lines in 2 files changed: 14 ins; 3 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/18347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18347/head:pull/18347 PR: https://git.openjdk.org/jdk/pull/18347 From dfenacci at openjdk.org Mon May 6 08:16:57 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 6 May 2024 08:16:57 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v4] In-Reply-To: References: Message-ID: On Thu, 25 Apr 2024 12:39:32 GMT, Emanuel Peter wrote: > // Load a float vector from the memory segment (internally, it does checkIndex and unsafe load from the byte array) > FloatVector floatVector = FloatVector.fromMemorySegment(ms, offset, ByteOrder.nativeOrder()); > ``` > > I did not test this, but I think something like this should work. Right! I totally missed the `fromMemorySegment`/`intoMemorySegment` methods! I guess that's probably one reason why vector nodes are not typed. I've added checks to `StoreNode::Identity` as well. > src/hotspot/share/opto/memnode.cpp line 3533: > >> 3531: const Node* offsets = stv->in(StoreVectorScatterMaskedNode::Offsets); >> 3532: const Node* mask = stv->in(StoreVectorScatterMaskedNode::Mask); >> 3533: if (mem->is_StoreVectorScatterMasked()) { > > This `if` will always be true, since we already check `mem->Opcode() == Opcode()`. The code would be simpler if you extracted the offsets and masks in parallel.
Yep, I removed this useless `if` and 2 other terms in the ifs before that. I'm just not sure what you mean by > if you extracted the offsets and masks in parallel. > src/hotspot/share/opto/vectornode.hpp line 916: > >> 914: virtual int store_Opcode() const { >> 915: // Ensure it is different from any store opcode >> 916: return Op_LoadVectorGather; > > I think you should take `-1`, which is what `MemNode::store_Opcode()` returns. It means "unknown". OK. I was just a bit concerned that a check for equality between two `store_Opcode()` results could be true because both are -1, but this shouldn't happen. Changed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2095424211 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1590680121 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1590679021 From epeter at openjdk.org Mon May 6 08:28:03 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 6 May 2024 08:28:03 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v9] In-Reply-To: References: <5H6XV7Agl6ZNfGWT-bCbIPsimFTYM0pyIGiAHDQUUyA=.168e21cc-6cd8-42d8-ab59-d5e02e241ea2@github.com> <0RKnLUgc6UBtyxSyezCMWsSbP50hu6fQ6UJPHpGlgSU=.9fafa10f-62ee-4ec8-9093-4e204fcbe504@github.com> <5QbsVmYi0tYGlOvDL4LjJb1SjChIZtaWSMthFM9grMI=.0900e1c3-90b3-4726-a7c6-c2aff49d07ce@github.com> Message-ID: On Thu, 2 May 2024 18:44:25 GMT, Roland Westrelin wrote: >> I'm waiting for @rwestrel to respond to my last list of comments/questions. > > @eme64 change is ready for another review @rwestrel I feel like I am heavily stepping on your toes now.... Can you please do refactoring in a separate prior PR?
I'm thinking in particular about your most recent changes with: - `class Invariance` - `estimate_if_peeling_possible` Don't get me wrong: I like those refactorings, but they should be done separately. If you can find anything else that could be done separately, that would help greatly. I have been painstakingly separating my SuperWord PR's into more reviewable patches, and I do get quicker reviews that way. My concern: I think the code is now in a state that can be understood (if one spends a day reading it all), but it is hard for me to say that it is correct. If I now approve this patch, then a subsequent reviewer will pay less attention, hence, I feel like I cannot just approve it too quickly now. If I am too annoying, feel free to ask someone else to review and I will just step back. Maybe @theRealAph wants to review for a while? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2095441298 From epeter at openjdk.org Mon May 6 09:11:04 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 6 May 2024 09:11:04 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v9] In-Reply-To: References: <5H6XV7Agl6ZNfGWT-bCbIPsimFTYM0pyIGiAHDQUUyA=.168e21cc-6cd8-42d8-ab59-d5e02e241ea2@github.com> <0RKnLUgc6UBtyxSyezCMWsSbP50hu6fQ6UJPHpGlgSU=.9fafa10f-62ee-4ec8-9093-4e204fcbe504@github.com> <5QbsVmYi0tYGlOvDL4LjJb1SjChIZtaWSMthFM9grMI=.0900e1c3-90b3-4726-a7c6-c2aff49d07ce@github.com> Message-ID: On Thu, 2 May 2024 18:44:25 GMT, Roland Westrelin wrote: >> I'm waiting for @rwestrel to respond to my last list of comments/questions. > > @eme64 change is ready for another review @rwestrel one idea to split things here: - Early inline of ScopedValue methods - Parsing to IR nodes and expansion back. - Optimization - Tests This way, I can spend only a few hours on one at a time, and we can get this done. Of course, you cannot really integrate an individual one, maybe there is a way to use skara for dependent PR's? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2095515126 From rcastanedalo at openjdk.org Mon May 6 09:29:58 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 6 May 2024 09:29:58 GMT Subject: Integrated: 8331418: ZGC: generalize barrier liveness logic In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 18:43:03 GMT, Roberto Castañeda Lozano wrote: > This changeset generalizes the logic to analyze, declare, and communicate which registers are live at a C2 barrier stub so that it can be used by collectors other than ZGC adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). > > The main changes are: > > - Make it possible to compute register liveness information before (live-in) or after (live-out) each barrier, and let the collector choose by implementing `BarrierSetC2State::needs_livein_data()`. > > - Generalize the interface with which collectors declare which registers must be additionally preserved across barrier runtime calls, adding the methods `BarrierStubC2::preserve(Register r)` and `BarrierStubC2::dont_preserve(Register r)`. > > - Simplify the interface with which platform-specific logic computes which registers to preserve across barrier runtime calls, replacing the calls to `BarrierStubC2::result()` and `BarrierStubC2::live()` with a single call to `BarrierStubC2::preserve_set()`. > > #### Testing > > - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). > - tier1-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/4d4e743d8f4cddd5288cee1d69c70fe2b9bea066) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call.
> - Build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug). @RealFYang, @TheRealMDoerr: could you please test and review the riscv and ppc changes? Thanks! This pull request has now been integrated. Changeset: 6c776411 Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/6c7764118ef1a684edddb302a4eaff36d80c783f Stats: 112 lines in 9 files changed: 60 ins; 37 del; 15 mod 8331418: ZGC: generalize barrier liveness logic Reviewed-by: mdoerr, aboldtch, fyang, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/19026 From rcastanedalo at openjdk.org Mon May 6 09:44:53 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 6 May 2024 09:44:53 GMT Subject: RFR: 8330016: Stress seed should be initialized for runtime stub compilation In-Reply-To: <-Pj7vK6Wpgv3FnP2KZXB2so2wWraG3LWEKds9XDI3pY=.d6595586-aa83-47bc-84ed-ff8c4a1a5550@github.com> References: <-Pj7vK6Wpgv3FnP2KZXB2so2wWraG3LWEKds9XDI3pY=.d6595586-aa83-47bc-84ed-ff8c4a1a5550@github.com> Message-ID: On Mon, 6 May 2024 06:31:47 GMT, Daniel Skantz wrote: > We can initialize the stress seed for runtime stub compilation as we already do for method compilation. This found the bug described in JDK-8329258. It would apply if StressGCM or StressLCM vm flags are set. > > Testing: T1-5 default options. T1-5 with -XX:+StressLCM and -XX:+StressGCM. Manually tested that the stress seed is set and printed to compilation log if either stress option is set. src/hotspot/share/opto/compile.cpp line 5066: > 5064: // Auxiliary methods to support randomized stressing/fuzzing. > 5065: > 5066: void Compile::initialize_stress_seed(DirectiveSet* directive) { Suggestion: void Compile::initialize_stress_seed(const DirectiveSet* directive) { src/hotspot/share/opto/compile.hpp line 1282: > 1280: > 1281: // seed random number generation and log the seed for repeatability.
> 1282: void initialize_stress_seed(DirectiveSet* directive); Suggestion: void initialize_stress_seed(const DirectiveSet* directive); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19095#discussion_r1590770840 PR Review Comment: https://git.openjdk.org/jdk/pull/19095#discussion_r1590771133 From rcastanedalo at openjdk.org Mon May 6 09:50:53 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 6 May 2024 09:50:53 GMT Subject: RFR: 8330016: Stress seed should be initialized for runtime stub compilation In-Reply-To: <-Pj7vK6Wpgv3FnP2KZXB2so2wWraG3LWEKds9XDI3pY=.d6595586-aa83-47bc-84ed-ff8c4a1a5550@github.com> References: <-Pj7vK6Wpgv3FnP2KZXB2so2wWraG3LWEKds9XDI3pY=.d6595586-aa83-47bc-84ed-ff8c4a1a5550@github.com> Message-ID: On Mon, 6 May 2024 06:31:47 GMT, Daniel Skantz wrote: > We can initialize the stress seed for runtime stub compilation as we already do for method compilation. This found the bug described in JDK-8329258. It would apply if StressGCM or StressLCM vm flags are set. > > Testing: T1-5 default options. T1-5 with -XX:+StressLCM and -XX:+StressGCM. Manually tested that the stress seed is set and printed to compilation log if either stress option is set. Looks good otherwise! ------------- Marked as reviewed by rcastanedalo (Reviewer).
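The point of 8330016 is reproducibility: once the seed is initialized and logged for runtime stub compilations too, a failure seen with `-XX:+StressLCM` or `-XX:+StressGCM` can in principle be replayed by supplying the same seed (as `-XX:StressSeed=...` does in a test command elsewhere in this thread). The minimal Java sketch below shows that seed-then-log pattern. The names are hypothetical, and it does not reproduce `Compile::initialize_stress_seed`.

```java
// Hypothetical sketch of seeded, replayable stress randomization; not HotSpot code.
final class StressSeedSketch {
    // Use the requested seed if one was given (null means "none"), otherwise pick one.
    // Log it either way so a failing run can be replayed with the same seed.
    static long initializeSeed(Long requested, StringBuilder log) {
        long seed = (requested != null) ? requested : System.nanoTime();
        log.append("stress seed: ").append(seed).append('\n');
        return seed;
    }

    // Stand-in for randomized decisions, such as picking among equally legal
    // scheduling candidates: the same seed always yields the same picks.
    static int[] pickCandidates(long seed, int decisions, int candidates) {
        java.util.Random rng = new java.util.Random(seed);
        int[] picks = new int[decisions];
        for (int i = 0; i < decisions; i++) {
            picks[i] = rng.nextInt(candidates);
        }
        return picks;
    }
}
```

Logging the seed even when it was chosen randomly is what makes an otherwise nondeterministic stress run debuggable after the fact.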
PR Review: https://git.openjdk.org/jdk/pull/19095#pullrequestreview-2040390290 From chagedorn at openjdk.org Mon May 6 10:40:52 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 6 May 2024 10:40:52 GMT Subject: RFR: 8330016: Stress seed should be initialized for runtime stub compilation In-Reply-To: <-Pj7vK6Wpgv3FnP2KZXB2so2wWraG3LWEKds9XDI3pY=.d6595586-aa83-47bc-84ed-ff8c4a1a5550@github.com> References: <-Pj7vK6Wpgv3FnP2KZXB2so2wWraG3LWEKds9XDI3pY=.d6595586-aa83-47bc-84ed-ff8c4a1a5550@github.com> Message-ID: On Mon, 6 May 2024 06:31:47 GMT, Daniel Skantz wrote: > We can initialize the stress seed for runtime stub compilation as we already do for method compilation. This found the bug described in JDK-8329258. It would apply if StressGCM or StressLCM vm flags are set. > > Testing: T1-5 default options. T1-5 with -XX:+StressLCM and -XX:+StressGCM. Manually tested that the stress seed is set and printed to compilation log if either stress option is set. Looks good to me, too. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19095#pullrequestreview-2040466871 From roland at openjdk.org Mon May 6 10:52:53 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 6 May 2024 10:52:53 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: On Fri, 3 May 2024 10:18:44 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/compile.cpp line 3906: >> >>> 3904: for (DUIterator_Fast imax, i = m->fast_outs(imax); i < imax; i++) { >>> 3905: Node* use = m->fast_out(i); >>> 3906: if (use->is_Mem() || use->Opcode() == Op_DivI || use->Opcode() == Op_DivL) { >> >> `Op_ModI` and `Op_ModL` are missing here. And isn't this too strong in cases where we can prove that the operand is non-zero? Could you re-use `PhaseIterGVN::no_dependent_zero_check`? Please also add corresponding tests. 
>> >> Looking at `PhaseIterGVN::no_dependent_zero_check`, I noticed that `UDiv[I/L]Node` and `UMod[I/L]Node` are not handled but I think they should. I think this was missed when these nodes were added by [JDK-8282221](https://bugs.openjdk.org/browse/JDK-8282221). One can probably extend @chhagedorn's test from [JDK-8259227](https://bugs.openjdk.org/browse/JDK-8259227) to trigger the same issue. > >> `Op_ModI` and `Op_ModL` are missing here. > > Good catch! I added test cases for `Op_ModI` and `Op_ModL`, the unsigned variants and also the DivMod variants. I also fixed the patch so it handles all of them. > >> And isn't this too strong in cases where we can prove that the operand is non-zero? > > I don't think it's too strong. The operand can be non-zero because of a range check `CastII` somewhere along the subgraph that starts at the node's second input. In that case, `PhaseIterGVN::no_dependent_zero_check` would return true but removing the range `CastII` would cause the bugs that are triggered by the test case. > >> Looking at `PhaseIterGVN::no_dependent_zero_check`, I noticed that `UDiv[I/L]Node` and `UMod[I/L]Node` are not handled but I think they should. I think this was missed when these nodes were added by [JDK-8282221](https://bugs.openjdk.org/browse/JDK-8282221). One can probably extend @chhagedorn's test from [JDK-8259227](https://bugs.openjdk.org/browse/JDK-8259227) to trigger the same issue. > > That seems like a different problem that is out of the scope of this particular issue. I realized that I didn't understand your comment when I replied. What you're saying, I think, is that if we have, say, a `CastII` that's input to a `DivI` node, if the input to that cast is non-zero, then we don't need to add the `CastII` control as a dependency to the `DivI`. The problem, I think, is that the `CastII` could be input to, say, an `AddI` node which would then be input to the `DivI`.
What we would then need to know is whether, if we remove the `CastII`, the `AddI` is still non-zero. That doesn't seem straightforward because this is done once we no longer have an igvn instance to propagate types. So, while I agree this is conservative, it still seems like the most reasonable fix. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18377#discussion_r1590836071 From roland at openjdk.org Mon May 6 10:52:57 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 6 May 2024 10:52:57 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 12:42:12 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - review >> - Merge branch 'master' into JDK-8324517 >> - test and fix > > test/hotspot/jtreg/compiler/rangechecks/TestArrayAccessAboveRCAfterRCCastIIEliminated.java line 37: > >> 35: * @run main/othervm -XX:-TieredCompilation -XX:-UseOnStackReplacement -XX:-BackgroundCompilation >> 36: * -XX:CompileCommand=dontinline,TestArrayAccessAboveRCAfterRCCastIIEliminated::notInlined >> 37: * -XX:+StressIGVN -XX:StressSeed=94546681 TestArrayAccessAboveRCAfterRCCastIIEliminated > > `Error: VM option 'StressIGVN' is diagnostic and must be enabled via -XX:+UnlockDiagnosticVMOptions.` Fixed in new commit.
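On the unsigned variants raised in this exchange: as far as I know, the `UDiv[I/L]` and `UMod[I/L]` nodes added by JDK-8282221 back the intrinsics for `Integer`/`Long` `divideUnsigned` and `remainderUnsigned`. At the Java level those methods throw `ArithmeticException` for a zero divisor just like `/` and `%`, so their divisor input carries the same implicit zero check that the control dependency discussed above has to respect. The quick check below confirms that Java-level behavior; the class and method names are made up for the example.

```java
// Checks Java-level zero-divisor behavior; helper names are made up for this example.
final class ZeroDivisorCheck {
    // Returns true if running the operation throws ArithmeticException.
    static boolean throwsArithmetic(Runnable op) {
        try {
            op.run();
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // Signed and unsigned division/remainder all throw on a zero divisor.
    static boolean allThrowOnZero() {
        int zero = 0;    // not a compile-time constant, so nothing is folded away
        long zeroL = 0L;
        return throwsArithmetic(() -> Integer.divideUnsigned(7, zero))
            && throwsArithmetic(() -> Integer.remainderUnsigned(7, zero))
            && throwsArithmetic(() -> Long.divideUnsigned(7L, zeroL))
            && throwsArithmetic(() -> Long.remainderUnsigned(7L, zeroL))
            && throwsArithmetic(() -> { int q = 7 / zero; })
            && throwsArithmetic(() -> { int r = 7 % zero; });
    }
}
```

With a non-zero divisor the unsigned methods treat the operands as unsigned, e.g. `Integer.divideUnsigned(-1, 2)` is `2147483647` because `-1` is read as `0xFFFFFFFF`.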
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18377#discussion_r1590836230 From redestad at openjdk.org Mon May 6 11:18:55 2024 From: redestad at openjdk.org (Claes Redestad) Date: Mon, 6 May 2024 11:18:55 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v4] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 11:08:16 GMT, Adam Sotona wrote: >> Hi, >> During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. >> One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. >> >> I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. >> >> Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. >> >> Thank you, >> Adam > > Adam Sotona has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into JDK-8331291-attributes > - changed order in allowed modules attributes check > - added bug number > - added impl comment > - removed list of predefined attributes > standard attributes mapping hard-coded and moved to BoundAttribute > added AttributesTest::testAttributesMapping > - move mappers implementations to AbstractAttributeMapper > - 8331291: java.lang.classfile.Attributes class performs a lot of static initializations FWIW the code changes look good to me.
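The quoted proposal replaces eagerly initialized static fields in `Attributes` with lazily initialized static methods. One common way to get that behavior in Java is the initialization-on-demand holder idiom, sketched below. This is an illustration under that assumption only: the names do not match the `java.lang.classfile` API, and the actual PR may implement the laziness differently (its commit list mentions moving mapper implementations to `AbstractAttributeMapper`).

```java
// Hypothetical illustration of the initialization-on-demand holder idiom;
// names do not match the java.lang.classfile API.
final class LazyMappers {
    // Counts how many mappers have actually been created (for the demo only).
    static int created = 0;

    private static String create(String attributeName) {
        created++;
        return attributeName + "Mapper";
    }

    // The JVM initializes each holder class only on its first active use,
    // so a mapper nobody asks for is never created.
    private static final class SourceFileHolder {
        static final String INSTANCE = create("SourceFile");
    }

    private static final class SignatureHolder {
        static final String INSTANCE = create("Signature");
    }

    // Eager alternative, for comparison (creates every mapper when LazyMappers loads):
    // static final String SOURCE_FILE = create("SourceFile");

    static String sourceFile() {
        return SourceFileHolder.INSTANCE;
    }

    static String signature() {
        return SignatureHolder.INSTANCE;
    }
}
```

Class initialization is thread-safe by JVM guarantee, so this pattern needs no explicit locking, and the cost of creating each mapper is paid only by code that uses it.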
There seems to be a number of tests that still need to be updated to use the new methods instead of the old constants. ------------- Marked as reviewed by redestad (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19006#pullrequestreview-2040558054 From roland at openjdk.org Mon May 6 11:52:53 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 6 May 2024 11:52:53 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop In-Reply-To: References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> Message-ID: On Mon, 6 May 2024 07:31:22 GMT, Christian Hagedorn wrote: > But I'm still wondering though, if these bailouts are really needed in the general case. It seems like this problem is mainly for loop phis. Couldn't we check the types of loop phi inputs and bail out if one includes zero? Are we sure divisions are the only cause of bugs? My understanding of this issue is that once pushed thru phi, the type of the `ConvL2I` is simply not correct and that's the root cause. I wonder if we could get other failures because of this: maybe a node becoming top because of the incorrect type or an out of bound array access. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19086#discussion_r1590910489 From roland at openjdk.org Mon May 6 12:00:00 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 6 May 2024 12:00:00 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v9] In-Reply-To: References: <5H6XV7Agl6ZNfGWT-bCbIPsimFTYM0pyIGiAHDQUUyA=.168e21cc-6cd8-42d8-ab59-d5e02e241ea2@github.com> <0RKnLUgc6UBtyxSyezCMWsSbP50hu6fQ6UJPHpGlgSU=.9fafa10f-62ee-4ec8-9093-4e204fcbe504@github.com> <5QbsVmYi0tYGlOvDL4LjJb1SjChIZtaWSMthFM9grMI=.0900e1c3-90b3-4726-a7c6-c2aff49d07ce@github.com> Message-ID: On Mon, 6 May 2024 08:25:10 GMT, Emanuel Peter wrote: > I'm thinking in particular about your most recent changes with: > > * `class Invariance` > > * `estimate_if_peeling_possible` > > > Don't get me wrong: I like those refactorings, but they should be done separately. The problem I see is that they have little value unless this patch is integrated as it is. What if another reviewer thinks it's better to keep everything related to loop predication together? There's no need to change the class `Invariance` then. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2095842139 From roland at openjdk.org Mon May 6 12:06:02 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 6 May 2024 12:06:02 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v9] In-Reply-To: References: <5H6XV7Agl6ZNfGWT-bCbIPsimFTYM0pyIGiAHDQUUyA=.168e21cc-6cd8-42d8-ab59-d5e02e241ea2@github.com> <0RKnLUgc6UBtyxSyezCMWsSbP50hu6fQ6UJPHpGlgSU=.9fafa10f-62ee-4ec8-9093-4e204fcbe504@github.com> <5QbsVmYi0tYGlOvDL4LjJb1SjChIZtaWSMthFM9grMI=.0900e1c3-90b3-4726-a7c6-c2aff49d07ce@github.com> Message-ID: On Mon, 6 May 2024 11:56:52 GMT, Roland Westrelin wrote: >> @rwestrel I feel like I am heavily stepping on your toes now.... >> Can you please do refactoring in a separate prior PR? 
This change is now 3K+ lines, and even reading through it all takes me more than a day, I simply cannot commit this many hours at a time. >> >> I'm thinking in particular about your most recent changes with: >> - `class Invariance` >> - `estimate_if_peeling_possible` >> >> Don't get me wrong: I like those refactorings, but they should be done separately. >> >> If you can find anything else that could be done separately, that would help greatly. >> >> I have been painstakingly separating my SuperWord PR's into more reviewable patches, and I do get quicker reviews that way. >> >> My concern: I think the code is now in a state that can be understood (if one spends a day reading it all), but it is hard for me to say that it is correct. If I now approve this patch, then a subsequent reviewer will pay less attention, hence, I feel like I cannot just approve it too quickly now. >> >> If I am too annoying, feel free to ask someone else to review and I will just step back. Maybe @theRealAph wants to review for a while? > >> I'm thinking in particular about your most recent changes with: >> >> * `class Invariance` >> >> * `estimate_if_peeling_possible` >> >> >> Don't get me wrong: I like those refactorings, but they should be done separately. > > The problem I see is that they have little value unless this patch is integrated as it is. What if another reviewer thinks it's better to keep everything related to loop predication together? There's no need to change the class `Invariance` then. > @rwestrel one idea to split things here: > > * Early inline of ScopedValue methods > > * Parsing to IR nodes and expansion back. > > * Optimization > > * Tests > > > This way, I can spend only a few hours on one at a time, and we can get this done. Of course, you cannot really integrate an individual one, maybe there is a way to use skara for dependent PR's? Would one commit per line above work? Or do you think it needs to be different PRs? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2095855714 From epeter at openjdk.org Mon May 6 12:21:01 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 6 May 2024 12:21:01 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v9] In-Reply-To: References: <5H6XV7Agl6ZNfGWT-bCbIPsimFTYM0pyIGiAHDQUUyA=.168e21cc-6cd8-42d8-ab59-d5e02e241ea2@github.com> <0RKnLUgc6UBtyxSyezCMWsSbP50hu6fQ6UJPHpGlgSU=.9fafa10f-62ee-4ec8-9093-4e204fcbe504@github.com> <5QbsVmYi0tYGlOvDL4LjJb1SjChIZtaWSMthFM9grMI=.0900e1c3-90b3-4726-a7c6-c2aff49d07ce@github.com> Message-ID: <99M7Sb0E8z_DOcJ54d5LiJbIeWX0AQj7Ypmg_TNsQZ0=.0b35ec61-ba5a-456f-b597-d43a71ad8095@github.com> On Mon, 6 May 2024 12:03:12 GMT, Roland Westrelin wrote: >>> I'm thinking in particular about your most recent changes with: >>> >>> * `class Invariance` >>> >>> * `estimate_if_peeling_possible` >>> >>> >>> Don't get me wrong: I like those refactorings, but they should be done separately. >> >> The problem I see is that they have little value unless this patch is integrated as it is. What if another reviewer thinks it's better to keep everything related to loop predication together? There's no need to change the class `Invariance` then. > >> @rwestrel one idea to split things here: >> >> * Early inline of ScopedValue methods >> >> * Parsing to IR nodes and expansion back. >> >> * Optimization >> >> * Tests >> >> >> This way, I can spend only a few hours on one at a time, and we can get this done. Of course, you cannot really integrate an individual one, maybe there is a way to use skara for dependent PR's? > > Would one commit per line above work? Or do you think it needs to be different PRs? @rwestrel Just using commits is probably not really helpful. What would you do if there needs to be an update to commit 1, requested by a reviewer? Honestly, I would like to take a break from this for now. I leave it up to you how to present it in a way that is easier to review. 
Once you get someone to review and accept it, I can see if I find time to review again. I think the code is significantly better and more readable than when we first started. So if someone like @vnkozlov simply scans and approves it, and as such takes the responsibility of "first reviewer", then I'm totally fine with that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2095886779 From vlivanov at openjdk.org Mon May 6 12:23:58 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 6 May 2024 12:23:58 GMT Subject: RFR: 8322726: C2: Unloaded signature class kills argument value In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 11:35:25 GMT, Vladimir Ivanov wrote: > For MethodHandle linkers all arguments are cast to signature classes when target method is known. > > It causes problems when target method signature contains unloaded classes: when loaded class meets unloaded class it turns into a TOP. It effectively kills argument values which correspond to unloaded signature types. > > Proposed fix avoids casts when signature class is unloaded. > > Testing: hs-tier1 - hs-tier4 Thanks for the reviews, Vladimir K, Dean, and Tobias. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18973#issuecomment-2095890793 From vlivanov at openjdk.org Mon May 6 12:23:59 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 6 May 2024 12:23:59 GMT Subject: Integrated: 8322726: C2: Unloaded signature class kills argument value In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 11:35:25 GMT, Vladimir Ivanov wrote: > For MethodHandle linkers all arguments are cast to signature classes when target method is known. > > It causes problems when target method signature contains unloaded classes: when loaded class meets unloaded class it turns into a TOP. It effectively kills argument values which correspond to unloaded signature types. > > Proposed fix avoids casts when signature class is unloaded.
> > Testing: hs-tier1 - hs-tier4 This pull request has now been integrated. Changeset: fa02667d Author: Vladimir Ivanov URL: https://git.openjdk.org/jdk/commit/fa02667d838f08cac7d41dfb4c3e8056ae6165cc Stats: 181 lines in 5 files changed: 168 ins; 0 del; 13 mod 8322726: C2: Unloaded signature class kills argument value Reviewed-by: kvn, dlong, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/18973 From tholenstein at openjdk.org Mon May 6 13:15:16 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 6 May 2024 13:15:16 GMT Subject: RFR: 8330584: IGV: XML does not save all node properties Message-ID: When C2 sends graphs over the network to IGV, each graph is sent separately. The same applies if C2 saves graphs to XML: each graph is saved with all its nodes as a separate `...` in the XML. To save space, graphs that are saved from IGV only contain the incremental difference for each graph. This saves a lot of space (~5-10x). The logic happens in Printer.java -> `exportInputGraph(.., difference=true, ...)` Unfortunately, there is a bug in this logic: the properties of the nodes are not saved correctly. [graphs.zip](https://github.com/openjdk/jdk/files/15220940/graphs.zip) contains 4 graphs: `graph_c2.xml` (230KB) - an XML file saved from C2 `graph_igv_bug.xml` (73KB) - `graph_c2.xml` opened in IGV (without this fix) and saved as `graph_igv_bug.xml`. `graph_igv_fixed.xml` (123KB) - `graph_c2.xml` opened in IGV (with this fix) and saved as `graph_igv_fixed.xml`. As you can see, `graph_igv_fixed.xml` is almost twice as large as `graph_igv_bug.xml` because it contains the missing properties. But now the memory saving from the original `graph_c2.xml` is only ~2x. Therefore a new format for saving is added: graphs can now be saved and opened from IGV as `.igv`. This uses a compressed (ZIP) format. `graph.igv` (10KB) is the same graph as `graph_c2.xml` (230KB).
But it uses difference graph compression and ZIP compression and is in total 23x smaller in memory footprint. For example, the root in the last graph of difference_true.xml has far fewer properties than in difference_false.xml. ------------- Commit messages: - Update InputNode.java - compressed graphs as .igv files - JDK-8330584 IGV: XML does not save all node properties Changes: https://git.openjdk.org/jdk/pull/19104/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19104&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330584 Stats: 147 lines in 3 files changed: 79 ins; 16 del; 52 mod Patch: https://git.openjdk.org/jdk/pull/19104.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19104/head:pull/19104 PR: https://git.openjdk.org/jdk/pull/19104 From asotona at openjdk.org Mon May 6 13:59:19 2024 From: asotona at openjdk.org (Adam Sotona) Date: Mon, 6 May 2024 13:59:19 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v5] In-Reply-To: References: Message-ID: > Hi, > During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. > One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. > > I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. > > Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. > > Thank you, > Adam Adam Sotona has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains ten commits: - updated LimitsTest - Merge branch 'master' into JDK-8331291-attributes # Conflicts: # test/jdk/jdk/classfile/SignaturesTest.java - Merge branch 'master' into JDK-8331291-attributes - changed order in allowed modules attributes check - added bug number - added impl comment - removed list of predefined attributes standard attributes mapping hard-coded and moved to BoundAttribute added AttributesTest::testAttributesMapping - move mappers implementations to AbstractAttributeMapper - 8331291: java.lang.classfile.Attributes class performs a lot of static initializations ------------- Changes: https://git.openjdk.org/jdk/pull/19006/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=04 Stats: 2032 lines in 48 files changed: 905 ins; 619 del; 508 mod Patch: https://git.openjdk.org/jdk/pull/19006.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19006/head:pull/19006 PR: https://git.openjdk.org/jdk/pull/19006 From chagedorn at openjdk.org Mon May 6 14:24:27 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 6 May 2024 14:24:27 GMT Subject: RFR: 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode [v2] In-Reply-To: References: Message-ID: > This patch replaces the `Opaque4Node` of the `If` for Initialized Assertion Predicates with a new `OpaqueInitializedAssertionPredicateNode`. This helps to simplify pattern matching for predicate code and to distinguish it from the two other uses of `Opaque4` nodes: > 1. Template Assertion Predicate: The goal is to get rid of its `Opaque4Node` as well by using a dedicated `TemplateAssertionPredicateNode` for the `IfNode`. > 2. Non-null-checks with intrinsics and unsafe accesses: This will eventually be the only use left. Once we get there, we should rename the node accordingly to `OpaqueNonNullCheck` or something like that.
> > I went through all the uses of `Opaque4` nodes and did the following: > - Could the `Opaque4` node be part of an Initialized Assertion Predicate? > - No: Added an assert that we are not dealing with an Initialized Assertion Predicate. > - Yes: > - Yes **and only** for Initialized Assertion Predicates? Added an assert that we are only expecting an `OpaqueInitializedAssertionPredicateNode` if appropriate. > - Yes but could also be something else: Added a case for `OpaqueInitializedAssertionPredicateNode` next to the `Opaque4` case. > - Is this `Opaque4` node only used for Template Assertion Predicates? > - Yes: Added an assert with a call to `assertion_predicate_has_loop_opaque_node()` to check that we find its `OpaqueLoop*Nodes`. > - I've added test cases where I was not sure whether an `Opaque4` node could be part of a Template, an Initialized Assertion Predicate or a non-null-check. This was a little tricky, but I think it was still worth it to prevent future bugs (even though most of these special cases are quite rare).
The pull request contains four additional commits since the last revision: - Merge branch 'master' into JDK-8330386 - Add more comments and asserts - Add more tests - 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18951/files - new: https://git.openjdk.org/jdk/pull/18951/files/089a4e65..fe3feb8b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18951&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18951&range=00-01 Stats: 22254 lines in 1611 files changed: 8283 ins; 8470 del; 5501 mod Patch: https://git.openjdk.org/jdk/pull/18951.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18951/head:pull/18951 PR: https://git.openjdk.org/jdk/pull/18951 From dfenacci at openjdk.org Mon May 6 15:20:32 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 6 May 2024 15:20:32 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v6] In-Reply-To: References: Message-ID: > # Issue > When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. > > # Causes > On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. > The same is true for `StoreVector`s. > When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store of the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`). 
Here we call > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 > > where we do an extra check for types if we deal with vectors but we don?t make sure that neither masks nor offsets interfere. > Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > but we don?t make sure that there are no masks or offsets. > A few lines below, we check if there are 2 stores for the same value in a row. We need to check for masks and offsets here too but in this case we can include these cases if the masks and offsets of the vector stores are equivalent. > > # Solution > To avoid folding `Load`- and `StoreVector`s with masks and offsets we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn?t return a store opcode but instead returns its own (to avoid ever being the same as a store node). In this way, the checks in `MemNode::can_see_stored_value` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 > > and `StoreNode::Identity` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > will fail if masks or offsets are used. > For 2 stores of the same value we instead check for mask and offset equality. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been add... 
Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: - JDK-8325520: add store/load masked vector tests - JDK-8325520: add store/load tests with duplicate offsets ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18347/files - new: https://git.openjdk.org/jdk/pull/18347/files/524ff888..85bb4bef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=04-05 Stats: 95 lines in 1 file changed: 94 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18347/head:pull/18347 PR: https://git.openjdk.org/jdk/pull/18347 From dfenacci at openjdk.org Mon May 6 15:23:00 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 6 May 2024 15:23:00 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v4] In-Reply-To: References: Message-ID: On Thu, 25 Apr 2024 12:21:43 GMT, Emanuel Peter wrote: > It would be great if you had tests that exactly exercise these "bad" examples, where it looks like we might optimize, but it would be wrong. Yep, good idea. I've added a few tests to check for those cases (load-store with duplicate offsets and store-load with masks). Thanks @eme64! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2096287127 From asotona at openjdk.org Mon May 6 15:59:08 2024 From: asotona at openjdk.org (Adam Sotona) Date: Mon, 6 May 2024 15:59:08 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v6] In-Reply-To: References: Message-ID: <8b638nkCvzhpf1xUCK-KGXVXqeYPwzFkVOJPOFDtyd4=.50d86a2b-a695-49d5-8de6-924b41f507f5@github.com> > Hi, > During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. 
> One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. > > I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. > > Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. > > Thank you, > Adam Adam Sotona has updated the pull request incrementally with one additional commit since the last revision: fixed tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19006/files - new: https://git.openjdk.org/jdk/pull/19006/files/497dd533..a1a55d71 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=04-05 Stats: 180 lines in 94 files changed: 0 ins; 0 del; 180 mod Patch: https://git.openjdk.org/jdk/pull/19006.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19006/head:pull/19006 PR: https://git.openjdk.org/jdk/pull/19006 From asotona at openjdk.org Mon May 6 15:59:08 2024 From: asotona at openjdk.org (Adam Sotona) Date: Mon, 6 May 2024 15:59:08 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v4] In-Reply-To: References: Message-ID: <5agtRoM-ozF1_jEnCOI4j9tvcEJEhul2FSDxHX8hEAE=.d2c1fe74-e84f-4007-9d39-57901b1788e2@github.com> On Thu, 2 May 2024 14:40:16 GMT, Chen Liang wrote: > On a side note, will we update JEP 466 to include this patch? 
I hope so, if we get it into 23 ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19006#issuecomment-2096378934 From asotona at openjdk.org Mon May 6 15:59:08 2024 From: asotona at openjdk.org (Adam Sotona) Date: Mon, 6 May 2024 15:59:08 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v4] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 11:16:16 GMT, Claes Redestad wrote: > FWIW code changes look good to me. There seem to be a number of tests that still need to be updated to use the new methods instead of the old constants. Thank you! Yes, I'm cleaning the tests right now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19006#issuecomment-2096380853 From asotona at openjdk.org Mon May 6 16:07:26 2024 From: asotona at openjdk.org (Adam Sotona) Date: Mon, 6 May 2024 16:07:26 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v7] In-Reply-To: References: Message-ID: <_5Ike3ZDfok-lU5AItq7mDu80Gme4vvRrmvpovOOXHg=.763c4a63-7dff-49f5-b826-93d727e9f5b9@github.com> > Hi, > During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. > One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. > > I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. > > Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23.
> > Thank you, > Adam Adam Sotona has updated the pull request incrementally with one additional commit since the last revision: fixed tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19006/files - new: https://git.openjdk.org/jdk/pull/19006/files/a1a55d71..dcbaae85 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=05-06 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19006.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19006/head:pull/19006 PR: https://git.openjdk.org/jdk/pull/19006 From kvn at openjdk.org Mon May 6 17:02:58 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 6 May 2024 17:02:58 GMT Subject: RFR: 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode [v2] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 14:24:27 GMT, Christian Hagedorn wrote: >> This patch replaces the `Opaque4Node` of the `If` for Initialized Assertion Predicates with a new `OpaqueInitializedAssertionPredicateNode`. This helps to simplify pattern matching for predicate code and to distinguish it from the two other uses of `Opaque4` nodes: >> 1. Template Assertion Predicate: The goal is to get rid of its `Opaque4Node` as well by using a dedicated `TemplateAssertionPredicateNode` for the `IfNode`. >> 2. Non-null-checks with intrinsics and unsafe accesses: This will eventually be the only use left. Once we get there, we should rename the node accordingly to `OpaqueNonNullCheck` or something like that. >> >> I went through all the uses of `Opaque4` nodes and did the following: >> - Could the `Opaque4` node be part of an Initialized Assertion Predicate? >> - No: Added an assert that we are not dealing with an Initialized Assertion Predicate. >> - Yes: >> - Yes **and only** for Initialized Assertion Predicates?
Added an assert that we are only expecting an `OpaqueInitializedAssertionPredicateNode` if appropriate. >> - Yes but could also be something else: Added a case for `OpaqueInitializedAssertionPredicateNode` next to the `Opaque4` case. >> - Is this `Opaque4` node only used for Template Assertion Predicates? >> - Yes: Added an assert with a call to `assertion_predicate_has_loop_opaque_node()` to check that we find its `OpaqueLoop*Nodes`. >> - I've added test cases where I was not sure whether an `Opaque4` node could be part of a Template, an Initialized Assertion Predicate or a non-null-check. This was a little tricky, but I think it was still worth it to prevent future bugs (even though most of these special cases are quite rare). >> >> This is another patch split off from the full fix for Assertion Predicates. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into JDK-8330386 > - Add more comments and asserts > - Add more tests > - 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode Looks reasonable. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18951#pullrequestreview-2041258643 From kvn at openjdk.org Mon May 6 17:17:52 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 6 May 2024 17:17:52 GMT Subject: RFR: 8329273: C2 SuperWord: Some basic MemorySegment IR tests In-Reply-To: References: Message-ID: On Thu, 28 Mar 2024 16:34:38 GMT, Emanuel Peter wrote: > I could not find any IR vectorization tests for `MemorySegment` yet.
> > I make sure to exercise different backing types: > - arrays > - buffers > - native memory > > I filed a follow-up RFE, to eventually make all cases where I have "FAILS" vectorize: > > [JDK-8331659](https://bugs.openjdk.org/browse/JDK-8331659): C2 SuperWord: investicate failed vectorization in compiler/loopopts/superword/TestMemorySegment.java Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18535#pullrequestreview-2041284796 From asotona at openjdk.org Mon May 6 18:24:25 2024 From: asotona at openjdk.org (Adam Sotona) Date: Mon, 6 May 2024 18:24:25 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v8] In-Reply-To: References: Message-ID: > Hi, > During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. > One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. > > I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. > > Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. 
> > Thank you, > Adam Adam Sotona has updated the pull request incrementally with one additional commit since the last revision: fixed tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19006/files - new: https://git.openjdk.org/jdk/pull/19006/files/dcbaae85..b4203cfd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=06-07 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19006.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19006/head:pull/19006 PR: https://git.openjdk.org/jdk/pull/19006 From vromero at openjdk.org Mon May 6 18:35:57 2024 From: vromero at openjdk.org (Vicente Romero) Date: Mon, 6 May 2024 18:35:57 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v7] In-Reply-To: <_5Ike3ZDfok-lU5AItq7mDu80Gme4vvRrmvpovOOXHg=.763c4a63-7dff-49f5-b826-93d727e9f5b9@github.com> References: <_5Ike3ZDfok-lU5AItq7mDu80Gme4vvRrmvpovOOXHg=.763c4a63-7dff-49f5-b826-93d727e9f5b9@github.com> Message-ID: On Mon, 6 May 2024 16:07:26 GMT, Adam Sotona wrote: >> Hi, >> During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. >> One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. >> >> I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. >> >> Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. 
>> >> Thank you, >> Adam > > Adam Sotona has updated the pull request incrementally with one additional commit since the last revision: > > fixed tests lgtm src/java.base/share/classes/java/lang/classfile/Attributes.java line 28: > 26: > 27: import java.lang.classfile.attribute.*; > 28: import jdk.internal.classfile.impl.AbstractAttributeMapper.*; the second star import is probably unnecessary ------------- Marked as reviewed by vromero (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19006#pullrequestreview-2041378994 PR Review Comment: https://git.openjdk.org/jdk/pull/19006#discussion_r1591377928 From asotona at openjdk.org Mon May 6 18:46:54 2024 From: asotona at openjdk.org (Adam Sotona) Date: Mon, 6 May 2024 18:46:54 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v7] In-Reply-To: References: <_5Ike3ZDfok-lU5AItq7mDu80Gme4vvRrmvpovOOXHg=.763c4a63-7dff-49f5-b826-93d727e9f5b9@github.com> Message-ID: <2mvx1CG4RndVRqr8H_uypWn0S97bZ1qXXTWvVFsESz0=.23f7434a-d6ea-4208-9d49-d03f07c9e9b3@github.com> On Mon, 6 May 2024 18:07:06 GMT, Vicente Romero wrote: >> Adam Sotona has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed tests > > src/java.base/share/classes/java/lang/classfile/Attributes.java line 28: > >> 26: >> 27: import java.lang.classfile.attribute.*; >> 28: import jdk.internal.classfile.impl.AbstractAttributeMapper.*; > > the second star import is probably unnecessary Thank you for the review! All the holders/mappers implementations are AbstractAttributeMapper inner classes. 
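For readers unfamiliar with the pattern discussed in this thread, below is a minimal sketch of turning an eagerly initialized static field into a lazily initialized static accessor using the initialization-on-demand holder idiom. `LazyAttributes`, `Mapper`, `code()` and `sourceFile()` are hypothetical names; this is not the actual `java.lang.classfile.Attributes` or `AbstractAttributeMapper` code, only an illustration of why each mapper is created on first use rather than when the enclosing class is initialized.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of lazy static accessors, not the JDK implementation.
final class LazyAttributes {
    // Counts how many mapper instances have been constructed so far
    static final AtomicInteger CREATED = new AtomicInteger();

    // Stand-in for an attribute mapper
    record Mapper(String name) {
        Mapper {
            CREATED.incrementAndGet();
        }
    }

    // Eager version (what the PR moves away from):
    //   static final Mapper CODE = new Mapper("Code");
    // would construct every mapper as soon as LazyAttributes is initialized.

    // Lazy version: the JVM initializes each holder class only when its
    // INSTANCE field is first read, which is thread-safe without explicit locking.
    private static final class CodeHolder {
        static final Mapper INSTANCE = new Mapper("Code");
    }

    private static final class SourceFileHolder {
        static final Mapper INSTANCE = new Mapper("SourceFile");
    }

    static Mapper code() {
        return CodeHolder.INSTANCE;
    }

    static Mapper sourceFile() {
        return SourceFileHolder.INSTANCE;
    }

    private LazyAttributes() {}
}
```

Calling `code()` constructs only the `Code` mapper, and repeated calls return the same instance; `SourceFile` stays unconstructed until `sourceFile()` is first called.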
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19006#discussion_r1591416140 From cslucas at openjdk.org Mon May 6 21:08:01 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 6 May 2024 21:08:01 GMT Subject: RFR: JDK-8330565 - C2: Multiple crashes with CTW after JDK-8316991 Message-ID: Please consider this patch, which fixes the issues described in JDK-8330565 and has a little overlap with issue JDK-8330795. The `# assert(false) failed: Bad graph detected in build_loop_late` failure occurred because a string concatenation optimization using [this method](https://github.com/openjdk/jdk/blob/819f3d6fc70ff6fe54ac5f9033c17c3dd4326aa5/src/hotspot/share/opto/graphKit.cpp#L4115) adds AddP and LoadN nodes to the IR graph as NotNull _and_ because RAM was not "nullifying" phis merging nullable pointers. I was only able to reproduce this problem using a classfile/jar compiled with an "old" version of the JDK, because newer versions use invokedynamic for string concatenation. The `assert(adr_t->is_known_instance_field()) failed: instance required` failure occurred because RAM uses `PhaseMacroExpand::can_eliminate_allocation` to check whether an allocation can be eliminated, and that method wasn't checking whether the allocation uses an exact type. The `assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type` failure was caused by the fact that we didn't have a "zero value" for the type T_METADATA. The RAM patch uses that value when it creates a Phi node merging Klass loads while UseCompressedClassPointers is disabled. Tested with JTREG tier1-4 on Linux x86_64 & ARM64. ------------- Commit messages: - Fix bad type when UseCompressedPointers is disabled. - Phi merging nullable inputs needs to be nullable. - SR allocate needs to be of exact type.
Changes: https://git.openjdk.org/jdk/pull/19111/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19111&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330565 Stats: 20 lines in 3 files changed: 20 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19111.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19111/head:pull/19111 PR: https://git.openjdk.org/jdk/pull/19111 From kvn at openjdk.org Mon May 6 21:53:53 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 6 May 2024 21:53:53 GMT Subject: RFR: JDK-8330565 - C2: Multiple crashes with CTW after JDK-8316991 In-Reply-To: References: Message-ID: On Mon, 6 May 2024 21:02:07 GMT, Cesar Soares Lucas wrote: > Please consider this patch, which fixes the issues described in JDK-8330565 and has a little overlap with issue JDK-8330795. > > The `# assert(false) failed: Bad graph detected in build_loop_late` failure occurred because a string concatenation optimization using [this method](https://github.com/openjdk/jdk/blob/819f3d6fc70ff6fe54ac5f9033c17c3dd4326aa5/src/hotspot/share/opto/graphKit.cpp#L4115) adds AddP and LoadN nodes to the IR graph as NotNull _and_ because RAM was not "nullifying" phis merging nullable pointers. I was only able to reproduce this problem using a classfile/jar compiled with an "old" version of the JDK, because newer versions use invokedynamic for string concatenation. > > The `assert(adr_t->is_known_instance_field()) failed: instance required` failure occurred because RAM uses `PhaseMacroExpand::can_eliminate_allocation` to check whether an allocation can be eliminated, and that method wasn't checking whether the allocation uses an exact type. > > The `assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type` failure was caused by the fact that we didn't have a "zero value" for the type T_METADATA. The RAM patch uses that value when it creates a Phi node merging Klass loads while UseCompressedClassPointers is disabled.
> > > > > Tested with JTREG tier1-4 on Linux x86_64 & ARM64. src/hotspot/share/opto/macro.cpp line 578: > 576: } else if (!res_type->klass_is_exact()) { > 577: NOT_PRODUCT(fail_eliminate = "Not an exact type.";) > 578: can_eliminate = false; You already fixed this: [#18851](https://github.com/openjdk/jdk/pull/18851) Please merge the latest changes into this PR. Also, you need new regression tests. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19111#discussion_r1591593729 From cslucas at openjdk.org Mon May 6 22:19:26 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 6 May 2024 22:19:26 GMT Subject: RFR: JDK-8330565 - C2: Multiple crashes with CTW after JDK-8316991 [v2] In-Reply-To: References: Message-ID: > Please consider this patch, which fixes the issues described in JDK-8330565 and has a little overlap with issue JDK-8330795. > > The `# assert(false) failed: Bad graph detected in build_loop_late` failure occurred because a string concatenation optimization using [this method](https://github.com/openjdk/jdk/blob/819f3d6fc70ff6fe54ac5f9033c17c3dd4326aa5/src/hotspot/share/opto/graphKit.cpp#L4115) adds AddP and LoadN nodes to the IR graph as NotNull _and_ because RAM was not "nullifying" phis merging nullable pointers. I was only able to reproduce this problem using a classfile/jar compiled with an "old" version of the JDK, because newer versions use invokedynamic for string concatenation. > > The `assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type` failure was caused by the fact that we didn't have a "zero value" for the type T_METADATA. The RAM patch uses that value when it creates a Phi node merging Klass loads while UseCompressedClassPointers is disabled. > > > > > Tested with JTREG tier1-4 on Linux x86_64 & ARM64. Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains seven additional commits since the last revision: - Updating branch Merge branch 'fix_bad_bad_graph' of https://github.com/JohnTortugo/jdk into fix_bad_bad_graph - Fix bad type when UseCompressedPointers is disabled. - Phi merging nullable inputs needs to be nullable. - SR allocate needs to be of exact type. - Fix bad type when UseCompressedPointers is disabled. - Phi merging nullable inputs needs to be nullable. - SR allocate needs to be of exact type. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19111/files - new: https://git.openjdk.org/jdk/pull/19111/files/31829d60..c8ce1502 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19111&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19111&range=00-01 Stats: 48711 lines in 1737 files changed: 22525 ins; 21107 del; 5079 mod Patch: https://git.openjdk.org/jdk/pull/19111.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19111/head:pull/19111 PR: https://git.openjdk.org/jdk/pull/19111 From cslucas at openjdk.org Mon May 6 22:19:26 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 6 May 2024 22:19:26 GMT Subject: Withdrawn: JDK-8330565 - C2: Multiple crashes with CTW after JDK-8316991 In-Reply-To: References: Message-ID: On Mon, 6 May 2024 21:02:07 GMT, Cesar Soares Lucas wrote: > Please consider this patch, which fixes the issues described in JDK-8330565 and has a little overlap with issue JDK-8330795. > > The `# assert(false) failed: Bad graph detected in build_loop_late` failure occurred because a string concatenation optimization using [this method](https://github.com/openjdk/jdk/blob/819f3d6fc70ff6fe54ac5f9033c17c3dd4326aa5/src/hotspot/share/opto/graphKit.cpp#L4115) adds AddP and LoadN nodes to the IR graph as NotNull _and_ because RAM was not "nullifying" phis merging nullable pointers. I was only able to reproduce this problem using a classfile/jar compiled with an "old" version of the JDK,
because newer versions use invokedynamic for string concatenation. > > The `assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type` failure was caused by the fact that we didn't have a "zero value" for the type T_METADATA. The RAM patch uses that value when it creates a Phi node merging Klass loads while UseCompressedClassPointers is disabled. > > > > > Tested with JTREG tier1-4 on Linux x86_64 & ARM64. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/19111 From sviswanathan at openjdk.org Mon May 6 22:43:57 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 6 May 2024 22:43:57 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: <8Y-nIHc8vfB1X_hp3tpqqqgpCzu6dAt6BBIP_zc4Q70=.c9a48c68-8c14-4af9-8357-ab50e62a5fd3@github.com> On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.indexOf on average 1.3x for AVX2.
The benchmark numbers:
>>
>> Benchmark                                 Score      Latest     Latest/Score
>> StringIndexOf.advancedWithMediumSub       343.573    317.934    0.925375393x
>> StringIndexOf.advancedWithShortSub1       1039.081   1053.96    1.014319384x
>> StringIndexOf.advancedWithShortSub2       55.828     110.541    1.980027943x
>> StringIndexOf.constantPattern             9.361      11.906     1.271872663x
>> StringIndexOf.searchCharLongSuccess       4.216      4.218      1.000474383x
>> StringIndexOf.searchCharMediumSuccess     3.133      3.216      1.02649218x
>> StringIndexOf.searchCharShortSuccess      3.76       3.761      1.000265957x
>> StringIndexOf.success                     9.186      9.713      1.057369911x
>> StringIndexOf.successBig                  14.341     46.343     3.231504079x
>> StringIndexOfChar.latin1_AVX2_String      6220.918   12154.52   1.953814533x
>> StringIndexOfChar.latin1_AVX2_char        5503.556   5540.044   1.006629895x
>> StringIndexOfChar.latin1_SSE4_String      6978.854   6818.689   0.977049957x
>> StringIndexOfChar.latin1_SSE4_char        5657.499   5474.624   0.967675646x
>> StringIndexOfChar.latin1_Short_String     7132.541   6863.359   0.962260014x
>> StringIndexOfChar.latin1_Short_char       16013.389  16162.437  1.009307711x
>> StringIndexOfChar.latin1_mixed_String     7386.123   14771.622  1.999915517x
>> StringIndexOfChar.latin1_mixed_char       9901.671   9782.245   0.987938803x
> > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Rearrange; add lambdas for clarity src/hotspot/cpu/x86/macroAssembler_x86.cpp line 1174: > 1172: // Alignment specifying the maximum number of allowed bytes to pad. > 1173: // If padding > max, no padding is inserted. > 1174: void MacroAssembler::p2align(int modulus, int maxbytes) { We could pass offset() as an argument to p2align, i.e., have p2align take three arguments: p2align(modulus, target, maxbytes). Also, maybe rename p2align to align then?
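The padding rule quoted in the `p2align` comment can be sketched as plain arithmetic. This is a hypothetical Java model, not HotSpot's `MacroAssembler` code: it assumes `modulus` is a power of two (as the `p2` prefix suggests) and takes the current offset as an explicit `target` argument, in the spirit of the three-argument variant suggested in the review.

```java
// Hypothetical model of the padding rule described for MacroAssembler::p2align.
final class AlignSketch {
    // Bytes needed to advance `target` to the next multiple of `modulus`
    // (0 if it is already aligned). Assumes modulus is a positive power of two.
    static int paddingFor(int modulus, int target) {
        if (modulus <= 0 || (modulus & (modulus - 1)) != 0) {
            throw new IllegalArgumentException("modulus must be a power of two: " + modulus);
        }
        return (modulus - (target & (modulus - 1))) & (modulus - 1);
    }

    // Mirrors "If padding > max, no padding is inserted."
    static int p2alignPadding(int modulus, int target, int maxbytes) {
        int pad = paddingFor(modulus, target);
        return pad > maxbytes ? 0 : pad;
    }
}
```

For example, at offset 13 with a 16-byte modulus, 3 bytes of padding are needed; with `maxbytes = 2` the model inserts none.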
src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 208: > 206: //////////////////////////////////////////////////////////////////////////////////////// > 207: //////////////////////////////////////////////////////////////////////////////////////// > 208: if (VM_Version::supports_avx2()) { // AVX2 version Instead of the if check here, it would be better to use an assert: assert (VM_Version::supports_avx2(), "Needs AVX2 support"); src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 233: > 231: //////////////////////////////////////////////////////////////////////////////////////// > 232: //////////////////////////////////////////////////////////////////////////////////////// > 233: This comment can go right before the start of the method. It would also be good to describe the native function parameters in the comment. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 238: > 236: const Register needle = rdx; > 237: const Register needle_len = rcx; > 238: This is the calling convention on Linux. How is the Windows platform handled? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 260: > 258: // const XMMRegister save_rcx = xmm11; > 259: // const XMMRegister save_r8 = xmm12; > 260: Could this be removed? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 279: > 277: fnptrs[isLL ? StrIntrinsicNode::LL > 278: : isUU ? StrIntrinsicNode::UU > 279: : StrIntrinsicNode::UL] = __ pc(); Could this not be simplified to: fnptrs[ae] = __ pc(); src/hotspot/share/opto/library_call.cpp line 1263: > 1261: if (result != nullptr) { > 1262: // The result is index relative to from_index if substring was found, -1 otherwise. > 1263: // Generate code which will fold into cmove. Any reason for removing this comment?
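When reviewing a vectorized stub like this one, a plain scalar reference to test against is useful. Below is a hypothetical Java reference for the Latin-1/Latin-1 (LL) case that returns the first match index or -1, matching `String.indexOf(String)` for a search starting at index 0. It is not the algorithm used by the AVX2 stub, and the UU/UL cases would compare 16-bit code units instead of bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical scalar reference for differential testing of an indexOf stub (LL case).
final class IndexOfReference {
    // Returns the first index at which `needle` occurs in `haystack`, or -1.
    // An empty needle matches at index 0, like "abc".indexOf("").
    static int indexOf(byte[] haystack, byte[] needle) {
        int last = haystack.length - needle.length; // negative if the needle is longer
        outer:
        for (int i = 0; i <= last; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer; // mismatch: try the next start position
                }
            }
            return i;
        }
        return -1;
    }

    // Convenience overload for Latin-1 strings
    static int indexOf(String haystack, String needle) {
        return indexOf(haystack.getBytes(StandardCharsets.ISO_8859_1),
                       needle.getBytes(StandardCharsets.ISO_8859_1));
    }
}
```

A differential test can then compare the intrinsified `String.indexOf` against this reference on many random short strings, which tends to exercise the boundary cases (needle longer than the haystack, match at the very end, empty needle) that vectorized code is most likely to get wrong.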
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591547667 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591612417 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591613215 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591617528 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591607921 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591618222 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591554296 From bkilambi at openjdk.org Mon May 6 23:01:56 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 6 May 2024 23:01:56 GMT Subject: Integrated: 8331400: AArch64: Sync aarch64_vector.ad with aarch64_vector_ad.m4 In-Reply-To: References: Message-ID: On Fri, 3 May 2024 09:07:25 GMT, Bhavana Kilambi wrote: > This commit - [1] modified the aarch64_vector.ad directly. This patch includes that change in the aarch64_vector_ad.m4 file as well and generates the aarch64_vector.ad file from it. > > [1] https://github.com/openjdk/jdk/commit/185e711bfe4c4d013b56e867f85cfb4177b3a2cf This pull request has now been integrated. Changeset: f308e107 Author: Bhavana Kilambi Committer: Eric Liu URL: https://git.openjdk.org/jdk/commit/f308e107ce8b993641ee3d0a0d5d52bf5cd3b94e Stats: 10 lines in 2 files changed: 6 ins; 2 del; 2 mod 8331400: AArch64: Sync aarch64_vector.ad with aarch64_vector_ad.m4 Reviewed-by: aph, kvn, eliu ------------- PR: https://git.openjdk.org/jdk/pull/19077 From sviswanathan at openjdk.org Mon May 6 23:21:57 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 6 May 2024 23:21:57 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. 
This change accelerates String.IndexOf by 1.3x on average for AVX2. The benchmark numbers:
>>
>>
>> Benchmark                                  Score     Latest  Latest/Score
>> StringIndexOf.advancedWithMediumSub      343.573    317.934  0.925375393x
>> StringIndexOf.advancedWithShortSub1     1039.081    1053.96  1.014319384x
>> StringIndexOf.advancedWithShortSub2       55.828    110.541  1.980027943x
>> StringIndexOf.constantPattern              9.361     11.906  1.271872663x
>> StringIndexOf.searchCharLongSuccess        4.216      4.218  1.000474383x
>> StringIndexOf.searchCharMediumSuccess      3.133      3.216  1.02649218x
>> StringIndexOf.searchCharShortSuccess        3.76      3.761  1.000265957x
>> StringIndexOf.success                      9.186      9.713  1.057369911x
>> StringIndexOf.successBig                  14.341     46.343  3.231504079x
>> StringIndexOfChar.latin1_AVX2_String    6220.918   12154.52  1.953814533x
>> StringIndexOfChar.latin1_AVX2_char      5503.556   5540.044  1.006629895x
>> StringIndexOfChar.latin1_SSE4_String    6978.854   6818.689  0.977049957x
>> StringIndexOfChar.latin1_SSE4_char      5657.499   5474.624  0.967675646x
>> StringIndexOfChar.latin1_Short_String   7132.541   6863.359  0.962260014x
>> StringIndexOfChar.latin1_Short_char    16013.389  16162.437  1.009307711x
>> StringIndexOfChar.latin1_mixed_String   7386.123  14771.622  1.999915517x
>> StringIndexOfChar.latin1_mixed_char     9901.671   9782.245  0.987938803x
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
> Rearrange; add lambdas for clarity

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 314:

> 312:
> 313: // needle_len is in elements, not bytes, for UTF-16
> 314: __ cmpq(needle_len, isUU ? OPT_NEEDLE_SIZE_MAX / 2 : OPT_NEEDLE_SIZE_MAX);

OPT_NEEDLE_SIZE_MAX is an odd number (set to 5); should it have been an even number?

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 329:

> 327: ////////////////////////////////////////////////////////////////////////////////////////
> 328:
> 329: __ bind(L_begin);

So far we have handled haystack <= 32 and needle_size <= 5 (?) in bytes.
A high-level description of the algorithm is needed in comments here, so that readers can follow the code below: what the various paths are in terms of haystack and needle sizes, how to reason about the assembly code below, and how to make sure that all the paths are taken care of. Also, the abstraction level suddenly changes here to detailed code, instead of separate methods for the various paths. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591640551 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591646095 From stuefe at openjdk.org Tue May 7 04:27:12 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 7 May 2024 04:27:12 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v11] In-Reply-To: References: Message-ID: <0anrYmEFTzUaEynG83xqh3DlAygkKXw9BTxO982PkR4=.7a8d0d3d-168e-47eb-8385-79d4a9c46df3@github.com> > See [1] for previous discussions. > > We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. > > The default limit affects all compilations, unless the method is subject to a memory limit set from the command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. > > Examples: > > This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` > > This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` > > > --- > > The patch: > > 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. > 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record.
> 3) Adapted and extended tests > > I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. > > > Tested: > > - manually on Mac m1 (debug and release) > - GHAs are running > - but Oracle will do more testing before this goes in > > [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: - remove debug output - Merge branch 'master' into compiler-default-limit - fix compiler.c2.TestFindNode again - merge master and fix conflicts - Remove unused variable - Remove accidental change to TestDeadPhiMergeMemLoop.java - fix copyrights - fix copyrights - another fix - fix accidental slip in of another test name - ... and 9 more: https://git.openjdk.org/jdk/compare/f308e107...61dc5952 ------------- Changes: https://git.openjdk.org/jdk/pull/18969/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=10 Stats: 166 lines in 7 files changed: 115 ins; 12 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/18969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18969/head:pull/18969 PR: https://git.openjdk.org/jdk/pull/18969 From stuefe at openjdk.org Tue May 7 04:27:12 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 7 May 2024 04:27:12 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v10] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 18:29:20 GMT, Vladimir Kozlov wrote: > `-XX:CompileCommand=memstat,compiler.c2.TestFindNode::*,print` - leftover from debugging? I tend to leave debug output in, if it's not too large, to speed up any follow-up fixes I need to do later. But then, I did not do it consistently anyway, so I removed the output.
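The limit-selection rules described in this PR (a 1GB default in debug builds only, an explicit `-XX:CompileCommand=MemLimit` setting that takes precedence, and a value of 0 disabling the limit) can be summarized in a few lines. This is a hypothetical model: `effective_mem_limit` and `kDebugDefaultLimit` are invented names, and the real HotSpot logic (the "crash" limit action, per-method pattern matching) is not reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical model of the limit selection described in the PR; the names
// here are illustrative and do not exist in HotSpot.
constexpr std::size_t kDebugDefaultLimit = std::size_t(1) << 30;  // 1 GB

// `cmdline_limit` holds a value only if -XX:CompileCommand=MemLimit matched
// the method being compiled. A return value of 0 means "no limit".
std::size_t effective_mem_limit(bool debug_build,
                                std::optional<std::size_t> cmdline_limit) {
  if (cmdline_limit.has_value()) {
    return *cmdline_limit;  // an explicit setting wins; 0 disables the limit
  }
  return debug_build ? kDebugDefaultLimit : 0;  // default only in debug builds
}
```

Modeling "no command-line setting" as an empty `std::optional` keeps it distinct from an explicit `0`, which is exactly the distinction the `MemLimit,*.*,0` example relies on.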
------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2097419843 From epeter at openjdk.org Tue May 7 05:43:07 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 May 2024 05:43:07 GMT Subject: RFR: 8331085: Crash in MergePrimitiveArrayStores::is_compatible_store() Message-ID: In the `MergeStore` logic, I check the `adr_type()`. But in some rare cases this can be a `nullptr`, which I did not expect. Example: during IGVN, the address is dying, with TOP somewhere in the inputs. 1 Con === 0 [[ ]] #top 1022 AddP === _ 1 1 41 [[ 1019 1021 ]] !orig=539,[572] !jvms: Test::dMeth @ bci:223 (line 35) 1019 StoreI === 1128 827 1022 1020 [[ 1075 541 1073 574 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; Memory: @null !orig=574,1068 !jvms: Test::dMeth @ bci:227 (line 35) I now check for `nullptr`. ------------- Commit messages: - 8331085 Changes: https://git.openjdk.org/jdk/pull/19103/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19103&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331085 Stats: 64 lines in 2 files changed: 63 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19103/head:pull/19103 PR: https://git.openjdk.org/jdk/pull/19103 From thartmann at openjdk.org Tue May 7 05:59:52 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 7 May 2024 05:59:52 GMT Subject: RFR: 8331085: Crash in MergePrimitiveArrayStores::is_compatible_store() In-Reply-To: References: Message-ID: On Mon, 6 May 2024 11:25:22 GMT, Emanuel Peter wrote: > In the `MergeStore` logic, I check the `adr_type()`. But in some rare cases this can be a `nullptr`, which I did not expect. > > Example: during IGVN, the address is dying, with TOP somewhere in the inputs.
> > 1 Con === 0 [[ ]] #top > 1022 AddP === _ 1 1 41 [[ 1019 1021 ]] !orig=539,[572] !jvms: Test::dMeth @ bci:223 (line 35) > 1019 StoreI === 1128 827 1022 1020 [[ 1075 541 1073 574 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; Memory: @null !orig=574,1068 !jvms: Test::dMeth @ bci:227 (line 35) > > I now check for `nullptr`. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19103#pullrequestreview-2042128740 From jbhateja at openjdk.org Tue May 7 06:12:58 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 7 May 2024 06:12:58 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v14] In-Reply-To: <727FyZHyBbtRilYRtbP2E4dbZYqj9a-QgXAuicQ2iZQ=.01035706-6591-4df5-bf7d-d7a2f6209015@github.com> References: <727FyZHyBbtRilYRtbP2E4dbZYqj9a-QgXAuicQ2iZQ=.01035706-6591-4df5-bf7d-d7a2f6209015@github.com> Message-ID: On Fri, 3 May 2024 19:14:07 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
> > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > revert unneeded legacy flag change for kmovwl(K,K) and kmovql(K,K) src/hotspot/cpu/x86/assembler_x86.cpp line 11754: > 11752: > 11753: // This is a 4 byte encoding > 11754: void Assembler::evex_prefix(bool vex_r, bool vex_b, bool vex_x, bool evex_r, bool evex_b, bool evex_v, Suggestion: void Assembler::evex_prefix(bool vex_r, bool vex_b, bool vex_x, bool evex_r, bool eevex_b, bool evex_v, src/hotspot/cpu/x86/assembler_x86.cpp line 11766: > 11764: // P0: byte 2, initialized to RXBR`00mm > 11765: // instead of not'd > 11766: int byte2 = (vex_r ? VEX_R : 0) | (vex_x ? VEX_X : 0) | (vex_b ? VEX_B : 0) | (evex_r ? EVEX_Rb : 0); Comment at [L#11765 ](https://github.com/openjdk/jdk/pull/18476/files#diff-e3576e9c22db89236cdb906f032ff00748ff6d1c21b05277d991d80af75daf3aL11686) `// P0: byte 2, initialized to RXBR'00mm => // P0: byte 2, initialized to RXBR'0mmm` src/hotspot/cpu/x86/assembler_x86.cpp line 11768: > 11766: int byte2 = (vex_r ? VEX_R : 0) | (vex_x ? VEX_X : 0) | (vex_b ? VEX_B : 0) | (evex_r ? EVEX_Rb : 0); > 11767: byte2 = (~byte2) & 0xF0; > 11768: byte2 |= evex_b ? EEVEX_B : 0; Suggestion: byte2 |= eevex_b ? EEVEX_B : 0; This corresponds to B4 bit which is specific to EEVEX encoding. 
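The lines quoted in this review build EVEX payload byte P0 by collecting the register-extension flags, inverting them into the high nibble, and then OR-ing in the APX B4 bit without inversion. A standalone model of just those steps follows. The constant values, and the placement of the opcode-map field in the low three bits, are assumptions for illustration (they follow the usual P0 layout with R, X, B, R' in bits 7..4, stored inverted); `evex_p0` is a hypothetical name, not HotSpot code.

```cpp
#include <cassert>

// Hypothetical model of the P0 computation in the quoted snippet; the
// constant values below are assumptions, not copied from HotSpot.
constexpr unsigned kVexR   = 0x80;  // bit 7: R  (stored inverted)
constexpr unsigned kVexX   = 0x40;  // bit 6: X  (stored inverted)
constexpr unsigned kVexB   = 0x20;  // bit 5: B  (stored inverted)
constexpr unsigned kEvexRb = 0x10;  // bit 4: R' (stored inverted)
constexpr unsigned kEevexB = 0x08;  // bit 3: APX B4 (stored as-is)

unsigned evex_p0(bool vex_r, bool vex_x, bool vex_b, bool evex_r,
                 bool eevex_b, unsigned opcode_map) {
  unsigned byte2 = (vex_r ? kVexR : 0) | (vex_x ? kVexX : 0) |
                   (vex_b ? kVexB : 0) | (evex_r ? kEvexRb : 0);
  byte2 = (~byte2) & 0xF0;         // invert only the four high bits
  byte2 |= eevex_b ? kEevexB : 0;  // B4 is OR-ed in without inversion
  byte2 |= opcode_map & 0x07;      // assumed 3-bit opcode-map field
  return byte2;
}
```

Because B4 is handled separately from the inverted group, giving its flag a distinct name such as `eevex_b`, as suggested in the review, makes that difference visible at the call site.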
src/hotspot/cpu/x86/assembler_x86.cpp line 11846: > 11844: } > 11845: bool eevex_x = adr.index_needs_rex2(); > 11846: bool evex_b = adr.base_needs_rex2(); Suggestion: bool eevex_b = adr.base_needs_rex2(); src/hotspot/cpu/x86/assembler_x86.cpp line 11848: > 11846: bool evex_b = adr.base_needs_rex2(); > 11847: attributes->set_is_evex_instruction(); > 11848: evex_prefix(vex_r, vex_b, vex_x, evex_r, evex_b, evex_v, eevex_x, nds_enc, pre, opc); Suggestion: evex_prefix(vex_r, vex_b, vex_x, evex_r, eevex_b, evex_v, eevex_x, nds_enc, pre, opc); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1591847091 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1591858904 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1591846721 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1591848768 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1591848945 From chagedorn at openjdk.org Tue May 7 06:25:52 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 7 May 2024 06:25:52 GMT Subject: RFR: 8331085: Crash in MergePrimitiveArrayStores::is_compatible_store() In-Reply-To: References: Message-ID: On Mon, 6 May 2024 11:25:22 GMT, Emanuel Peter wrote: > In the `MergeStore` logic, I check the `adr_type()`. But in some rare cases this can be a `nullptr`, I did not expect that. > > Exampe: during IGVN, the address is dying, with TOP somewhere in the inputs. > > 1 Con === 0 [[ ]] #top > 1022 AddP === _ 1 1 41 [[ 1019 1021 ]] !orig=539,[572] !jvms: Test::dMeth @ bci:223 (line 35) > 1019 StoreI === 1128 827 1022 1020 [[ 1075 541 1073 574 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; Memory: @null !orig=574,1068 !jvms: Test::dMeth @ bci:227 (line 35) > > I now check for `nullptr`. Looks good! 
test/hotspot/jtreg/compiler/c2/TestMergeStoresNullAdrType.java line 33: > 31: * -XX:-TieredCompilation -Xcomp > 32: * -XX:+UnlockDiagnosticVMOptions -XX:+StressIGVN -XX:+StressCCP > 33: * -XX:RepeatCompilation=1000 Is it really worth having such a high count? The test would eventually trigger the bug anyway if it is executed enough times. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19103#pullrequestreview-2042163459 PR Review Comment: https://git.openjdk.org/jdk/pull/19103#discussion_r1591872327 From duke at openjdk.org Tue May 7 06:26:04 2024 From: duke at openjdk.org (Daniel Skantz) Date: Tue, 7 May 2024 06:26:04 GMT Subject: RFR: 8330016: Stress seed should be initialized for runtime stub compilation [v2] In-Reply-To: <-Pj7vK6Wpgv3FnP2KZXB2so2wWraG3LWEKds9XDI3pY=.d6595586-aa83-47bc-84ed-ff8c4a1a5550@github.com> References: <-Pj7vK6Wpgv3FnP2KZXB2so2wWraG3LWEKds9XDI3pY=.d6595586-aa83-47bc-84ed-ff8c4a1a5550@github.com> Message-ID: > We can initialize the stress seed for runtime stub compilation as we already do for method compilation.
Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: const ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19095/files - new: https://git.openjdk.org/jdk/pull/19095/files/6ccef597..a018fde0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19095&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19095&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19095.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19095/head:pull/19095 PR: https://git.openjdk.org/jdk/pull/19095 From epeter at openjdk.org Tue May 7 07:10:59 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 May 2024 07:10:59 GMT Subject: RFR: 8331085: Crash in MergePrimitiveArrayStores::is_compatible_store() In-Reply-To: References: Message-ID: On Tue, 7 May 2024 06:23:41 GMT, Christian Hagedorn wrote: >> In the `MergeStore` logic, I check the `adr_type()`. But in some rare cases this can be a `nullptr`, I did not expect that. >> >> Exampe: during IGVN, the address is dying, with TOP somewhere in the inputs. >> >> 1 Con === 0 [[ ]] #top >> 1022 AddP === _ 1 1 41 [[ 1019 1021 ]] !orig=539,[572] !jvms: Test::dMeth @ bci:223 (line 35) >> 1019 StoreI === 1128 827 1022 1020 [[ 1075 541 1073 574 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; Memory: @null !orig=574,1068 !jvms: Test::dMeth @ bci:227 (line 35) >> >> I now check for `nullptr`. > > Looks good! Thanks @chhagedorn @TobiHartmann for the reviews! Since this is rather a simple fix and creates a bit of noise in the testing pipeline, I'm already integrating now. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19103#issuecomment-2097598560 From epeter at openjdk.org Tue May 7 07:11:01 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 May 2024 07:11:01 GMT Subject: Integrated: 8331085: Crash in MergePrimitiveArrayStores::is_compatible_store() In-Reply-To: References: Message-ID: <_6IPKwm8rYrQGoHgEVMl1dHlguKkHKXiFAWH_g3ZukU=.c2d01173-0f7a-4f5d-90bf-3ffac849e07d@github.com> On Mon, 6 May 2024 11:25:22 GMT, Emanuel Peter wrote: > In the `MergeStore` logic, I check the `adr_type()`. But in some rare cases this can be a `nullptr`, I did not expect that. > > Exampe: during IGVN, the address is dying, with TOP somewhere in the inputs. > > 1 Con === 0 [[ ]] #top > 1022 AddP === _ 1 1 41 [[ 1019 1021 ]] !orig=539,[572] !jvms: Test::dMeth @ bci:223 (line 35) > 1019 StoreI === 1128 827 1022 1020 [[ 1075 541 1073 574 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; Memory: @null !orig=574,1068 !jvms: Test::dMeth @ bci:227 (line 35) > > I now check for `nullptr`. This pull request has now been integrated. Changeset: df1ff056 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/df1ff056f19ce569e05b0b87584e289840fc5d5c Stats: 64 lines in 2 files changed: 63 ins; 0 del; 1 mod 8331085: Crash in MergePrimitiveArrayStores::is_compatible_store() Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/19103 From rcastanedalo at openjdk.org Tue May 7 07:27:52 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 7 May 2024 07:27:52 GMT Subject: RFR: 8330016: Stress seed should be initialized for runtime stub compilation [v2] In-Reply-To: References: <-Pj7vK6Wpgv3FnP2KZXB2so2wWraG3LWEKds9XDI3pY=.d6595586-aa83-47bc-84ed-ff8c4a1a5550@github.com> Message-ID: On Tue, 7 May 2024 06:26:04 GMT, Daniel Skantz wrote: >> We can initialize the stress seed for runtime stub compilation as we already do for method compilation. 
This found the bug described in JDK-8329258. It would apply if StressGCM or StressLCM vm flags are set. >> >> Testing: T1-5 default options. T1-5 with -XX:+StressLCM and -XX:+StressGCM. Manually tested that the stress seed is set and printed to compilation log if either stress option is set. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > const Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19095#pullrequestreview-2042291021 From duke at openjdk.org Tue May 7 09:08:55 2024 From: duke at openjdk.org (Daniel Skantz) Date: Tue, 7 May 2024 09:08:55 GMT Subject: RFR: 8330016: Stress seed should be initialized for runtime stub compilation [v2] In-Reply-To: References: <-Pj7vK6Wpgv3FnP2KZXB2so2wWraG3LWEKds9XDI3pY=.d6595586-aa83-47bc-84ed-ff8c4a1a5550@github.com> Message-ID: <1bLGo6lP0M20Q86lj4d3EsYy0SSlpgqeRiivad8PRfo=.4ba1d788-5e47-4692-96cd-cec61faec6df@github.com> On Tue, 7 May 2024 06:26:04 GMT, Daniel Skantz wrote: >> We can initialize the stress seed for runtime stub compilation as we already do for method compilation. This found the bug described in JDK-8329258. It would apply if StressGCM or StressLCM vm flags are set. >> >> Testing: T1-5 default options. T1-5 with -XX:+StressLCM and -XX:+StressGCM. Manually tested that the stress seed is set and printed to compilation log if either stress option is set. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > const Thanks for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19095#issuecomment-2097810711 From snazarki at openjdk.org Tue May 7 10:00:52 2024 From: snazarki at openjdk.org (Sergey Nazarkin) Date: Tue, 7 May 2024 10:00:52 GMT Subject: RFR: 8330806: test/hotspot/jtreg/compiler/c1/TestLargeMonitorOffset.java fails on ARM32 In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 14:21:09 GMT, Aleksei Voitylov wrote: > TestLargeMonitorOffset was introduced by 8310844 with a fix for the AArch64 platform. The same issue needs to be fixed for ARM32. With this change, we add the large slot_offset handling to the ARM32 version of IR_Assembler::osr_entry(). > > Testing: jtreg hotspot, jtreg jdk tier1-3. I've checked the patch (one may need to use a [workaround](https://bugs.openjdk.org/browse/JDK-8316395) ). The JDK crashes without the patch, and passes with the patch. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18891#issuecomment-2097916569 From epeter at openjdk.org Tue May 7 11:15:14 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 May 2024 11:15:14 GMT Subject: RFR: 8331764: C2 SuperWord: refactor _align_to_ref/_mem_ref_for_main_loop_alignment Message-ID: This PR accomplishes these things: - Rename `_align_to_ref` -> `_mem_ref_for_main_loop_alignment`. - Move the `mem_ref` finding for alignment out of `SuperWord::find_adjacent_refs`. This is too early, and we don't even know if the relevant `mem_ref` is going to be vectorized. It makes more sense to pick a `mem_ref` directly in `SuperWord::adjust_pre_loop_limit_to_align_main_loop_vectors`, where we already know what packs are going to be vectorized. - For the alignment width (aw), we can use the `vector_width` of the pack to which the `mem_ref` belongs, rather than the potentially much larger `vector_width_in_bytes`. I track this with `_aw_for_main_loop_alignment` now. I need this for https://github.com/openjdk/jdk/pull/18822, and decided to split it out into an independent change. 
------------- Commit messages: - 8331764 Changes: https://git.openjdk.org/jdk/pull/19115/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19115&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331764 Stats: 67 lines in 2 files changed: 41 ins; 20 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19115/head:pull/19115 PR: https://git.openjdk.org/jdk/pull/19115 From epeter at openjdk.org Tue May 7 12:47:01 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 May 2024 12:47:01 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v6] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 15:20:32 GMT, Damon Fenacci wrote: >> # Issue >> When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0)` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`), there is an error in the C2-compiled code that causes different vectors to be treated as equal even though they are not. >> >> # Causes >> On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. >> The same is true for `StoreVector`s. >> When running GVN loops, we can get into a situation where we check whether a Load node is preceded by a Store to the same address, so that the Load can be replaced with the input of the Store (in `LoadNode::Identity`). Here we call >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 >> >> where we do an extra type check if we deal with vectors, but we don't check whether masks or offsets interfere.
>> Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 >> >> but we don't check that there are no masks or offsets. >> A few lines below, we check whether there are 2 stores of the same value in a row. We need to check for masks and offsets here too, but in this case we can still fold if the masks and offsets of the vector stores are equivalent. >> >> # Solution >> To avoid folding `Load`- and `StoreVector`s with masks and offsets, we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns its own (so that it can never match a store node). In this way, the checks in `MemNode::can_see_stored_value` >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 >> >> and `StoreNode::Identity` >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 >> >> will fail if masks or offsets are used. >> For 2 stores of the same value we instead check for mask and offset equality. >> >> Regression tests for... > > Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: > > - JDK-8325520: add store/load masked vector tests > - JDK-8325520: add store/load tests with duplicate offsets Nice, this looks much better. I think the VM code is now correct. A few suggestions for code style. src/hotspot/share/opto/memnode.cpp line 1169: > 1167: // LoadVector/StoreVector need additional checks > 1168: if (st->is_StoreVector()) { > 1169: // Ensure that types match To reduce noise, you could revert these comment changes; it's up to you.
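The `store_Opcode` trick from the PR description can be shown with a toy class hierarchy: a plain vector load reports the ordinary vector-store opcode as its matching store, while a gather load reports its own opcode, which no store node ever has, so the opcode comparison made before forwarding a stored value always fails for it. Everything here is a hypothetical model; the class names, the enum, and `may_forward_from` are not HotSpot's real types or functions.

```cpp
#include <cassert>

// Toy opcode set; HotSpot's real opcodes and node classes differ.
enum Opcode { Op_LoadVector, Op_StoreVector, Op_LoadVectorGather };

struct LoadModel {
  Opcode op;
  explicit LoadModel(Opcode o) : op(o) {}
  virtual ~LoadModel() = default;
  // Opcode of a store whose value this load may reuse.
  virtual Opcode store_opcode() const { return Op_StoreVector; }
};

// A gather load answers with its own opcode. No store has that opcode,
// so a stored value is never forwarded to a gather load.
struct GatherLoadModel : LoadModel {
  GatherLoadModel() : LoadModel(Op_LoadVectorGather) {}
  Opcode store_opcode() const override { return op; }
};

// Simplified version of the opcode check performed before a load is allowed
// to take its value directly from a preceding store to the same address.
bool may_forward_from(const LoadModel& load, Opcode store_op) {
  return store_op == load.store_opcode();
}
```

The appeal of this approach is that no new condition is needed at the call sites: the existing opcode comparison simply never succeeds for loads that carry offsets or a mask.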
src/hotspot/share/opto/memnode.cpp line 3518: > 3516: mem->in(MemNode::ValueIn)->eqv_uncast(val) && > 3517: mem->Opcode() == Opcode()) { > 3518: // Not a vector Suggestion: Redundant comment: the next line literally says as much ;) src/hotspot/share/opto/memnode.cpp line 3546: > 3544: const StoreVectorScatterMaskedNode* svgm = mem->as_StoreVectorScatterMasked(); > 3545: if (offsets->eqv_uncast(svgm->in(StoreVectorScatterMaskedNode::Offsets)) && > 3546: mask->eqv_uncast(svgm->in(StoreVectorScatterMaskedNode::Mask))) { Suggestion: mask->eqv_uncast(svgm->in(StoreVectorScatterMaskedNode::Mask))) { src/hotspot/share/opto/memnode.cpp line 3551: > 3549: // Regular store (no offsets or mask) > 3550: } else { > 3551: result = mem; Suggestion: assert(Opcode() == Op_StoreVector, "just a plain vector store, no offset or mask"); result = mem; I would say turning comments into asserts is preferable. src/hotspot/share/opto/memnode.cpp line 3554: > 3552: } > 3553: } > 3554: } I think the code is now correct. But I find the nested if-elseif-elseif-else ... structure a bit hard to read. There is also quite a bit of code duplication (e.g. `result = mem` and all the `eqv_uncast` checks). You could either do something like this: if (!is_StoreVector() || as_StoreVector()->has_same_vect_type_and_offsets_and_mask(mem->as_StoreVector())) { result = mem; } Sketch: has_same_vect_type_and_offsets_and_mask: different vect_type -> return false ... Or maybe it would be better to define virtual functions to get the `mask` and `offsets` from a `StoreVector`? If it has none, just return `nullptr`. Sometimes people worry about virtual methods, but we already use them extensively for the node Value/Ideal anyway.
Then, you can do: if (!is_StoreVector()) { result = mem; } else { const Node* offsets1 = as_StoreVector()->get_offsets(); const Node* offsets2 = mem->as_StoreVector()->get_offsets(); const Node* mask1 = as_StoreVector()->get_mask(); const Node* mask2 = mem->as_StoreVector()->get_mask(); if (offsets1->eqv_uncast(offsets2) && mask1->eqv_uncast(mask2)) { result = mem; } } I think that would be the cleanest and most readable way. What do you think? ------------- PR Review: https://git.openjdk.org/jdk/pull/18347#pullrequestreview-2043033219 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1592390146 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1592392528 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1592399071 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1592396514 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1592417634 From epeter at openjdk.org Tue May 7 12:47:02 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 May 2024 12:47:02 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v6] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 12:28:32 GMT, Emanuel Peter wrote: >> Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: >> >> - JDK-8325520: add store/load masked vector tests >> - JDK-8325520: add store/load tests with duplicate offsets > > src/hotspot/share/opto/memnode.cpp line 3546: > >> 3544: const StoreVectorScatterMaskedNode* svgm = mem->as_StoreVectorScatterMasked(); >> 3545: if (offsets->eqv_uncast(svgm->in(StoreVectorScatterMaskedNode::Offsets)) && >> 3546: mask->eqv_uncast(svgm->in(StoreVectorScatterMaskedNode::Mask))) { > > Suggestion: > > mask->eqv_uncast(svgm->in(StoreVectorScatterMaskedNode::Mask))) { Alignment was off ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1592399246 From epeter at
openjdk.org Tue May 7 12:59:57 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 May 2024 12:59:57 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v4] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 15:20:23 GMT, Damon Fenacci wrote: >> Nice, ah you are right, there can be issues with mask-only cases as well! >> >> It would be great if you had tests that exactly exercise these "bad" examples, where it looks like we might optimize, but it would be wrong. >> >> I'll look at your `store_Opcode` changes now... > >> It would be great if you had tests that exactly exercise these "bad" examples, where it looks like we might optimize, but it would be wrong. > > Yep, good idea. I've added a few tests to check for those cases (load-store with duplicate offsets and store-load with masks). Thanks @eme64! @dafedafe I also quickly scanned the regression tests. I see at least two aspects missing: - No mixed-type test for load-store: use the MemorySegment variants `fromMemorySegment`/`intoMemorySegment`. Try something like storing an int vector and loading a float vector. - Mismatched vector length: store a vector of length 4, and load one of length 8. I think all of these are currently handled correctly by your `vect_type` checks in the VM code, but it would be good to see that they are covered by regression tests, in case someone messes this up in the future. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2098345482 From epeter at openjdk.org Tue May 7 13:03:55 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 May 2024 13:03:55 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v6] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 15:20:32 GMT, Damon Fenacci wrote: >> # Issue >> When loading multiple vectors using offsets or masks (e.g.
`LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0)` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2-compiled code that causes different vectors to be treated as equal even though they are not. >> >> # Causes >> On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. >> The same is true for `StoreVector`s. >> When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store to the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`). Here we call >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 >> >> where we do an extra type check for vectors, but we don't check that masks and offsets do not interfere. >> Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 >> >> but we don't check that there are no masks or offsets. >> A few lines below, we check if there are 2 stores for the same value in a row. We need to check for masks and offsets here too, but in this case we can still fold the stores if the masks and offsets of the vector stores are equivalent. >> >> # Solution >> To avoid folding `Load`- and `StoreVector`s with masks and offsets we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns its own (to avoid ever being the same as a store node).
In this way, the checks in `MemNode::can_see_stored_value` >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 >> >> and `StoreNode::Identity` >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 >> >> will fail if masks or offsets are used. >> For 2 stores of the same value we instead check for mask and offset equality. >> >> Regression tests for... > > Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: > > - JDK-8325520: add store/load masked vector tests > - JDK-8325520: add store/load tests with duplicate offsets Ah, some more missing cases: - Do some store-store and store-load cases where the first and second are different loads/stores, i.e. one with and one without mask/offsets. E.g. `StoreVectorMasked` and `StoreVectorScatter` in a store-store test. Doing the total cross-product is probably too much, but a few examples would be a good start. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2098354283 From epeter at openjdk.org Tue May 7 13:24:02 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 May 2024 13:24:02 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8] In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 12:52:15 GMT, Bhavana Kilambi wrote: >> Floating-point addition is non-associative, that is, adding floating-point elements in an arbitrary order may produce a different value. In particular, the Vector API intentionally does not define the order of reduction, which allows platforms to generate more efficient code [1]. So C2 needs a node to represent a non-strictly-ordered add-reduction for floating-point types. >> >> To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order.
It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, each other reduction node's `requires_strict_order()` returns a fixed value. >> >> With this patch, the Vector API always generates non-strictly-ordered `AddReductionVF/D` on SVE machines with vector length <= 16B, as it is more beneficial to generate non-strictly-ordered instructions on such machines than strictly ordered ones. >> >> [AArch64] >> On Neon, non-strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2]. >> >> This patch adds matching rules for non-strictly-ordered `AddReductionVF/D`. >> >> No effects on other platforms. >> >> [Performance] >> FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point type. With this patch, it improves ~3x on my SVE machine (128-bit). >> >> ADDLanes >> >> Benchmark Before After Unit >> FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms >> >> >> Final code is as below: >> >> Before: >> fadda z17.s, p7/m, z17.s, z16.s >> >> After: >> >> faddp v17.4s, v21.4s, v21.4s >> faddp s18, v17.2s >> fadd s18, s18, s19 >> >> >> >> >> [Test] >> Full jtreg passed on AArch64 and x86. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529 >> [2] https://bugs.openjdk.org/browse/JDK-8275275 >> [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316 > > Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains eight additional commits since the last revision: > > - Merge master > - Adjust format for the backend rules changed in previous commit > - Address some more review comments > - Revert to previous indentation > - Add comments, revert to requires_strict_order and other minor changes > - Naming changes: replace strict/non-strict with more technical terms > - Addressed review comments for changes in backend rules and code style > - 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction > > Floating-point addition is non-associative, that is, adding > floating-point elements in an arbitrary order may produce a different value. > In particular, the Vector API intentionally does not define the order of > reduction, which allows platforms to generate more efficient code > [1]. So C2 needs a node to represent a non-strictly-ordered > add-reduction for floating-point types. > > To avoid introducing new nodes, this patch adds a bool field in > `AddReductionVF/D` to distinguish whether they require strict order. It > also removes `UnorderedReductionNode` and adds a virtual function > `bool requires_strict_order()` in `ReductionNode`. Besides > `AddReductionVF/D`, each other reduction node's `requires_strict_order()` > returns a fixed value. > > With this patch, the Vector API always generates non-strictly-ordered > `AddReductionVF/D` on SVE machines with vector length <= 16B, as it is > more beneficial to generate non-strictly-ordered instructions on such > machines than strictly ordered ones. > > [AArch64] > On Neon, non-strictly-ordered `AddReductionVF/D` cannot be generated. > Auto-vectorization has already banned these nodes in JDK-8275275 [2]. > > This patch adds matching rules for non-strictly-ordered > `AddReductionVF/D`. > > No effects on other platforms. > > [Performance] > FloatMaxVector.ADDLanes [3] measures the performance of add reduction > for floating-point type.
With this patch, it improves ~3x on my SVE > machine (128-bit). > > ADDLanes > Benchmark Before After Unit > FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms > > Final code is as below: > > ``` > Before: > fadda z17.s, p7/m, z17.s, z16.s > > After: > faddp v17.4s, v21.4s,... I just realized that there is no regression test, and I think it would be nice to have one. Also, we should add some sort of message to the `dump` output indicating whether the `ReductionNode` has `requires_strict_order` on or off. I think that could be done in `dump_spec`. You could do it similarly to: #ifndef PRODUCT void VectorMaskCmpNode::dump_spec(outputStream *st) const { st->print(" %d #", _predicate); _type->dump_on(st); } #endif // PRODUCT This would actually allow you to create an IR test! You would check that the AddReductionVNode is annotated correctly. You need some VectorAPI tests, and some SuperWord auto-vectorization tests. How does that sound? That would ensure that nobody can easily break your RFE, at least not in the IR. Sorry for the delay, I'm really excited about this one, just had to get some more critical things done recently ;) src/hotspot/cpu/aarch64/aarch64_vector.ad line 2907: > 2905: format %{ "reduce_addF_sve $dst_src1, $dst_src1, $src2" %} > 2906: ins_encode %{ > 2907: assert(UseSVE > 0, "must be sve"); Is there no way we would now run into this assert? static bool use_neon_for_vector(int vector_length_in_bytes) { return vector_length_in_bytes <= 16; } Does `vector_length_in_bytes > 16` imply that we have `UseSVE > 0`?
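Why a strict-order flag is needed at all can be seen in a small stand-alone C++ sketch (not HotSpot code). It sums the same four float lanes twice: once strictly left to right, as a strictly ordered reduction such as SVE `fadda` must, and once pairwise, in the shape of the NEON `faddp`/`faddp`/`fadd` sequence quoted above. The function names and input values are illustrative assumptions, and the sketch assumes IEEE single-precision evaluation (no x87 extended precision, no `-ffast-math`).

```cpp
#include <array>
#include <cassert>  // used by the checks in the usage example
#include <cstddef>

// Strictly ordered reduction: ((((init + v0) + v1) + v2) + v3),
// i.e. the left-to-right order a strictly ordered reduction must keep.
float reduce_add_strict(float init, const std::array<float, 4>& v) {
  float acc = init;
  for (std::size_t i = 0; i < v.size(); ++i) {
    acc += v[i];
  }
  return acc;
}

// Pairwise order, shaped like the quoted NEON sequence:
// faddp adds adjacent lanes, a second faddp adds the two partial sums,
// and a final fadd combines that result with the scalar input.
float reduce_add_pairwise(float init, const std::array<float, 4>& v) {
  float p01 = v[0] + v[1];
  float p23 = v[2] + v[3];
  float lanes = p01 + p23;
  return lanes + init;
}
```

With lanes `{1.0e8f, 1.0f, -1.0e8f, 1.0f}` and an initial value of `0.0f`, the strict order yields `1.0f` while the pairwise order yields `0.0f`, because `1.0e8f + 1.0f` rounds back to `1.0e8f` in single precision. The Vector API is allowed to return either, whereas auto-vectorized Java loops must keep the strict result.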
------------- PR Review: https://git.openjdk.org/jdk/pull/18034#pullrequestreview-2043144243 PR Comment: https://git.openjdk.org/jdk/pull/18034#issuecomment-2098395131 PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1592455614 From epeter at openjdk.org Tue May 7 13:46:56 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 May 2024 13:46:56 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v13] In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 11:38:27 GMT, Hamlin Li wrote: >> Hi, >> Can you have a look at this patch adding some tests for Math.round intrinsics? >> Thanks! >> >> ### FYI: >> During the development of RoundVF/RoundF, we ran into issues that were only caught by running tests exhaustively over the full 32/64-bit range of int/long. >> It's helpful to add these exhaustive tests to the JDK for possible future use, rather than building them every time they are needed. >> Of course, we need to put them in `manual` mode, so they are not run when the `-automatic` jtreg option is specified, which I believe is the mode CI uses; please correct me if I am wrong. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > fix issues; modify vm options to make sure test the expected behaviors. Thanks for the extra tests! Can you measure how much time each test now takes on your machine? I think we are getting there. Still a little worried about some random bugs in the whole number generation...
But I'd certainly prefer having these tests to not having them ;) test/hotspot/jtreg/compiler/floatingpoint/TestRoundFloatAll.java line 31: > 29: * @library /test/lib / > 30: * @modules java.base/jdk.internal.math > 31: * @run main/othervm -XX:-TieredCompilation -XX:CompileThresholdScaling=0.3 -XX:+PrintIdeal -XX:CompileCommand=compileonly,compiler.floatingpoint.TestRoundFloatAll::test* -XX:-UseSuperWord compiler.floatingpoint.TestRoundFloatAll Please break up the line for easier reading. test/hotspot/jtreg/compiler/floatingpoint/TestRoundFloatAll.java line 75: > 73: return (int) a; > 74: } > 75: } At first, I was worried about the indentation, then realized the original code had the strange indentation. Would there be a way to put this method in a shared file, so that you do not need to paste it everywhere? test/hotspot/jtreg/compiler/vectorization/TestRoundVectorFloatAll.java line 34: > 32: * @run main/othervm -XX:+PrintIdeal -XX:-TieredCompilation -XX:CompileThresholdScaling=0.3 -XX:MaxVectorSize=8 -XX:+UseSuperWord -XX:CompileCommand=compileonly,compiler.vectorization.TestRoundVectorFloatAll::test* compiler.vectorization.TestRoundVectorFloatAll > 33: * @run main/othervm -XX:+PrintIdeal -XX:-TieredCompilation -XX:CompileThresholdScaling=0.3 -XX:MaxVectorSize=16 -XX:+UseSuperWord -XX:CompileCommand=compileonly,compiler.vectorization.TestRoundVectorFloatAll::test* compiler.vectorization.TestRoundVectorFloatAll > 34: * @run main/othervm -XX:+PrintIdeal -XX:-TieredCompilation -XX:CompileThresholdScaling=0.3 -XX:MaxVectorSize=32 -XX:+UseSuperWord -XX:CompileCommand=compileonly,compiler.vectorization.TestRoundVectorFloatAll::test* compiler.vectorization.TestRoundVectorFloatAll Please check which flags you actually need here.
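The value of an exhaustive sweep over bit patterns, which this PR argues for, can be illustrated with a hedged stand-alone C++ sketch (not the jtreg tests under review). `java_round_ref` models the documented semantics of Java's `Math.round(float)`: the closest int, ties rounding toward positive infinity, NaN mapped to 0, and results saturating at the int range. `naive_round` is a deliberately flawed candidate, and `count_mismatches` compares a candidate with the reference over a range of float bit patterns. All three names are hypothetical, and the sketch assumes IEEE single-precision float arithmetic.

```cpp
#include <cassert>  // used by the checks in the usage example
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Reference model of Java's Math.round(float). Adding 0.5 in double avoids
// the float rounding step that breaks naive_round below.
std::int32_t java_round_ref(float x) {
  if (std::isnan(x)) return 0;
  double r = std::floor(static_cast<double>(x) + 0.5);
  if (r <= static_cast<double>(std::numeric_limits<std::int32_t>::min())) {
    return std::numeric_limits<std::int32_t>::min();
  }
  if (r >= static_cast<double>(std::numeric_limits<std::int32_t>::max())) {
    return std::numeric_limits<std::int32_t>::max();
  }
  return static_cast<std::int32_t>(r);
}

// Deliberately flawed candidate: x + 0.5f is rounded to float before floor().
std::int32_t naive_round(float x) {
  if (std::isnan(x)) return 0;
  float r = std::floor(x + 0.5f);
  if (r <= -2147483648.0f) return std::numeric_limits<std::int32_t>::min();
  if (r >= 2147483648.0f) return std::numeric_limits<std::int32_t>::max();
  return static_cast<std::int32_t>(r);
}

// Compares candidate with the reference for every float whose bit pattern
// lies in [first, last] (first <= last) and returns the number of mismatches.
std::uint64_t count_mismatches(std::uint32_t first, std::uint32_t last,
                               std::int32_t (*candidate)(float)) {
  std::uint64_t mismatches = 0;
  for (std::uint32_t bits = first;; ++bits) {
    float x;
    std::memcpy(&x, &bits, sizeof x);  // reinterpret the bit pattern as a float
    if (candidate(x) != java_round_ref(x)) ++mismatches;
    if (bits == last) break;  // checked before ++, so last == 0xFFFFFFFF is safe
  }
  return mismatches;
}
```

The naive version fails on `0.49999997f`, the largest float below 0.5: `x + 0.5f` rounds up to `1.0f`, so it returns 1 instead of 0. A handful of sampled inputs easily misses such a value, whereas `count_mismatches(0u, 0xFFFFFFFFu, candidate)` visits all 2^32 patterns, which is the kind of long-running sweep the PR proposes to keep in `manual` mode.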
test/hotspot/jtreg/compiler/vectorization/TestRoundVectorFloatAll.java line 43: > 41: public class TestRoundVectorFloatAll { > 42: private static final int ITERS = 11000; > 43: private static final int ARRLEN = 997; Could you randomize this value ever so slightly? That way, the boundaries of the array are at different places. I also think that the size should be a little larger, just to ensure that we get maximum vector lengths. test/hotspot/jtreg/compiler/vectorization/TestRoundVectorFloatRandom.java line 202: > 200: } > 201: > 202: // test cases for NaN, Inf, subnormal, and so on Just for completeness, please also test +0.0 and -0.0. ------------- PR Review: https://git.openjdk.org/jdk/pull/17753#pullrequestreview-2043182218 PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1592477207 PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1592487797 PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1592499343 PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1592508616 PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1592481581 From epeter at openjdk.org Tue May 7 13:46:57 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 May 2024 13:46:57 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v13] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 13:23:48 GMT, Emanuel Peter wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> fix issues; modify vm options to make sure test the expected behaviors.
> > test/hotspot/jtreg/compiler/floatingpoint/TestRoundFloatAll.java line 31: > >> 29: * @library /test/lib / >> 30: * @modules java.base/jdk.internal.math >> 31: * @run main/othervm -XX:-TieredCompilation -XX:CompileThresholdScaling=0.3 -XX:+PrintIdeal -XX:CompileCommand=compileonly,compiler.floatingpoint.TestRoundFloatAll::test* -XX:-UseSuperWord compiler.floatingpoint.TestRoundFloatAll > > please break up the line for easier reading Why these flags: `-XX:-TieredCompilation -XX:CompileThresholdScaling=0.3 -XX:+PrintIdeal -XX:-UseSuperWord`? I also suggest that you use `-Xbatch`, just to make sure we have compiled all relevant methods after the warmup. If things get too slow, you may want to consider explicit compile exclusions or forbidding inlining of the `test*` methods, rather than `compileonly`, which prevents everything else from being compiled. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1592498081 From snazarki at openjdk.org Tue May 7 14:02:00 2024 From: snazarki at openjdk.org (Sergey Nazarkin) Date: Tue, 7 May 2024 14:02:00 GMT Subject: RFR: 8330806: test/hotspot/jtreg/compiler/c1/TestLargeMonitorOffset.java fails on ARM32 In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 14:21:09 GMT, Aleksei Voitylov wrote: > TestLargeMonitorOffset was introduced by 8310844 with a fix for the AArch64 platform. The same issue needs to be fixed for ARM32. With this change, we add the large slot_offset handling to the ARM32 version of LIR_Assembler::osr_entry(). > > Testing: jtreg hotspot, jtreg jdk tier1-3. Marked as reviewed by snazarki (no project role).
------------- PR Review: https://git.openjdk.org/jdk/pull/18891#pullrequestreview-2043280405 From chagedorn at openjdk.org Tue May 7 14:44:54 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 7 May 2024 14:44:54 GMT Subject: RFR: 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode [v2] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 14:24:27 GMT, Christian Hagedorn wrote: >> This patch replaces the `Opaque4Node` of the `If` for Initialized Assertion Predicates with a new `OpaqueInitializedAssertionPredicateNode`. This helps to simplify pattern matching for predicate code and to distinguish it from the two other uses of `Opaque4` nodes: >> 1. Template Assertion Predicate: The goal is to get rid of its `Opaque4Node` as well by using a dedicated `TemplateAssertionPredicateNode` for the `IfNode`. >> 2. Non-null-checks with intrinsics and unsafe accesses: This will eventually be the only use left. Once we get there, we should rename the node accordingly to `OpaqueNonNullCheck` or something like that. >> >> I went through all the uses of `Opaque4` nodes and did the following: >> - Could the `Opaque4` node be part of an Initialized Assertion Predicate? >> - No: Added an assert that we are not dealing with an Initialized Assertion Predicate. >> - Yes: >> - Yes **and only** for Initialized Assertion Predicates? Added an assert that we are only expecting an `OpaqueInitializedAssertionPredicateNode` if appropriate. >> - Yes but could also be something else: Added case for `OpaqueInitializedAssertionPredicateNode` next to the `Opaque4` case. >> - Is this `Opaque4` node only used for Template Assertion Predicates? >> - Yes: Added assert with call to `assertion_predicate_has_loop_opaque_node()` to check that we find its `OpaqueLoop*Nodes`.
>> - I've added test cases where I was not sure whether an `Opaque4` node could be part of a Template, an Initialized Assertion Predicate or a non-null-check. This was a little tricky but I think it was still worth it to prevent future bugs (even though most of these special cases are quite rare). >> >> This is another patch split off from the full fix for Assertion Predicates. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into JDK-8330386 > - Add more comments and asserts > - Add more tests > - 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode Thanks Vladimir for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18951#issuecomment-2098573489 From dfenacci at openjdk.org Tue May 7 14:55:32 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 7 May 2024 14:55:32 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v7] In-Reply-To: References: Message-ID: > # Issue > When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0)` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2-compiled code that causes different vectors to be treated as equal even though they are not. > > # Causes > On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. > The same is true for `StoreVector`s.
> When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store to the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`). Here we call > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 > > where we do an extra type check for vectors, but we don't check that masks and offsets do not interfere. > Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > but we don't check that there are no masks or offsets. > A few lines below, we check if there are 2 stores for the same value in a row. We need to check for masks and offsets here too, but in this case we can still fold the stores if the masks and offsets of the vector stores are equivalent. > > # Solution > To avoid folding `Load`- and `StoreVector`s with masks and offsets we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns its own (to avoid ever being the same as a store node). In this way, the checks in `MemNode::can_see_stored_value` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 > > and `StoreNode::Identity` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > will fail if masks or offsets are used. > For 2 stores of the same value we instead check for mask and offset equality. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been add...
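The `store_Opcode` idea in the description above can be modeled outside HotSpot in a few lines of C++. This is a simplified, hedged sketch rather than the actual `memnode.cpp` code: the `Opcode` enum, the `*Model` types and `may_forward_stored_value` are hypothetical stand-ins. The only behavior taken from the description is that a stored value is forwarded to a load only when the store's opcode equals the load's `store_Opcode()`.

```cpp
#include <cassert>  // used by the checks in the usage example

// Hypothetical, heavily reduced opcode set (not HotSpot's Opcodes enum).
enum Opcode { Op_StoreVector, Op_StoreVectorScatter, Op_LoadVectorGather };

struct LoadVectorModel {
  virtual ~LoadVectorModel() = default;
  // Opcode of the store whose value this load may take over.
  virtual Opcode store_opcode() const { return Op_StoreVector; }
};

// The trick from the description: a gather load answers with its own
// load opcode, which no store node can ever have.
struct LoadVectorGatherModel : LoadVectorModel {
  Opcode store_opcode() const override { return Op_LoadVectorGather; }
};

// Simplified opcode guard: a stored value is only forwarded to a load when
// the store's opcode matches the load's store_opcode().
bool may_forward_stored_value(const LoadVectorModel& load, Opcode store_op) {
  return store_op == load.store_opcode();
}
```

The appeal of this design is that no new comparison logic is needed at the call sites: the existing opcode equality check simply never succeeds for gather or masked loads, so they are never folded with a preceding store.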
Damon Fenacci has updated the pull request incrementally with four additional commits since the last revision: - Update src/hotspot/share/opto/memnode.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/memnode.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/memnode.cpp Co-authored-by: Emanuel Peter - JDK-8325520: remove leftover comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18347/files - new: https://git.openjdk.org/jdk/pull/18347/files/85bb4bef..72bf6ca3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=05-06 Stats: 5 lines in 1 file changed: 1 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18347/head:pull/18347 PR: https://git.openjdk.org/jdk/pull/18347 From dfenacci at openjdk.org Tue May 7 14:55:32 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 7 May 2024 14:55:32 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v6] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 12:21:30 GMT, Emanuel Peter wrote: >> Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: >> >> - JDK-8325520: add store/load masked vector tests >> - JDK-8325520: add store/load tests with duplicate offsets > > src/hotspot/share/opto/memnode.cpp line 1169: > >> 1167: // LoadVector/StoreVector need additional checks >> 1168: if (st->is_StoreVector()) { >> 1169: // Ensure that types match > > To reduce noise, you could revert these comment changes, up to you. Right, it was a leftover. Removed. 
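The virtual-getter refactoring discussed in this thread (getters for `offsets` and `mask` that return `nullptr` when a store has none) can be sketched as a stand-alone C++ model. This is hedged and not HotSpot code: `NodeModel`, `StoreVectorModel`, `StoreVectorScatterMaskedModel` and `same_offsets_and_mask` are hypothetical names, and plain pointer equality stands in for `eqv_uncast()`. The sketch compares both the offsets and the mask, and comparing pointers directly (instead of calling a method on the first getter's result) means a `nullptr` from a plain store needs no special handling. The `vect_type` check mentioned earlier in the thread would still be needed separately.

```cpp
#include <cassert>  // used by the checks in the usage example

// Stand-in for a C2 input node; only its identity matters here.
struct NodeModel {};

// Plain vector stores have neither offsets nor a mask, so both getters
// return nullptr. Scatter/masked variants override them.
struct StoreVectorModel {
  virtual ~StoreVectorModel() = default;
  virtual const NodeModel* offsets() const { return nullptr; }
  virtual const NodeModel* mask() const { return nullptr; }
};

struct StoreVectorScatterMaskedModel : StoreVectorModel {
  StoreVectorScatterMaskedModel(const NodeModel* off, const NodeModel* msk)
      : offsets_(off), mask_(msk) {}
  const NodeModel* offsets() const override { return offsets_; }
  const NodeModel* mask() const override { return mask_; }

 private:
  const NodeModel* offsets_;
  const NodeModel* mask_;
};

// Two back-to-back stores of the same value may only be folded if both the
// offsets and the mask match; nullptr == nullptr covers two plain stores.
bool same_offsets_and_mask(const StoreVectorModel& a, const StoreVectorModel& b) {
  return a.offsets() == b.offsets() && a.mask() == b.mask();
}
```

A plain store never matches a scatter/masked store in this model, because one side returns `nullptr` and the other a real input.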
> src/hotspot/share/opto/memnode.cpp line 3551: > >> 3549: // Regular store (no offsets or mask) >> 3550: } else { >> 3551: result = mem; > > Suggestion: > > assert(Opcode() == Op_StoreVector, "just a plain vector store, no offset or mask"); > result = mem; > > Turning comments into asserts is preferable I would say. Good idea! Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1592624106 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1592629045 From dfenacci at openjdk.org Tue May 7 15:31:56 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 7 May 2024 15:31:56 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v6] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 12:42:38 GMT, Emanuel Peter wrote: >> Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: >> >> - JDK-8325520: add store/load masked vector tests >> - JDK-8325520: add store/load tests with duplicate offsets > > src/hotspot/share/opto/memnode.cpp line 3554: > >> 3552: } >> 3553: } >> 3554: } > > I think the code is now correct. > But I find the nested if-elseif-elseif-else ... structure a bit hard to read. And there is quite some code duplication (e.g. `result = mem` and all the `eqv_uncast` checks). > > You could either do something like this: > > if (!is_StoreVector() || > as_StoreVector()->has_same_vect_type_and_offsets_and_mask(mem->as_StoreVector())) { > result = mem; > } > > > Sketch: > > has_same_vect_type_and_offsets_and_mask: > > different vect_type -> return false > ... > > > Or maybe it would be better to define virtual functions to get the `mask` and `offsets` from a `StoreVector`? If it has none, just return `nullptr`. Sometimes people worry about virtual methods, but we already use them extensively for the node Value/Ideal anyway.
> > Then, you can do: > > if (!is_StoreVector()) { > result = mem; > } else { > const Node* offsets1 = as_StoreVector()->get_offsets(); > const Node* offsets2 = mem->as_StoreVector()->get_offsets(); > const Node* mask1 = as_StoreVector()->get_mask(); > const Node* mask2 = mem->as_StoreVector()->get_mask(); > if (offsets1->eqv_uncast(offsets2) && mask1->eqv_uncast(mask2)) { > result = mem; > } > } > > I think that would be the cleanest and most readable way. > > What do you think? I agree that it is quite convoluted, probably also because I've put `if (!is_StoreVector())` (which is redundant) at the beginning to get the most common case out of the way, but still... At first I thought that multiple inheritance would be a good solution (masks and offsets could be inherited by the corresponding nodes) but the "HotSpot Coding Style" clearly says to avoid it... So, I think in the end your second suggestion is the cleanest. Changing it... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1592686165 From bkilambi at openjdk.org Tue May 7 15:36:57 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 7 May 2024 15:36:57 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 13:21:35 GMT, Emanuel Peter wrote: > Sorry for the delay, I'm really excited about this one, just had to get some more critical things done recently ;) Thanks for the review. I will update my patch soon with your suggestions. Apologies for not updating the tests in the test directory that still refer to the now-deleted UnorderedReduction node. > src/hotspot/cpu/aarch64/aarch64_vector.ad line 2907: > >> 2905: format %{ "reduce_addF_sve $dst_src1, $dst_src1, $src2" %} >> 2906: ins_encode %{ >> 2907: assert(UseSVE > 0, "must be sve"); > > Is there no way we would now run into this assert?
> > static bool use_neon_for_vector(int vector_length_in_bytes) { > return vector_length_in_bytes <= 16; > } > > Does `vector_length_in_bytes > 16` imply that we have `UseSVE > 0`? Yes, `vector_length_in_bytes > 16` does imply `UseSVE > 0`, as there are no machines with a vector length greater than 16 bytes that support only Neon (i.e. `UseSVE == 0`). ------------- PR Comment: https://git.openjdk.org/jdk/pull/18034#issuecomment-2098737087 PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1592690587 From roland at openjdk.org Tue May 7 15:43:56 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 7 May 2024 15:43:56 GMT Subject: RFR: 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode [v2] In-Reply-To: References: Message-ID: <_8csQpQVHlNpwenIT4H7OFkMSOaU6Fz-ZmJ0Yi6ArLU=.0b84b78d-4637-49ab-b43f-4c457498b0ce@github.com> On Mon, 6 May 2024 14:24:27 GMT, Christian Hagedorn wrote: >> This patch replaces the `Opaque4Node` of the `If` for Initialized Assertion Predicates with a new `OpaqueInitializedAssertionPredicateNode`. This helps to simplify pattern matching for predicate code and to distinguish it from the two other uses of `Opaque4` nodes: >> 1. Template Assertion Predicate: The goal is to get rid of its `Opaque4Node` as well by using a dedicated `TemplateAssertionPredicateNode` for the `IfNode`. >> 2. Non-null-checks with intrinsics and unsafe accesses: This will eventually be the only use left. Once we get there, we should rename the node accordingly to `OpaqueNonNullCheck` or something like that. >> >> I went through all the uses of `Opaque4` nodes and did the following: >> - Could the `Opaque4` node be part of an Initialized Assertion Predicate? >> - No: Added an assert that we are not dealing with an Initialized Assertion Predicate. >> - Yes: >> - Yes **and only** for Initialized Assertion Predicates?
Added an assert that we are only expecting an `OpaqueInitializedAssertionPredicateNode` if appropriate. >> - Yes but could also be something else: Added case for `OpaqueInitializedAssertionPredicateNode` next to the `Opaque4` case. >> - Is this `Opaque4` node only used for Template Assertion Predicates? >> - Yes: Added assert with call to `assertion_predicate_has_loop_opaque_node()` to check that we find its `OpaqueLoop*Nodes`. >> - I've added test cases where I was not sure whether an `Opaque4` node could be part of a Template, an Initialized Assertion Predicate or a non-null-check. This was a little tricky but I think it was still worth it to prevent future bugs (even though most of these special cases are quite rare). >> >> This is another patch split off from the full fix for Assertion Predicates. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into JDK-8330386 > - Add more comments and asserts > - Add more tests > - 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode src/hotspot/share/opto/opaquenode.cpp line 110: > 108: } > 109: > 110: Node* OpaqueInitializedAssertionPredicateNode::Identity(PhaseGVN* phase) { Opaque4 is removed by macro expansion, right? But the new one is removed after loop opts. So there's a change in behavior. What's the rationale for making that change? src/hotspot/share/opto/opaquenode.hpp line 138: > 136: // to true. Therefore, we get rid of them in product builds as they are useless. In debug builds we keep them as > 137: // additional verification code (i.e. removing this node and use the BoolNode input instead).
> 138: class OpaqueInitializedAssertionPredicateNode : public Node { Shouldn't the new OpaqueInitializedAssertionPredicateNode be a subclass of Opaque4, or shouldn't both be subclasses of a common supertype? Don't they share at least some logic or behavior? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18951#discussion_r1592701811 PR Review Comment: https://git.openjdk.org/jdk/pull/18951#discussion_r1592702908 From duke at openjdk.org Tue May 7 15:52:57 2024 From: duke at openjdk.org (Daniel Skantz) Date: Tue, 7 May 2024 15:52:57 GMT Subject: Integrated: 8330016: Stress seed should be initialized for runtime stub compilation In-Reply-To: <-Pj7vK6Wpgv3FnP2KZXB2so2wWraG3LWEKds9XDI3pY=.d6595586-aa83-47bc-84ed-ff8c4a1a5550@github.com> References: <-Pj7vK6Wpgv3FnP2KZXB2so2wWraG3LWEKds9XDI3pY=.d6595586-aa83-47bc-84ed-ff8c4a1a5550@github.com> Message-ID: On Mon, 6 May 2024 06:31:47 GMT, Daniel Skantz wrote: > We can initialize the stress seed for runtime stub compilation as we already do for method compilation. Doing so uncovered the bug described in JDK-8329258. It applies when the StressGCM or StressLCM VM flags are set. > > Testing: T1-5 default options. T1-5 with -XX:+StressLCM and -XX:+StressGCM. Manually tested that the stress seed is set and printed to the compilation log if either stress option is set. This pull request has now been integrated.
Changeset: 95d2f807 Author: Daniel Skantz Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/95d2f8072e91e8df80e49e341f4fdb4464a2616e Stats: 31 lines in 2 files changed: 20 ins; 10 del; 1 mod 8330016: Stress seed should be initialized for runtime stub compilation Reviewed-by: rcastanedalo, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/19095 From szaldana at openjdk.org Tue May 7 16:02:57 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Tue, 7 May 2024 16:02:57 GMT Subject: Integrated: 8319957: PhaseOutput::code_size is unused and should be removed In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 17:31:45 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR removes the unused `PhaseOutput::code_size` / `method_size`. > > These were moved over from `src/hotspot/share/opto/compile.hpp` in the refactor from [8240363](https://bugs.openjdk.org/browse/JDK-8240363). Here's the git link for reference: https://github.com/openjdk/jdk/commit/21cd75cb98f658639df14632680e9c5e58f11faa. > > I also checked whether there were any usages prior to the refactor and couldn't find anything, so I think it's safe to remove it. > > Thanks, > Sonia This pull request has now been integrated.
Changeset: 524aaad9 Author: Sonia Zaldana Calles Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/524aaad98317b1a50453e5a9a44922f481fb3b1e Stats: 3 lines in 2 files changed: 0 ins; 3 del; 0 mod 8319957: PhaseOutput::code_size is unused and should be removed Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/18981 From aboldtch at openjdk.org Tue May 7 16:18:00 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 7 May 2024 16:18:00 GMT Subject: RFR: 8331863: DUIterator_Fast used before it is constructed Message-ID: <5FEXRCspKpxNj3FJW1_2fqvdzuC40gTT8-SAG_pEflU=.69f4b39b-b7ed-4e45-87ac-2245ec75c789@github.com> `SimpleDUIterator` constructs two `DUIterator_Fast` objects but passes a reference to the second when constructing the first. In debug builds, values are read from this not-yet-constructed object. Found when building a debug build with UBSAN:

/src/hotspot/share/opto/node.cpp:124:8: runtime error: load of value 200, which is not a valid value for type 'bool'
    #0 0x14619f4e6476 in DUIterator_Common::reset(DUIterator_Common const&) /src/hotspot/share/opto/node.cpp:124
    #1 0x1461a32556a5 in DUIterator_Fast::operator=(DUIterator_Fast const&) /src/hotspot/share/opto/node.hpp:1486
    #2 0x1461a32556a5 in Node::fast_outs(DUIterator_Fast&) const /src/hotspot/share/opto/node.hpp:1491
    #3 0x1461a32556a5 in SimpleDUIterator::SimpleDUIterator(Node*) /src/hotspot/share/opto/node.hpp:1575
    #4 0x1461a32556a5 in G1BarrierSetC2::has_cas_in_use_chain(Node*) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:855
    #5 0x1461a3256cf1 in G1BarrierSetC2::verify_pre_load(Node*, Unique_Node_List&) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:881
    #6 0x1461a325eec3 in G1BarrierSetC2::verify_gc_barriers(Compile*, BarrierSetC2::CompilePhase) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:1019
    #7 0x1461a325eec3 in G1BarrierSetC2::verify_gc_barriers(Compile*, BarrierSetC2::CompilePhase) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:963
    #8 0x1461a23160ed in Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*) /src/hotspot/share/opto/compile.cpp:875
    #9 0x1461a1845fd0 in C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) /src/hotspot/share/opto/c2compiler.cpp:142
    #10 0x1461a235ac39 in CompileBroker::invoke_compiler_on_method(CompileTask*) /src/hotspot/share/compiler/compileBroker.cpp:2305
    #11 0x1461a235ee4e in CompileBroker::compiler_thread_loop() /src/hotspot/share/compiler/compileBroker.cpp:1963
    #12 0x1461a4076f8d in JavaThread::thread_main_inner() /src/hotspot/share/runtime/javaThread.cpp:760
    #13 0x1461a409da23 in JavaThread::run() /src/hotspot/share/runtime/javaThread.cpp:745
    #14 0x1461a7b6d2bc in Thread::call_run() /src/hotspot/share/runtime/thread.cpp:221
    #15 0x1461a62a8105 in thread_native_entry /src/hotspot/os/linux/os_linux.cpp:846
    #16 0x1461c29801d9 in start_thread (/lib64/libpthread.so.0+0x81d9)
    #17 0x1461c18cae72 in __clone (/lib64/libc.so.6+0x39e72)

------------- Commit messages: - 8331863: DUIterator_Fast used before it is constructed Changes: https://git.openjdk.org/jdk/pull/19125/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19125&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331863 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19125.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19125/head:pull/19125 PR: https://git.openjdk.org/jdk/pull/19125 From kvn at openjdk.org Tue May 7 16:21:01 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 7 May 2024 16:21:01 GMT Subject: RFR: 8331862: Remove split relocation info implementation Message-ID: [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. None of the current OpenJDK platforms use it. Tested tier1.
------------- Commit messages: - 8331862: Remove split relocation info implementation Changes: https://git.openjdk.org/jdk/pull/19126/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19126&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331862 Stats: 114 lines in 11 files changed: 2 ins; 58 del; 54 mod Patch: https://git.openjdk.org/jdk/pull/19126.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19126/head:pull/19126 PR: https://git.openjdk.org/jdk/pull/19126 From dfenacci at openjdk.org Tue May 7 16:30:23 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 7 May 2024 16:30:23 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v8] In-Reply-To: References: Message-ID: <_x_fle5Uwlya8vU73BUtILG3LCIrWZ-_UapBTvmlv6Y=.c6580565-4682-4ca7-a902-e32df5161f68@github.com> > # Issue > When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0)` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2-compiled code that causes different vectors to be treated as equal even though they are not. > > # Causes > On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without masks or offsets are mapped to `LoadVector` nodes instead. > The same is true for `StoreVector`s. > When running GVN loops, we can get into a situation where we check whether a Load node is preceded by a Store to the same address, so that we can replace the Load with the input of the Store (in `LoadNode::Identity`). Here we call > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 > > where we do an extra type check if we are dealing with vectors, but we don't make sure that masks and offsets don't interfere.
> Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > but we don't make sure that there are no masks or offsets. > A few lines below, we check if there are 2 stores of the same value in a row. We need to check for masks and offsets here too, but in this case we can still allow the fold if the masks and offsets of the vector stores are equivalent. > > # Solution > To avoid folding `Load`- and `StoreVector`s with masks and offsets, we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns its own (so that it never matches a store node). In this way, the checks in `MemNode::can_see_stored_value` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 > > and `StoreNode::Identity` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > will fail if masks or offsets are used. > For 2 stores of the same value, we instead check for mask and offset equality. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been add...
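Why a gather or masked load must not be folded into the value of a preceding plain store to the same address can be seen in a small scalar model. This is an illustration under stated assumptions, not HotSpot or Vector API code: `int[]` stands in for a vector register, `boolean[]` for a mask, and a masked load is assumed to zero-fill unset lanes (as the Vector API's masked `fromArray` does). After a plain store of `v`, neither the masked load nor the gather from the same address yields `v`.

```java
import java.util.Arrays;

// Scalar model of vector lanes (an assumption for illustration): int[] stands in
// for a vector, boolean[] for a mask. This is not the Vector API.
class VectorFoldingModel {
    // Plain (unmasked) store of all lanes of v starting at mem[base].
    static void store(int[] mem, int base, int[] v) {
        System.arraycopy(v, 0, mem, base, v.length);
    }

    // Masked load: lanes whose mask bit is clear are zero-filled.
    static int[] loadMasked(int[] mem, int base, boolean[] mask) {
        int[] r = new int[mask.length];
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) {
                r[i] = mem[base + i];
            }
        }
        return r;
    }

    // Gather load: lane i reads mem[base + offsets[i]].
    static int[] gather(int[] mem, int base, int[] offsets) {
        int[] r = new int[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            r[i] = mem[base + offsets[i]];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] mem = new int[4];
        int[] v = {1, 2, 3, 4};
        store(mem, 0, v);
        int[] masked = loadMasked(mem, 0, new boolean[] {true, false, true, false});
        int[] gathered = gather(mem, 0, new int[] {3, 2, 1, 0});
        // Neither result equals v, so replacing either load with the stored value is unsound.
        System.out.println(Arrays.toString(masked));   // [1, 0, 3, 0]
        System.out.println(Arrays.toString(gathered)); // [4, 3, 2, 1]
    }
}
```

The same reasoning applies in the other direction: a masked store followed by an unmasked load leaves the unmasked lanes with their old contents.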
Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8325520: fix assert condition ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18347/files - new: https://git.openjdk.org/jdk/pull/18347/files/72bf6ca3..a2cb6a58 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18347/head:pull/18347 PR: https://git.openjdk.org/jdk/pull/18347 From kvn at openjdk.org Tue May 7 16:56:06 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 7 May 2024 16:56:06 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v18] In-Reply-To: References: Message-ID: <-Q8XJ3BT26WE6vPUNR7-_Wi7iw7QKTi9O5HsvdeGh4M=.e35dc82b-326f-4207-a3f3-bacfb20032f4@github.com> On Thu, 2 May 2024 14:54:17 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens, >> no optimization is performed for the method being compiled). Hoisting >> a `get()` call out of a loop for a loop-invariant `scopedValue` should >> also be legal in most cases. >> >> `ScopedValue.get()` is implemented in Java code as a 2-step process. A >> cache is attached to the current thread object. If the `ScopedValue` >> object is in the cache, then the result from `get()` is read from >> there. Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache.
On a >> cache probe, both indexes are checked. As a consequence, the process >> of probing the cache is a multi-step process (check if the cache is >> present, check the first index, check the second index if the first index >> failed). If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, the profile reports the slow path >> as never taken and only the read from the cache is compiled. >> >> To perform the optimizations, I added 3 new node types to C2: >> >> - the pair >> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for >> the cache probe >> >> - a CFG node ScopedValueGetResultNode to help locate the result of the >> `get()` call in the IR graph. >> >> In pseudocode, once the nodes are inserted, the code of a `get()` is: >> >> >> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) >> if (hits_in_the_cache) { >> res = ScopedValueGetLoadFromCache(hits_in_the_cache); >> } else { >> res = ..; //slow call possibly inlined. Subgraph can be arbitrarily complex >> } >> res = ScopedValueGetResult(res) >> >> >> In the snippet: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > whitespaces I want to see performance numbers on x64 and aarch64 before I start looking at it. It would be nice to have data for all micros `test/micro/org/openjdk/bench/java/lang/ScopedValues*.java`. Put the results into JBS and post a short summary here. You can compare by disabling/enabling the new intrinsics.
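The multi-step cache probe described in the ScopedValue thread above (is the cache present? first index? second index?) can be sketched as follows. This is a hypothetical, simplified model: the class name, table size and both hash functions are assumptions for illustration and do not reflect the JDK's actual `ScopedValue` cache. It shows why, once a first `get()` has populated the cache and no new binding intervenes, a second `get()` for the same key returns the same value without taking the slow path.

```java
import java.util.function.Supplier;

// Hypothetical, simplified model of a lazily allocated cache probed at two
// indexes per key. Names and hashing are illustrative assumptions, not the
// JDK's ScopedValue implementation.
class TwoIndexCache {
    private static final int SIZE = 16; // power of two, so masking yields an index

    private Object[] keys;   // null until the first slow-path insertion
    private Object[] values;
    int slowPathCalls;       // how often the slow path ran

    private static int primary(Object key) {
        return System.identityHashCode(key) & (SIZE - 1);
    }

    private static int secondary(Object key) {
        return (System.identityHashCode(key) >>> 4) & (SIZE - 1);
    }

    Object get(Object key, Supplier<Object> slowLookup) {
        if (keys != null) {              // step 1: is the cache present?
            int i = primary(key);
            if (keys[i] == key) {        // step 2: first index
                return values[i];
            }
            int j = secondary(key);
            if (keys[j] == key) {        // step 3: second index
                return values[j];
            }
        }
        // Miss: run the slow lookup and insert the mapping at the primary index.
        slowPathCalls++;
        Object v = slowLookup.get();
        if (keys == null) {
            keys = new Object[SIZE];
            values = new Object[SIZE];
        }
        int i = primary(key);
        keys[i] = key;
        values[i] = v;
        return v;
    }
}
```

On a miss the model simply overwrites the primary slot, which keeps it short; a real cache would need a replacement policy for keys that collide in both slots.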
------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2098897865 From duke at openjdk.org Tue May 7 16:58:09 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Tue, 7 May 2024 16:58:09 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v15] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: parameter and local renames, update comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/d93e9893..2a63a159 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=13-14 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From duke at openjdk.org Tue May 7 16:58:10 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Tue, 7 May 2024 16:58:10 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v14] In-Reply-To: References: <727FyZHyBbtRilYRtbP2E4dbZYqj9a-QgXAuicQ2iZQ=.01035706-6591-4df5-bf7d-d7a2f6209015@github.com> Message-ID: On Tue, 7 May 2024 05:51:22 GMT, Jatin Bhateja wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> revert unneeded legacy flag change for kmovwl(K,K) and kmovql(K,K) > > src/hotspot/cpu/x86/assembler_x86.cpp line 11754: > >> 11752: >> 11753: // This is a 4 byte encoding >> 11754: void Assembler::evex_prefix(bool vex_r, bool vex_b, bool vex_x, bool evex_r, bool evex_b, bool evex_v, > > Suggestion: > > void Assembler::evex_prefix(bool vex_r, bool vex_b, bool vex_x, bool evex_r, bool eevex_b, bool evex_v, Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 11766: > >> 11764: // P0: byte 2, initialized to RXBR`00mm >> 11765: // instead of not'd >> 11766: int byte2 = (vex_r ? VEX_R : 0) | (vex_x ? VEX_X : 0) | (vex_b ? VEX_B : 0) | (evex_r ? 
EVEX_Rb : 0); > > Comment at [L#11765 > ](https://github.com/openjdk/jdk/pull/18476/files#diff-e3576e9c22db89236cdb906f032ff00748ff6d1c21b05277d991d80af75daf3aL11686) > `// P0: byte 2, initialized to RXBR'00mm => // P0: byte 2, initialized to RXBR'0mmm` Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 11768: > >> 11766: int byte2 = (vex_r ? VEX_R : 0) | (vex_x ? VEX_X : 0) | (vex_b ? VEX_B : 0) | (evex_r ? EVEX_Rb : 0); >> 11767: byte2 = (~byte2) & 0xF0; >> 11768: byte2 |= evex_b ? EEVEX_B : 0; > > Suggestion: > > byte2 |= eevex_b ? EEVEX_B : 0; > > > This corresponds to B4 bit which is specific to EEVEX encoding. Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 11846: > >> 11844: } >> 11845: bool eevex_x = adr.index_needs_rex2(); >> 11846: bool evex_b = adr.base_needs_rex2(); > > Suggestion: > > bool eevex_b = adr.base_needs_rex2(); Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 11848: > >> 11846: bool evex_b = adr.base_needs_rex2(); >> 11847: attributes->set_is_evex_instruction(); >> 11848: evex_prefix(vex_r, vex_b, vex_x, evex_r, evex_b, evex_v, eevex_x, nds_enc, pre, opc); > > Suggestion: > > evex_prefix(vex_r, vex_b, vex_x, evex_r, eevex_b, evex_v, eevex_x, nds_enc, pre, opc); Thanks, done. 
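For readers following the `byte2` discussion, here is a hedged sketch of how the first EVEX payload byte is assembled in the quoted code. Only the constant names come from the quoted assembler code; their bit values are assumptions derived from the layout in the review comment (`RXBR'0mmm`, with the APX B4 bit occupying bit 3, and the low three bits holding the opcode map). The point it illustrates is that R, X, B and R' are stored inverted, while B4 is OR-ed in after the inversion and therefore stays un-inverted.

```java
// Assumed bit layout of EVEX payload byte P0 with the APX extension:
// bit 7..0 = R X B R' B4 m m m. Constant values are assumptions for illustration.
class EvexP0 {
    static final int VEX_R   = 0x80;
    static final int VEX_X   = 0x40;
    static final int VEX_B   = 0x20;
    static final int EVEX_Rb = 0x10; // R'
    static final int EEVEX_B = 0x08; // B4 (APX)

    static int p0(boolean vexR, boolean vexX, boolean vexB, boolean evexR,
                  boolean eevexB, int opcodeMap) {
        int b = (vexR ? VEX_R : 0) | (vexX ? VEX_X : 0) | (vexB ? VEX_B : 0) | (evexR ? EVEX_Rb : 0);
        b = (~b) & 0xF0;           // R, X, B and R' are stored inverted
        b |= eevexB ? EEVEX_B : 0; // B4 is set after the inversion, so it is not inverted
        b |= opcodeMap & 0x07;     // low three bits select the opcode map
        return b;
    }

    public static void main(String[] args) {
        System.out.printf("0x%02X%n", p0(false, false, false, false, false, 1));
    }
}
```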
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1592799351 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1592800035 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1592799078 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1592799595 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1592799825 From epeter at openjdk.org Tue May 7 17:08:52 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 May 2024 17:08:52 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop In-Reply-To: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> Message-ID: <2lreoMy7UKtgM_m8RCU68rp3FFkoU8zj3ckuTKzXqf0=.dc02a0d4-2671-4c70-a470-a64f28e38f2d@github.com> On Fri, 3 May 2024 12:33:43 GMT, Roland Westrelin wrote: > In the test case: > > > long i; > for (; i > 0; i--) { > res += 42 / ((int) i); > > > The long counted loop phi has type `[1..100]`. As a consequence, the > `ConvL2I` also has type `[1..100]`. The `DivI` node that follows can't > fault: it is not guarded by a zero check and has no control set. > > The `ConvL2I` is split through phi and so is the `DivI` node: > `PhaseIdealLoop::cannot_split_division()` returns true because the > value coming from the backedge into the `DivI` (when it is about to be > split thru phi) is the result of the `ConvL2I` which has type > `[1..100]` so is not zero as far as the compiler can tell. > > On the last iteration of the loop, i is 1. Because the DivI was split > thru Phi, it computes the value for the following iteration, so for i > = 0. This causes a crash when the compiled code runs.
> > The same problem can't happen with an int counted loop because logic > in `PhaseIdealLoop::split_thru_phi()` prevents a `ConvI2L` from being > split thru phi. I propose to fix this the same way: in the test case, > it's not true that once the `ConvL2I` is split thru phi it keeps type > `[1..100]`. The fix is fairly conservative because it's based on the > existing logic for `ConvI2L`: ideally we would only avoid splitting a `ConvL2I` > at a counted loop, but I suppose the same is true for the `ConvI2L` > and I thought it would be best to revisit both together. Looks reasonable. ------------ I guess the issue is that ConvL2I and ConvI2L are also type nodes, which can restrict their type, just like CastII nodes. And that restricting of the type is only true under a certain if-branch. But if the ConvI2L were not a type node, then it would not restrict the type, and you could simply push it through phis. Right? Why do we have type restriction mixed into ConvI2L? Could that not be separated out into a CastII / CastLL? Maybe we could generally separate ConvI2L, type restriction, and pinning? CastII also does multiple things, and it has hurt us many times in the past. Would this sort of maximal separation and specialization not be more "sea of nodes" style? Anyway, this would be interesting to look into for a future RFE. test/hotspot/jtreg/compiler/splitif/TestLongCountedLoopConvL2I.java line 31: > 29: * -XX:+StressGCM -XX:StressSeed=92643864 TestLongCountedLoopConvL2I > 30: * @run main/othervm -XX:-BackgroundCompilation -XX:-TieredCompilation -XX:-UseOnStackReplacement > 31: * -XX:+StressGCM TestLongCountedLoopConvL2I Would it make sense to have a run that allows OSR? ------------- Marked as reviewed by epeter (Reviewer).
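The failure mode Roland describes for the `ConvL2I`/`DivI` split can be reproduced by hand in plain Java. This is a model written for illustration, not C2's output: it assumes `i` starts at 100 to match the `[1..100]` phi type, and it mimics splitting the `DivI` through the loop phi by computing the next iteration's quotient on the backedge. Where the unguarded `DivI` crashes the compiled code, the same early division shows up in this Java model as an `ArithmeticException`.

```java
// Hand-written model (not C2 output) of the test-case loop and of the shape it
// takes once the ConvL2I and DivI are split through the loop phi.
class SplitThroughPhiModel {
    // Original loop: inside the body i is in [1..100], so (int) i is never zero.
    static int original() {
        int res = 0;
        for (long i = 100; i > 0; i--) {
            res += 42 / ((int) i);
        }
        return res;
    }

    // After the split, the division feeding the next iteration is computed on
    // the backedge, right after i-- and before the exit test. On the last
    // iteration that means dividing by (int) 0.
    static int splitThroughPhi() {
        long i = 100;
        int div = 42 / ((int) i); // entry value of the split DivI
        int res = 0;
        while (i > 0) {
            res += div;
            i--;
            div = 42 / ((int) i); // backedge value: also evaluated when i == 0
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(original());
        try {
            splitThroughPhi();
        } catch (ArithmeticException e) {
            System.out.println("split version divides by zero: " + e.getMessage());
        }
    }
}
```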
PR Review: https://git.openjdk.org/jdk/pull/19086#pullrequestreview-2043711442 PR Review Comment: https://git.openjdk.org/jdk/pull/19086#discussion_r1592792340 From galder at openjdk.org Tue May 7 17:14:23 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 7 May 2024 17:14:23 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v14] In-Reply-To: References: Message-ID: > Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures. > > The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64:
>
> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
> Benchmark                 (size)  Mode  Cnt    Score    Error  Units
> ArrayClone.byteArraycopy       0  avgt   15    3.476 ±  0.018  ns/op
> ArrayClone.byteArraycopy      10  avgt   15    3.740 ±  0.017  ns/op
> ArrayClone.byteArraycopy     100  avgt   15    7.124 ±  0.010  ns/op
> ArrayClone.byteArraycopy    1000  avgt   15   39.301 ±  0.106  ns/op
> ArrayClone.byteClone           0  avgt   15    3.478 ±  0.008  ns/op
> ArrayClone.byteClone          10  avgt   15    3.562 ±  0.007  ns/op
> ArrayClone.byteClone         100  avgt   15    5.888 ±  0.206  ns/op
> ArrayClone.byteClone        1000  avgt   15   25.762 ±  0.203  ns/op
> ArrayClone.intArraycopy        0  avgt   15    3.199 ±  0.016  ns/op
> ArrayClone.intArraycopy       10  avgt   15    4.521 ±  0.008  ns/op
> ArrayClone.intArraycopy      100  avgt   15   17.429 ±  0.039  ns/op
> ArrayClone.intArraycopy     1000  avgt   15  178.432 ±  0.777  ns/op
> ArrayClone.intClone            0  avgt   15    3.406 ±  0.016  ns/op
> ArrayClone.intClone           10  avgt   15    4.272 ±  0.006  ns/op
> ArrayClone.intClone          100  avgt   15   13.110 ±  0.122  ns/op
> ArrayClone.intClone         1000  avgt   15  113.196 ± 13.400  ns/op
>
> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this. > > I ran the hotspot compiler tests successfully, limiting them to C1 compilation, on darwin/aarch64, linux/x86_64 and linux/686. E.g.
>
> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
> ...
> TEST                                       TOTAL  PASS  FAIL  ERROR
> jtreg:test/hotspot/jtreg:hotspot_compiler   1234  1234     0      0
>
> One question I had is what to do about non-primitive object arrays; see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts? > > Thanks @rwestrel for his help shaping this up :) Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: Assert type is not interface ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17667/files - new: https://git.openjdk.org/jdk/pull/17667/files/9376e9ec..306db745 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=12-13 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17667.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17667/head:pull/17667 PR: https://git.openjdk.org/jdk/pull/17667 From galder at openjdk.org Tue May 7 17:14:23 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 7 May 2024 17:14:23 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v13] In-Reply-To: <_x-OSownzQQZ8fmlsbvQ42MLf9BGZskECTNncOE0s4E=.8381a076-0cc4-4339-924f-fa22ca780573@github.com> References: <9Eoh8hOSSVvAtf9iVQ6hflQyceUtt4dpZdqm61zg5XI=.358a4d79-70d9-4b54-85d5-37c6817f0fae@github.com> <_x-OSownzQQZ8fmlsbvQ42MLf9BGZskECTNncOE0s4E=.8381a076-0cc4-4339-924f-fa22ca780573@github.com> Message-ID: On Thu, 2
May 2024 08:24:34 GMT, Dean Long wrote: >> Then, I think we should add an assert that `!type->as_instance_klass()->is_interface()` and also that it's not an array of interfaces (using `base_element_klass()`) > > An array of interfaces can be exact: > > new Interface[20].getClass(); > > and it seems like it would be safe to allow this, so I think we only need one assert for `!type->as_instance_klass()->is_interface()` if we don't trust the result of exact_type(). @dean-long @rwestrel I've added the assert. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1592820103 From chagedorn at openjdk.org Tue May 7 17:28:57 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 7 May 2024 17:28:57 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop In-Reply-To: References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> Message-ID: On Mon, 6 May 2024 11:50:40 GMT, Roland Westrelin wrote: > Are we sure divisions are the only cause of bugs? Not 100% sure. But the only cases I've observed so far are with division/mod, where they float above and end up being executed too early (the result is never actually observed, though). > that once pushed thru phi, the type of the ConvL2I is simply not correct and that's the root cause. Yes, that's my understanding, too. But since the `AddL` input into the loop iv phi contains zero, it raised the question of whether we could actually detect that and base our decision on whether the input contains zero, instead of simply disabling pushing `ConvL2I` (and `ConvI2L`) nodes through phis entirely. It also seems that it's only a problem with loop iv phis because we improve the iv type in such a way that some of the possible values of the backedge are excluded. So, maybe a first step could be to allow splitting the `Conv*` nodes through non-loop-iv phi nodes.
However, there might also be other non-loop-iv phi problems I'm currently not aware of. Nevertheless, it might be worth investigating further in a separate RFE. > I wonder if we could get other failures because of this: maybe a node becoming top because of the incorrect type or an out-of-bounds array access. Could very well be. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19086#discussion_r1592835265 From mli at openjdk.org Tue May 7 17:32:19 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 7 May 2024 17:32:19 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v14] In-Reply-To: References: Message-ID: > Hi, > Can you have a look at this patch adding some tests for Math.round intrinsics? > Thanks! > > ### FYI: > During the development of RoundVF/RoundF, we faced issues which were only spotted by running tests exhaustively against the 32/64-bit range of int/long. > It's helpful to add these exhaustive tests to the JDK for possible future use, rather than building them every time they are needed. > Of course, we need to put them in `manual` mode, so they are not run when the `-automatic` jtreg option is specified, which I assume is the mode the CI uses; please correct me if I'm wrong.
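As a scaled-down illustration of the exhaustive approach discussed above, the sketch below compares `Math.round(float)` against a reference over float bit patterns, visiting every `stride`-th pattern; `stride == 1` covers all 2^32 patterns, which is slow and is the kind of run that belongs in `manual` mode. The reference `(int) Math.floor((double) f + 0.5)` is our assumption, not the golden implementation used by the PR's tests. It relies on every float converting to double exactly, and on the double addition being either exact or rounding in a way that never changes the floor.

```java
// Sketch of an exhaustive-style check of Math.round(float). The reference
// (round half up, computed in double) is an assumption for illustration.
class RoundFloatCheck {
    static int referenceRound(float f) {
        // (int) maps NaN to 0 and saturates out-of-range values, matching Math.round.
        return (int) Math.floor((double) f + 0.5);
    }

    // stride == 1 visits all 2^32 float bit patterns; larger strides sample them.
    static long countMismatches(int stride) {
        long mismatches = 0;
        for (long bits = Integer.MIN_VALUE; bits <= Integer.MAX_VALUE; bits += stride) {
            float f = Float.intBitsToFloat((int) bits);
            if (Math.round(f) != referenceRound(f)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        int stride = args.length > 0 ? Integer.parseInt(args[0]) : 9973;
        System.out.println("mismatches: " + countMismatches(stride));
    }
}
```

Running the same loop once with the method interpreted and once after it has been compiled is what turns this into a check of the intrinsic rather than of the library code.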
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: misc fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17753/files - new: https://git.openjdk.org/jdk/pull/17753/files/b5207436..7c2ef4fb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17753&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17753&range=12-13 Stats: 251 lines in 5 files changed: 107 ins; 131 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/17753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17753/head:pull/17753 PR: https://git.openjdk.org/jdk/pull/17753 From mli at openjdk.org Tue May 7 17:32:20 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 7 May 2024 17:32:20 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v13] In-Reply-To: References: Message-ID: <41rVYJ90K1TmX9w8v2eZxPcaxH0YL8D3wrzQiEd7mnU=.a1458bea-4570-40ae-b052-523c413d26bd@github.com> On Tue, 7 May 2024 13:36:55 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/floatingpoint/TestRoundFloatAll.java line 31: >> >>> 29: * @library /test/lib / >>> 30: * @modules java.base/jdk.internal.math >>> 31: * @run main/othervm -XX:-TieredCompilation -XX:CompileThresholdScaling=0.3 -XX:+PrintIdeal -XX:CompileCommand=compileonly,compiler.floatingpoint.TestRoundFloatAll::test* -XX:-UseSuperWord compiler.floatingpoint.TestRoundFloatAll >> >> Please break up the line for easier reading. > > Why these flags: > `-XX:-TieredCompilation -XX:CompileThresholdScaling=0.3 -XX:+PrintIdeal -XX:-UseSuperWord` ? > > I also suggest that you use `-Xbatch`, just to make sure we have compiled all relevant methods after the warmup. If things get too slow, then maybe you want to consider using explicit compile exclusion / forbidding inlining for the `test*` method, rather than the compileonly, which prevents everything else from compiling. Thanks for the suggestion: added `-Xbatch` and removed `-XX:+PrintIdeal`.
Kept `-XX:-UseSuperWord`, as we are testing the scalar version of the intrinsic in this test. `-XX:-TieredCompilation -XX:CompileThresholdScaling=0.3` are just carried over from previous tests. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1592837993 From mli at openjdk.org Tue May 7 17:32:21 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 7 May 2024 17:32:21 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v13] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 13:30:12 GMT, Emanuel Peter wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> fix issues; modify vm options to make sure test the expected behaviors. > > test/hotspot/jtreg/compiler/floatingpoint/TestRoundFloatAll.java line 75: > >> 73: return (int) a; >> 74: } >> 75: } > > At first, I was worried about the indentation, then realized the original code had the strange indentation. > Would there be a way to put this method in a shared file, so that you do not need to paste it everywhere? Moved to a shared lib file.
> test/hotspot/jtreg/compiler/vectorization/TestRoundVectorFloatAll.java line 34: > >> 32: * @run main/othervm -XX:+PrintIdeal -XX:-TieredCompilation -XX:CompileThresholdScaling=0.3 -XX:MaxVectorSize=8 -XX:+UseSuperWord -XX:CompileCommand=compileonly,compiler.vectorization.TestRoundVectorFloatAll::test* compiler.vectorization.TestRoundVectorFloatAll >> 33: * @run main/othervm -XX:+PrintIdeal -XX:-TieredCompilation -XX:CompileThresholdScaling=0.3 -XX:MaxVectorSize=16 -XX:+UseSuperWord -XX:CompileCommand=compileonly,compiler.vectorization.TestRoundVectorFloatAll::test* compiler.vectorization.TestRoundVectorFloatAll >> 34: * @run main/othervm -XX:+PrintIdeal -XX:-TieredCompilation -XX:CompileThresholdScaling=0.3 -XX:MaxVectorSize=32 -XX:+UseSuperWord -XX:CompileCommand=compileonly,compiler.vectorization.TestRoundVectorFloatAll::test* compiler.vectorization.TestRoundVectorFloatAll > > Please check which flags you actually need here. Removed `-XX:+PrintIdeal`; the others seem useful to me. > test/hotspot/jtreg/compiler/vectorization/TestRoundVectorFloatAll.java line 43: > >> 41: public class TestRoundVectorFloatAll { >> 42: private static final int ITERS = 11000; >> 43: private static final int ARRLEN = 997; > > Could you randomize this value ever so slightly? That way, the boundaries of the array are at different places. I also think the size should be a little larger, just to ensure that we get maximum vector lengths. Makes sense, done.
> test/hotspot/jtreg/compiler/vectorization/TestRoundVectorFloatRandom.java line 202: > >> 200: } >> 201: >> 202: // test cases for NaN, Inf, subnormal, and so on > > just for completeness: +0.0 and -0.0 Added. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1592838750 PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1592838951 PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1592839461 PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1592838230 From chagedorn at openjdk.org Tue May 7 17:32:59 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 7 May 2024 17:32:59 GMT Subject: RFR: 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode [v2] In-Reply-To: <_8csQpQVHlNpwenIT4H7OFkMSOaU6Fz-ZmJ0Yi6ArLU=.0b84b78d-4637-49ab-b43f-4c457498b0ce@github.com> References: <_8csQpQVHlNpwenIT4H7OFkMSOaU6Fz-ZmJ0Yi6ArLU=.0b84b78d-4637-49ab-b43f-4c457498b0ce@github.com> Message-ID: <7b3qt72dd5rV6nirPQILkqTMleDRMRYuXlKpqVVVpyo=.c2ed3889-cb43-4576-9d63-de133152b7fb@github.com> On Tue, 7 May 2024 15:40:40 GMT, Roland Westrelin wrote: >> Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8330386 >> - Add more comments and asserts >> - Add more tests >> - 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode > > src/hotspot/share/opto/opaquenode.cpp line 110: > >> 108: } >> 109: >> 110: Node* OpaqueInitializedAssertionPredicateNode::Identity(PhaseGVN* phase) { > > Opaque4 is removed by macro expansion, right? But the new one is removed after loop opts. So there's a change in behavior.
What's the rationale for making that change? That's correct. I originally had these nodes as macro nodes as well. But conceptually, we want these nodes to be removed and the Initialized Assertion Predicates folded once we know that we no longer split loops (i.e. in post loop IGVN). I think it's easier to register them for this post loop IGVN run since we don't really expand the nodes to anything; they are just removed during expansion. I'm not entirely sure, though, what the original reason was for removing `Opaque4` nodes during macro expansion instead of during post loop IGVN. Do you remember? > src/hotspot/share/opto/opaquenode.hpp line 138: > >> 136: // to true. Therefore, we get rid of them in product builds as they are useless. In debug builds we keep them as >> 137: // additional verification code (i.e. removing this node and use the BoolNode input instead). >> 138: class OpaqueInitializedAssertionPredicateNode : public Node { > > Shouldn't the new OpaqueInitializedAssertionPredicateNode be a subclass of Opaque4 or shouldn't both be a subclass of a common super type? Don't they share at least some logic or behavior? I first thought about reusing this class in some way. But the second input is actually not needed. We could go ahead and just remove the second input for `Opaque4` nodes (it's always a true constant). But I still wanted an easy way to distinguish this node from the other uses of `Opaque4` nodes in non-null checks. Furthermore, I think subclassing the `Opaque4` class can be problematic when doing `is_Opaque4()`, since we sometimes expect only an `Opaque4`, sometimes only an `OpaqueInitializedAssertionPredicate`, and sometimes both are fine. I think it's cleaner to have two separate classes instead of one subclassing the other. What do you think?
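The concern in the reply is that a type query such as `is_Opaque4()` would also match a subclass. A toy Java model of the two designs makes the difference visible; the classes below only borrow the C2 node names for readability and are not HotSpot code.

```java
// Toy model, not HotSpot code: class names mirror the C2 nodes only for readability.
public final class OpaqueKinds {
    static class Node {}

    // Design chosen in the reply: two unrelated node classes.
    static final class Opaque4Node extends Node {}
    static final class OpaqueInitializedAssertionPredicateNode extends Node {}

    static boolean isOpaque4(Node n) { return n instanceof Opaque4Node; }

    static boolean isOpaqueInitializedAssertionPredicate(Node n) {
        return n instanceof OpaqueInitializedAssertionPredicateNode;
    }

    // Call sites that accept either kind have to say so explicitly.
    static boolean isEitherOpaque(Node n) {
        return isOpaque4(n) || isOpaqueInitializedAssertionPredicate(n);
    }

    // Alternative raised in the review: the predicate node subclasses Opaque4.
    static class Opaque4Base extends Node {}
    static final class PredicateOpaqueSubclass extends Opaque4Base {}

    // Here a plain "is it an Opaque4?" query also matches the predicate node,
    // so call sites that want only the original kind would need an extra exclusion.
    static boolean isOpaque4Subclassed(Node n) { return n instanceof Opaque4Base; }
}
```

With separate classes, "only one kind" is the default and "either kind" is spelled out; with subclassing, the default query silently covers both.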
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18951#discussion_r1592838684 PR Review Comment: https://git.openjdk.org/jdk/pull/18951#discussion_r1592840333 From kxu at openjdk.org Tue May 7 17:33:29 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 7 May 2024 17:33:29 GMT Subject: RFR: 8327381: Refactor type-improving transformations in BoolNode::Ideal to BoolNode::Value [v9] In-Reply-To: References: Message-ID: > This PR resolves [JDK-8327381](https://bugs.openjdk.org/browse/JDK-8327381) > > Currently the transformations for expressions with patterns `((x & m) u<= m)` or `((m & x) u<= m)` to `true` is in `BoolNode::Ideal` function with a new constant node of value `1` created. However, this is technically a type-improving (reduction in range) transformation that's better suited in `BoolNode::Value` function. > > New unit test `test/hotspot/jtreg/compiler/c2/TestBoolNodeGvn.java` asserting on IR nodes and correctness of this transformation is added and passing. Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: - Merge branch 'master' into boolnode-refactor - refactor BoolNode::Value() and extract code to ::Value_cmpu_and_mask - update comments - fix indentation again - apply test only on x64, aarch64 and riscv64 - also renames the class name in @run - update test @run annotation - improve formatting, correct annotation and rename test class - Merge branch 'master' into boolnode-refactor - update the package name for tests - ... 
and 6 more: https://git.openjdk.org/jdk/compare/91beff36...278c436a ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18198/files - new: https://git.openjdk.org/jdk/pull/18198/files/53cf5b3b..278c436a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18198&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18198&range=07-08 Stats: 122406 lines in 3144 files changed: 56561 ins; 49745 del; 16100 mod Patch: https://git.openjdk.org/jdk/pull/18198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18198/head:pull/18198 PR: https://git.openjdk.org/jdk/pull/18198 From mli at openjdk.org (Hamlin Li) Date: Tue, 7 May 2024 17:36:57 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v13] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 13:44:06 GMT, Emanuel Peter wrote: > Thanks for the extra tests! > Thanks for reviewing. > Can you measure how much time each test now takes on your machine? > Only TestRoundVectorFloatAll.java took longer, but it still finished within one minute; the others ran considerably faster. > I think we are getting there. Still a little worried about some random bugs in the whole number generation... But I'd prefer having these tests to not having them for sure ;) Agree!
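One way to guard against bugs in the test's own value generation is to check results against an independent scalar reference rather than the method under test. A minimal sketch, assuming the hypothetical reference formula `(int) Math.floor((double) f + 0.5)`: it covers the special values discussed in this thread (NaN, infinities, subnormals, +0.0 and -0.0) plus random bit patterns. We believe the formula agrees with `Math.round(float)` for every float, because widening to double leaves enough precision for the +0.5 step and the double-to-int cast saturates and maps NaN to 0; that is our own reasoning, which the sampling below only spot-checks. Names are ours, not the tests'.

```java
import java.util.Random;

// Sketch: an independent scalar reference for Math.round(float). Names are hypothetical.
public final class RoundReferenceCheck {
    // Add 0.5 in double precision, take the floor, then narrow to int.
    // The double-to-int cast maps NaN to 0 and saturates out-of-range values,
    // which is how Math.round(float) treats those inputs as well.
    static int referenceRound(float f) {
        return (int) Math.floor((double) f + 0.5);
    }

    static int countMismatches(int samples, long seed) {
        float[] specials = {
            0.0f, -0.0f, Float.NaN, Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY,
            Float.MIN_VALUE, -Float.MIN_VALUE, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE,
            0.5f, -0.5f, 1.5f, -1.5f, 2.5f, -2.5f, 0.49999997f, -0.49999997f
        };
        int mismatches = 0;
        for (float f : specials) {
            if (Math.round(f) != referenceRound(f)) mismatches++;
        }
        Random rnd = new Random(seed);
        for (int i = 0; i < samples; i++) {
            float f = Float.intBitsToFloat(rnd.nextInt()); // uniform over all 32-bit patterns
            if (Math.round(f) != referenceRound(f)) mismatches++;
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(1_000_000, 42L));
    }
}
```

In a real intrinsic test, the reference would be compared against results produced by a loop that C2 compiles (and possibly vectorizes), not against a second interpreted call.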
------------- PR Comment: https://git.openjdk.org/jdk/pull/17753#issuecomment-2098965761 From kvn at openjdk.org Tue May 7 17:47:54 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 7 May 2024 17:47:54 GMT Subject: RFR: 8331863: DUIterator_Fast used before it is constructed In-Reply-To: <5FEXRCspKpxNj3FJW1_2fqvdzuC40gTT8-SAG_pEflU=.69f4b39b-b7ed-4e45-87ac-2245ec75c789@github.com> References: <5FEXRCspKpxNj3FJW1_2fqvdzuC40gTT8-SAG_pEflU=.69f4b39b-b7ed-4e45-87ac-2245ec75c789@github.com> Message-ID: <0L378sPPSazRYrpx_kfV6172mLp4EMfCWxw65zYoIj0=.36c662de-9bc4-4e8a-9e5d-5f7ae76c7f0b@github.com> On Tue, 7 May 2024 16:13:38 GMT, Axel Boldt-Christmas wrote: > `SimpleDUIterator` constructs two `DUIterator_Fast` but passes a reference to the second when constructing the first. In debug values are read from this not yet constructed object. > > Found when building a debug build with UBSAN > > /src/hotspot/share/opto/node.cpp:124:8: runtime error: load of value 200, which is not a valid value for type 'bool' > #0 0x14619f4e6476 in DUIterator_Common::reset(DUIterator_Common const&) /src/hotspot/share/opto/node.cpp:124 > #1 0x1461a32556a5 in DUIterator_Fast::operator=(DUIterator_Fast const&) /src/hotspot/share/opto/node.hpp:1486 > #2 0x1461a32556a5 in Node::fast_outs(DUIterator_Fast&) const /src/hotspot/share/opto/node.hpp:1491 > #3 0x1461a32556a5 in SimpleDUIterator::SimpleDUIterator(Node*) /src/hotspot/share/opto/node.hpp:1575 > #4 0x1461a32556a5 in G1BarrierSetC2::has_cas_in_use_chain(Node*) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:855 > #5 0x1461a3256cf1 in G1BarrierSetC2::verify_pre_load(Node*, Unique_Node_List&) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:881 > #6 0x1461a325eec3 in G1BarrierSetC2::verify_gc_barriers(Compile*, BarrierSetC2::CompilePhase) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:1019 > #7 0x1461a325eec3 in G1BarrierSetC2::verify_gc_barriers(Compile*, BarrierSetC2::CompilePhase) const 
/src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:963 > #8 0x1461a23160ed in Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*) /src/hotspot/share/opto/compile.cpp:875 > #9 0x1461a1845fd0 in C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) /src/hotspot/share/opto/c2compiler.cpp:142 > #10 0x1461a235ac39 in CompileBroker::invoke_compiler_on_method(CompileTask*) /src/hotspot/share/compiler/compileBroker.cpp:2305 > #11 0x1461a235ee4e in CompileBroker::compiler_thread_loop() /src/hotspot/share/compiler/compileBroker.cpp:1963 > #12 0x1461a4076f8d in JavaThread::thread_main_inner() /src/hotspot/share/runtime/javaThread.cpp:760 > #13 0x1461a409da23 in JavaThread::run() /src/hotspot/share/runtime/javaThread.cpp:745 > #14 0x1461a7b6d2bc in Thread::call_run() /src/hotspot/share/runtime/thread.cpp:221 > #15 0x1461a62a8105 in thread_native_entry /src/hotspot/os/linux/os_linux.cpp:846 > #16 0x1461c29801d9 in start_thread (/lib64/libpthread.so.0+0x81d9) > #17 0x1461c18cae72 in __clone (/lib64/libc.so.6+0x39e72) Good and trivial. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19125#pullrequestreview-2043815671 From kvn at openjdk.org Tue May 7 17:51:57 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 7 May 2024 17:51:57 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v11] In-Reply-To: <0anrYmEFTzUaEynG83xqh3DlAygkKXw9BTxO982PkR4=.7a8d0d3d-168e-47eb-8385-79d4a9c46df3@github.com> References: <0anrYmEFTzUaEynG83xqh3DlAygkKXw9BTxO982PkR4=.7a8d0d3d-168e-47eb-8385-79d4a9c46df3@github.com> Message-ID: On Tue, 7 May 2024 04:27:12 GMT, Thomas Stuefe wrote: >> See [1] for previous discussions. >> >> We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. 
>> >> The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. >> >> Examples: >> >> This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` >> >> This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` >> >> >> --- >> >> The patch: >> >> 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. >> 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. >> 3) Adapted and extended tests >> >> I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. >> >> >> Tested: >> >> - manually on Mac m1 (debug and release) >> - GHAs are running >> - but Oracle will do more testing before this goes in >> >> [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: > > - remove debug output > - Merge branch 'master' into compiler-default-limit > - fix compiler.c2.TestFindNode again > - merge master and fix conflicts > - Remove unused variable > - Remove accidental change to TestDeadPhiMergeMemLoop.java > - fix copyrights > - fix copyrights > - another fix > - fix accidental slip in of another test name > - ... and 9 more: https://git.openjdk.org/jdk/compare/f308e107...61dc5952 Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18969#pullrequestreview-2043822389 From shade at openjdk.org Tue May 7 17:53:52 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 7 May 2024 17:53:52 GMT Subject: RFR: 8331863: DUIterator_Fast used before it is constructed In-Reply-To: <5FEXRCspKpxNj3FJW1_2fqvdzuC40gTT8-SAG_pEflU=.69f4b39b-b7ed-4e45-87ac-2245ec75c789@github.com> References: <5FEXRCspKpxNj3FJW1_2fqvdzuC40gTT8-SAG_pEflU=.69f4b39b-b7ed-4e45-87ac-2245ec75c789@github.com> Message-ID: On Tue, 7 May 2024 16:13:38 GMT, Axel Boldt-Christmas wrote: > `SimpleDUIterator` constructs two `DUIterator_Fast` but passes a reference to the second when constructing the first. In debug values are read from this not yet constructed object. > > Found when building a debug build with UBSAN > > /src/hotspot/share/opto/node.cpp:124:8: runtime error: load of value 200, which is not a valid value for type 'bool' > #0 0x14619f4e6476 in DUIterator_Common::reset(DUIterator_Common const&) /src/hotspot/share/opto/node.cpp:124 > #1 0x1461a32556a5 in DUIterator_Fast::operator=(DUIterator_Fast const&) /src/hotspot/share/opto/node.hpp:1486 > #2 0x1461a32556a5 in Node::fast_outs(DUIterator_Fast&) const /src/hotspot/share/opto/node.hpp:1491 > #3 0x1461a32556a5 in SimpleDUIterator::SimpleDUIterator(Node*) /src/hotspot/share/opto/node.hpp:1575 > #4 0x1461a32556a5 in G1BarrierSetC2::has_cas_in_use_chain(Node*) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:855 > #5 0x1461a3256cf1 in G1BarrierSetC2::verify_pre_load(Node*, Unique_Node_List&) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:881 > #6 0x1461a325eec3 in G1BarrierSetC2::verify_gc_barriers(Compile*, BarrierSetC2::CompilePhase) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:1019 > #7 0x1461a325eec3 in G1BarrierSetC2::verify_gc_barriers(Compile*, BarrierSetC2::CompilePhase) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:963 > #8 0x1461a23160ed in Compile::Compile(ciEnv*, ciMethod*, int, Options, 
DirectiveSet*) /src/hotspot/share/opto/compile.cpp:875 > #9 0x1461a1845fd0 in C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) /src/hotspot/share/opto/c2compiler.cpp:142 > #10 0x1461a235ac39 in CompileBroker::invoke_compiler_on_method(CompileTask*) /src/hotspot/share/compiler/compileBroker.cpp:2305 > #11 0x1461a235ee4e in CompileBroker::compiler_thread_loop() /src/hotspot/share/compiler/compileBroker.cpp:1963 > #12 0x1461a4076f8d in JavaThread::thread_main_inner() /src/hotspot/share/runtime/javaThread.cpp:760 > #13 0x1461a409da23 in JavaThread::run() /src/hotspot/share/runtime/javaThread.cpp:745 > #14 0x1461a7b6d2bc in Thread::call_run() /src/hotspot/share/runtime/thread.cpp:221 > #15 0x1461a62a8105 in thread_native_entry /src/hotspot/os/linux/os_linux.cpp:846 > #16 0x1461c29801d9 in start_thread (/lib64/libpthread.so.0+0x81d9) > #17 0x1461c18cae72 in __clone (/lib64/libc.so.6+0x39e72) Ouch. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19125#pullrequestreview-2043825879 From kvn at openjdk.org Tue May 7 18:02:55 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 7 May 2024 18:02:55 GMT Subject: RFR: 8331764: C2 SuperWord: refactor _align_to_ref/_mem_ref_for_main_loop_alignment In-Reply-To: References: Message-ID: <0MHAO65YiEDNeD0RXunmaHh2sg14Czg16r19fxPm7Os=.77c88012-9d29-4857-8505-4bd6f8516dc4@github.com> On Tue, 7 May 2024 09:26:11 GMT, Emanuel Peter wrote: > This PR accomplishes these things: > - Rename `_align_to_ref` -> `_mem_ref_for_main_loop_alignment`. > - Move the `mem_ref` finding for alignment out of `SuperWord::find_adjacent_refs`. This is too early, and we don't even know if the relevant `mem_ref` is going to be vectorized. It makes more sense to pick a `mem_ref` directly in `SuperWord::adjust_pre_loop_limit_to_align_main_loop_vectors`, where we already know what packs are going to be vectorized. 
> - For the alignment width (aw), we can use the `vector_width` of the pack to which the `mem_ref` belongs, rather than the potentially much larger `vector_width_in_bytes`. I track this with `_aw_for_main_loop_alignment` now. > > I need this for https://github.com/openjdk/jdk/pull/18822, and decided to split it out into an independent change. src/hotspot/share/opto/superword.cpp line 3407: > 3405: if (first == nullptr) { continue; } > 3406: > 3407: int vw = first->memory_size() * pack->size(); I assume `first` is verified already and `first->memory_size()` is reasonable (size of primitive type). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19115#discussion_r1592872102 From sviswanathan at openjdk.org Tue May 7 18:24:03 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 7 May 2024 18:24:03 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers:
>>
>> Benchmark                               Score      Latest     Latest/Score
>> StringIndexOf.advancedWithMediumSub     343.573    317.934    0.925375393x
>> StringIndexOf.advancedWithShortSub1     1039.081   1053.96    1.014319384x
>> StringIndexOf.advancedWithShortSub2     55.828     110.541    1.980027943x
>> StringIndexOf.constantPattern           9.361      11.906     1.271872663x
>> StringIndexOf.searchCharLongSuccess     4.216      4.218      1.000474383x
>> StringIndexOf.searchCharMediumSuccess   3.133      3.216      1.02649218x
>> StringIndexOf.searchCharShortSuccess    3.76       3.761      1.000265957x
>> StringIndexOf.success                   9.186      9.713      1.057369911x
>> StringIndexOf.successBig                14.341     46.343     3.231504079x
>> StringIndexOfChar.latin1_AVX2_String    6220.918   12154.52   1.953814533x
>> StringIndexOfChar.latin1_AVX2_char      5503.556   5540.044   1.006629895x
>> StringIndexOfChar.latin1_SSE4_String    6978.854   6818.689   0.977049957x
>> StringIndexOfChar.latin1_SSE4_char      5657.499   5474.624   0.967675646x
>> StringIndexOfChar.latin1_Short_String   7132.541   6863.359   0.962260014x
>> StringIndexOfChar.latin1_Short_char     16013.389  16162.437  1.009307711x
>> StringIndexOfChar.latin1_mixed_String   7386.123   14771.622  1.999915517x
>> StringIndexOfChar.latin1_mixed_char     9901.671   9782.245   0.987938803x
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Rearrange; add lambdas for clarity src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 383: > 381: { > 382: Label L_short; > 383: A comment here: // Broadcast the beginning of needle into a vector register. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 390: > 388: __ vpbroadcastb(byte_0, Address(needle, 0), Assembler::AVX_256bit); > 389: } > 390: A comment here: // Broadcast the end of needle into a vector register. This step is not needed for single element needle. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 418: > 416: __ cmpq(haystack_len, 0x10); > 417: __ ja_b(L_moreThan16); > 418: An assert here to check for header size >= 16 would be good.
Also a comment here would be good, something like: // Copy 16 or 32 bytes prior to haystack end onto stack // This will possibly include some object header bytes when haystack length is less than 16 or 32 bytes // Set the new haystack address to beginning of copied haystack on stack adjusting for extra bytes copied src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 498: > 496: > 497: // big_case_loop_helper will fall through to this point if one or more potential matches are found > 498: // The mask will have a bitmask indicating the position of the potential matches within the haystack If there is no potential match, which label does big_case_loop_helper jump to? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 517: > 515: __C2 arrays_equals(false, haystackStart, firstNeedleCompare, compLen, retval, rScratch, xmm_tmp3, xmm_tmp4, > 516: false /* char */, knoreg); > 517: __ testl(retval, retval); Since this is a byte compare even for isU, retval here could be a 64-bit quantity, so the testl should be a testq. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 553: > 551: // Haystack always copied to stack, so 32-byte reads OK > 552: // Haystack length < 32 > 553: // 10 < needle length < 32 The comment below may need updating, as we come here for needle_len > OPT_NEEDLE_SIZE_MAX, which is currently set to 5: // 10 < needle length < 32 src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 611: > 609: __C2 arrays_equals(false, rTmp, firstNeedleCompare, compLen, rTmp3, rTmp2, xmm_tmp3, xmm_tmp4, false /* char */, > 610: knoreg); > 611: __ testl(rTmp3, rTmp3); Since this is a byte compare even for isU, rTmp3 here could be a 64-bit quantity, so the testl should be a testq. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 629: > 627: > 628: __ bind(L_returnError); > 629: __ movq(rbp, -1); This could directly be rax instead of intermediate rbp and then moving from rbp to rax.
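The `testl` vs `testq` remarks come down to operand width: `testl` sets ZF from the low 32 bits of the register only. Assuming, as the reviewer suggests, that the byte-compare result can be a 64-bit value, a result whose set bits all lie in the upper half would be misread as zero. A small Java model of the two checks (class and method names are ours):

```java
// Illustration only: models a zero test on the low 32 bits (like testl) vs. all 64 bits (like testq).
public final class FlagWidth {
    static boolean zeroByTestl(long reg) { return (int) reg == 0; } // low 32 bits only
    static boolean zeroByTestq(long reg) { return reg == 0L; }      // full 64 bits

    public static void main(String[] args) {
        long result = 1L << 40; // nonzero, but every set bit is in the upper half
        System.out.println(zeroByTestl(result) + " " + zeroByTestq(result)); // prints "true false"
    }
}
```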
src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 633: > 631: > 632: __ bind(L_returnZero); > 633: __ xorl(rbp, rbp); This could directly be rax instead of intermediate rbp and then moving from rbp to rax. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592791718 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592792401 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592774634 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592866631 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592868501 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592880650 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592885514 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592892211 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592892329 From sgibbons at openjdk.org Tue May 7 19:03:28 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 7 May 2024 19:03:28 GMT Subject: RFR: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 [v2] In-Reply-To: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> References: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> Message-ID: > Added a strcmp for unsafe_setmemory in process_call_arguments() so the assert would not trigger. > > I believe this is the correct fix as I do not think the arguments for setMemory need special handling like arraycopy. > > I would like suggestions on how to generate a testcase to catch this type of error in mainline. 
Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Add test for setMemory escape ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19032/files - new: https://git.openjdk.org/jdk/pull/19032/files/d6702fc3..e938e57c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19032&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19032&range=00-01 Stats: 114 lines in 1 file changed: 114 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19032.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19032/head:pull/19032 PR: https://git.openjdk.org/jdk/pull/19032 From sgibbons at openjdk.org Tue May 7 19:06:53 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 7 May 2024 19:06:53 GMT Subject: RFR: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 [v2] In-Reply-To: References: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> Message-ID: On Tue, 7 May 2024 19:03:28 GMT, Scott Gibbons wrote: >> Added a strcmp for unsafe_setmemory in process_call_arguments() so the assert would not trigger. >> >> I believe this is the correct fix as I do not think the arguments for setMemory need special handling like arraycopy. >> >> I would like suggestions on how to generate a testcase to catch this type of error in mainline. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Add test for setMemory escape Added testcase. Thanks @jatin-bhateja for help with the testcase. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19032#issuecomment-2099114433 From sviswanathan at openjdk.org Tue May 7 20:40:59 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 7 May 2024 20:40:59 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Rearrange; add lambdas for clarity src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 576: > 574: broadcast_additional_needles(false, 0 
/* unknown */, NUMBER_OF_NEEDLE_BYTES_TO_COMPARE, needle, needleLen, rTmp3, > 575: isUU, isUL, _masm); > 576: It would be good to pass the output XMM registers to this method. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 587: > 585: // firstNeedleCompare has address of second element of needle > 586: // compLen has length of comparison to do > 587: This is not clear. firstNeedleCompare gets needle + NUMBER_OF_NEEDLE_BYTES_TO_COMPARE - 1, which is not necessarily the second element of needle. If it helps, let us fix NUMBER_OF_NEEDLE_BYTES_TO_COMPARE to 3 and write the comments and code for that value only. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 590: > 588: compare_haystack_to_needle(false, 0, NUMBER_OF_NEEDLE_BYTES_TO_COMPARE, L_returnRBP, haystack, isU, > 589: DO_EARLY_BAILOUT, mask, needleLen, rTmp3, _masm); > 590: It is better to pass the broadcast xmm registers to compare_haystack_to_needle. Basically, pass input, output, and temps to all the methods. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 639: > 637: __ movl(rax, r8); > 638: __ subq(rcx, rbx); > 639: __ addq(rcx, rax); This could be: __ subq(rcx, rbx); __ addq(rcx, r8); src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 647: > 645: __ cmpq(r11, r10); > 646: __ movq(rbp, -1); > 647: __ cmovq(Assembler::belowEqual, rbp, r11); This could be directly computed in rax: __ movq(rax, -1); __ cmovq(Assembler::belowEqual, rax, r11); Also, is it possible to avoid the cmov on some paths? It is an expensive operation. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1010: > 1008: static void broadcast_additional_needles(bool sizeKnown, int size, int bytesToCompare, Register needle, > 1009: Register needleLen, Register rTmp, bool isUU, bool isUL, > 1010: MacroAssembler *_masm) { It would be good to add the output XMM registers to the parameter list.
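The review comments on this stub revolve around broadcasting the first and last needle elements, comparing them against the haystack to build a mask of candidate positions, and then running a full compare only on those candidates. As we understand it, this is the well-known first/last-element filtering technique for SIMD substring search. Below is a scalar Java sketch of that technique; the 32-lane `int` mask emulates one 256-bit AVX2 register of bytes, and the class and method names are hypothetical. It does not reproduce the stub's register usage, its haystack copying, or its `NUMBER_OF_NEEDLE_BYTES_TO_COMPARE` handling.

```java
import java.util.Arrays;

// Scalar model of first/last-element filtering for substring search.
// LANES emulates one 256-bit AVX2 register of bytes; names are hypothetical, not the stub's.
public final class FirstLastIndexOf {
    static final int LANES = 32;

    static int indexOf(byte[] hay, byte[] needle) {
        int n = needle.length, h = hay.length;
        if (n == 0) return 0;
        if (n > h) return -1;
        byte first = needle[0];    // value a vector version would broadcast from offset 0
        byte last = needle[n - 1]; // value broadcast from the needle's end
        int limit = h - n;         // last valid start position
        for (int base = 0; base <= limit; base += LANES) {
            int lanes = Math.min(LANES, limit - base + 1);
            int mask = 0; // bit k set => position base+k matches both first and last element
            for (int k = 0; k < lanes; k++) {
                if (hay[base + k] == first && hay[base + k + n - 1] == last) {
                    mask |= 1 << k;
                }
            }
            while (mask != 0) { // verify candidates in increasing position order
                int start = base + Integer.numberOfTrailingZeros(mask);
                // Needles of length 1 or 2 are fully checked by the first/last test.
                if (n <= 2 || Arrays.equals(hay, start + 1, start + n - 1, needle, 1, n - 1)) {
                    return start;
                }
                mask &= mask - 1; // clear the lowest set bit
            }
        }
        return -1;
    }
}
```

Because candidates are consumed from the lowest set bit upward, the first verified candidate is also the leftmost match, which is the result `indexOf` must return.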
src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1040: > 1038: __ vpbroadcastb(byte_1, Address(needle, 1), Assembler::AVX_256bit); > 1039: } > 1040: } It would be good to have a function which broadcasts a needle element from a given offset into a vector register. That function could take (needle address, offset, output vector register, temps). Such a function could then be called twice from here, and once from the main function for offset 0. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593046499 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593057834 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593045710 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592989197 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592992225 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593023349 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593006539 From kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 7 May 2024 21:10:52 GMT Subject: RFR: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 [v2] In-Reply-To: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> References: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> Message-ID: <48sfm7TOlk9i8A2_WhISeaR0ETfBgCUZGfHalnDJqFY=.600c053a-40af-4b62-bf6f-ae3c8755b8db@github.com> On Tue, 7 May 2024 19:03:28 GMT, Scott Gibbons wrote: >> Added a strcmp for unsafe_setmemory in process_call_arguments() so the assert would not trigger. >> >> I believe this is the correct fix as I do not think the arguments for setMemory need special handling like arraycopy. >> >> I would like suggestions on how to generate a testcase to catch this type of error in mainline.
> > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Add test for setMemory escape A few comments about the test. test/hotspot/jtreg/compiler/escapeAnalysis/Test8331033.java line 2: > 1: /* > 2: * Copyright (c) 2020, Red Hat, Inc. All rights reserved. Suggestion: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. test/hotspot/jtreg/compiler/escapeAnalysis/Test8331033.java line 28: > 26: * @bug 8331033 > 27: * @summary EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 > 28: * Suggestion: * @requires vm.compMode != "Xint" test/hotspot/jtreg/compiler/escapeAnalysis/Test8331033.java line 29: > 27: * @summary EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 > 28: * > 29: * @run main/othervm -XX:+PrintEscapeAnalysis -Xbatch -XX:-TieredCompilation Test8331033 Suggestion: * @run main/othervm -Xbatch -XX:-TieredCompilation Test8331033 test/hotspot/jtreg/compiler/escapeAnalysis/Test8331033.java line 56: > 54: * // "Escape Analysis for Java", Proceedings of ACM SIGPLAN > 55: * // OOPSLA Conference, November 1, 1999 > 56: */ No need for this comment. We have it in HotSpot sources, in `opto/escape.hpp`.
------------- PR Review: https://git.openjdk.org/jdk/pull/19032#pullrequestreview-2044189508 PR Review Comment: https://git.openjdk.org/jdk/pull/19032#discussion_r1593090598 PR Review Comment: https://git.openjdk.org/jdk/pull/19032#discussion_r1593092676 PR Review Comment: https://git.openjdk.org/jdk/pull/19032#discussion_r1593093151 PR Review Comment: https://git.openjdk.org/jdk/pull/19032#discussion_r1593094736 From sgibbons at openjdk.org Tue May 7 21:17:23 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 7 May 2024 21:17:23 GMT Subject: RFR: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 [v3] In-Reply-To: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> References: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> Message-ID: > Added a strcmp for unsafe_setmemory in process_call_arguments() so the assert would not trigger. > > I believe this is the correct fix as I do not think the arguments for setMemory need special handling like arraycopy. > > I would like suggestions on how to generate a testcase to catch this type of error in mainline. 
Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Review comments - change copyright, add @requires, change @run ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19032/files - new: https://git.openjdk.org/jdk/pull/19032/files/e938e57c..6c1bedf1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19032&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19032&range=01-02 Stats: 12 lines in 1 file changed: 1 ins; 9 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19032.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19032/head:pull/19032 PR: https://git.openjdk.org/jdk/pull/19032 From sgibbons at openjdk.org Tue May 7 21:17:24 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 7 May 2024 21:17:24 GMT Subject: RFR: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 [v3] In-Reply-To: References: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> Message-ID: On Tue, 7 May 2024 21:13:47 GMT, Scott Gibbons wrote: >> Added a strcmp for unsafe_setmemory in process_call_arguments() so the assert would not trigger. >> >> I believe this is the correct fix as I do not think the arguments for setMemory need special handling like arraycopy. >> >> I would like suggestions on how to generate a testcase to catch this type of error in mainline. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Review comments - change copyright, add @requires, change @run Addressed @vnkozlov review comments. 
------------- PR Review: https://git.openjdk.org/jdk/pull/19032#pullrequestreview-2044214623 From sgibbons at openjdk.org Tue May 7 21:17:24 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 7 May 2024 21:17:24 GMT Subject: RFR: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 [v2] In-Reply-To: <48sfm7TOlk9i8A2_WhISeaR0ETfBgCUZGfHalnDJqFY=.600c053a-40af-4b62-bf6f-ae3c8755b8db@github.com> References: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> <48sfm7TOlk9i8A2_WhISeaR0ETfBgCUZGfHalnDJqFY=.600c053a-40af-4b62-bf6f-ae3c8755b8db@github.com> Message-ID: On Tue, 7 May 2024 21:04:45 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Add test for setMemory escape > > test/hotspot/jtreg/compiler/escapeAnalysis/Test8331033.java line 28: > >> 26: * @bug 8331033 >> 27: * @summary EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 >> 28: * > > Suggestion: > > * @requires vm.compMode != "Xint" Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19032#discussion_r1593100622 From sgibbons at openjdk.org Tue May 7 21:29:52 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 7 May 2024 21:29:52 GMT Subject: RFR: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 [v3] In-Reply-To: References: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> Message-ID: On Tue, 7 May 2024 21:17:23 GMT, Scott Gibbons wrote: >> Added a strcmp for unsafe_setmemory in process_call_arguments() so the assert would not trigger. >> >> I believe this is the correct fix as I do not think the arguments for setMemory need special handling like arraycopy. >> >> I would like suggestions on how to generate a testcase to catch this type of error in mainline. 
> > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Review comments - change copyright, add @requires, change @run Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19032#issuecomment-2099338730 From kvn at openjdk.org Tue May 7 21:29:51 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 7 May 2024 21:29:51 GMT Subject: RFR: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 [v3] In-Reply-To: References: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> Message-ID: On Tue, 7 May 2024 21:17:23 GMT, Scott Gibbons wrote: >> Added a strcmp for unsafe_setmemory in process_call_arguments() so the assert would not trigger. >> >> I believe this is the correct fix as I do not think the arguments for setMemory need special handling like arraycopy. >> >> I would like suggestions on how to generate a testcase to catch this type of error in mainline. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Review comments - change copyright, add @requires, change @run Good. I submitted testing to make sure the test passed with different flags combinations. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19032#issuecomment-2099337542 From sviswanathan at openjdk.org Wed May 8 00:26:59 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 8 May 2024 00:26:59 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers:
>>
>> Benchmark                                  Score     Latest  Latest/Score
>> StringIndexOf.advancedWithMediumSub      343.573    317.934  0.925375393x
>> StringIndexOf.advancedWithShortSub1     1039.081    1053.96  1.014319384x
>> StringIndexOf.advancedWithShortSub2       55.828    110.541  1.980027943x
>> StringIndexOf.constantPattern              9.361     11.906  1.271872663x
>> StringIndexOf.searchCharLongSuccess        4.216      4.218  1.000474383x
>> StringIndexOf.searchCharMediumSuccess      3.133      3.216   1.02649218x
>> StringIndexOf.searchCharShortSuccess        3.76      3.761  1.000265957x
>> StringIndexOf.success                      9.186      9.713  1.057369911x
>> StringIndexOf.successBig                  14.341     46.343  3.231504079x
>> StringIndexOfChar.latin1_AVX2_String    6220.918   12154.52  1.953814533x
>> StringIndexOfChar.latin1_AVX2_char      5503.556   5540.044  1.006629895x
>> StringIndexOfChar.latin1_SSE4_String    6978.854   6818.689  0.977049957x
>> StringIndexOfChar.latin1_SSE4_char      5657.499   5474.624  0.967675646x
>> StringIndexOfChar.latin1_Short_String   7132.541   6863.359  0.962260014x
>> StringIndexOfChar.latin1_Short_char    16013.389  16162.437  1.009307711x
>> StringIndexOfChar.latin1_mixed_String   7386.123  14771.622  1.999915517x
>> StringIndexOfChar.latin1_mixed_char     9901.671   9782.245  0.987938803x
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Rearrange; add lambdas for clarity src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1082: > 1080: // noMatch - label bound outside to jump to if there is no match > 1081: // haystack - the address of the first byte of the haystack > 1082: // hsLen - the sizeof the haystack Good to specify if the size (size of needle) and hsLen (size of haystack) is in bytes or elements. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1149: > 1147: > 1148: if (size == (isU ? 2 : 1)) { > 1149: __ vpmovmskb(eq_mask, cmp_0, Assembler::AVX_256bit); vpmovmskb is being done twice if doEarlyBailout is set to 1 (the setting we have currently).
If it helps to simplify, we could assume that doEarlyBailout is always set to 1 and remove this configurability. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1174: > 1172: #define lastMask rTmp > 1173: __ vpmovmskb(lastMask, cmp_k, Assembler::AVX_256bit); > 1174: __ shrq(lastMask); did you mean to shift the lastMask by shiftVal here? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1185: > 1183: if (size > (isU ? 4 : 2)) { > 1184: if (doEarlyBailout) { > 1185: __ testl(eq_mask, eq_mask); The masks are 32-bit, as we are comparing at most 32 bytes (256 bits) at a time. So we could consistently use either andl, testl, shrl or andq, testq, shrq. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593225178 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593225488 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593227487 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593229554 From galder at openjdk.org Wed May 8 04:34:18 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 8 May 2024 04:34:18 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v15] In-Reply-To: References: Message-ID: > Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures. > > The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64:
>
> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
> Benchmark                 (size)  Mode  Cnt    Score    Error  Units
> ArrayClone.byteArraycopy       0  avgt   15    3.476 ±  0.018  ns/op
> ArrayClone.byteArraycopy      10  avgt   15    3.740 ±  0.017  ns/op
> ArrayClone.byteArraycopy     100  avgt   15    7.124 ±  0.010  ns/op
> ArrayClone.byteArraycopy    1000  avgt   15   39.301 ±  0.106  ns/op
> ArrayClone.byteClone           0  avgt   15    3.478 ±  0.008  ns/op
> ArrayClone.byteClone          10  avgt   15    3.562 ±  0.007  ns/op
> ArrayClone.byteClone         100  avgt   15    5.888 ±  0.206  ns/op
> ArrayClone.byteClone        1000  avgt   15   25.762 ±  0.203  ns/op
> ArrayClone.intArraycopy        0  avgt   15    3.199 ±  0.016  ns/op
> ArrayClone.intArraycopy       10  avgt   15    4.521 ±  0.008  ns/op
> ArrayClone.intArraycopy      100  avgt   15   17.429 ±  0.039  ns/op
> ArrayClone.intArraycopy     1000  avgt   15  178.432 ±  0.777  ns/op
> ArrayClone.intClone            0  avgt   15    3.406 ±  0.016  ns/op
> ArrayClone.intClone           10  avgt   15    4.272 ±  0.006  ns/op
> ArrayClone.intClone          100  avgt   15   13.110 ±  0.122  ns/op
> ArrayClone.intClone         1000  avgt   15  113.196 ± 13.400  ns/op
>
> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this.
>
> I ran the hotspot compiler tests successfully, limiting them to C1 compilation, on darwin/aarch64, linux/x86_64 and linux/686. E.g.
>
> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
> ...
> TEST                                        TOTAL  PASS  FAIL ERROR
> jtreg:test/hotspot/jtreg:hotspot_compiler    1234  1234     0     0
>
> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts?
>
> Thanks @rwestrel for his help shaping this up :)

Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: Fix assert to only have a single !
------------- Changes: - all: https://git.openjdk.org/jdk/pull/17667/files - new: https://git.openjdk.org/jdk/pull/17667/files/306db745..a35cdd84 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=13-14 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17667.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17667/head:pull/17667 PR: https://git.openjdk.org/jdk/pull/17667 From galder at openjdk.org Wed May 8 04:34:18 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 8 May 2024 04:34:18 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v13] In-Reply-To: References: <9Eoh8hOSSVvAtf9iVQ6hflQyceUtt4dpZdqm61zg5XI=.358a4d79-70d9-4b54-85d5-37c6817f0fae@github.com> <_x-OSownzQQZ8fmlsbvQ42MLf9BGZskECTNncOE0s4E=.8381a076-0cc4-4339-924f-fa22ca780573@github.com> Message-ID: On Tue, 7 May 2024 17:11:37 GMT, Galder Zamarreño wrote: >> An array of interfaces can be exact: >> >> new Interface[20].getClass(); >> >> and it seems like it would be safe to allow this, so I think we only need one assert for `!type->as_instance_klass()->is_interface()` if we don't trust the result of exact_type(). > > @dean-long @rwestrel I've added the assert. The assert doesn't hold, e.g.
=== Output from failing command(s) repeated here ===
* For target buildtools_create_symbols_javac__the.COMPILE_CREATE_SYMBOLS_batch:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/home/runner/work/jdk/jdk/src/hotspot/share/c1/c1_GraphBuilder.cpp:2031), pid=75212, tid=75244
# Error: assert(!!type->as_instance_klass()->is_interface()) failed
#
# JRE version: OpenJDK Runtime Environment (23.0) (fastdebug build 23-internal-galderz-306db7459b1316251e36d0eccc3035d11db44889)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 23-internal-galderz-306db7459b1316251e36d0eccc3035d11db44889, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0x777cd0] GraphBuilder::invoke(Bytecodes::Code)+0x1200

Thoughts @rwestrel @dean-long? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1593369537 From galder at openjdk.org Wed May 8 04:34:18 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 8 May 2024 04:34:18 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v13] In-Reply-To: References: <9Eoh8hOSSVvAtf9iVQ6hflQyceUtt4dpZdqm61zg5XI=.358a4d79-70d9-4b54-85d5-37c6817f0fae@github.com> <_x-OSownzQQZ8fmlsbvQ42MLf9BGZskECTNncOE0s4E=.8381a076-0cc4-4339-924f-fa22ca780573@github.com> Message-ID: On Wed, 8 May 2024 04:29:59 GMT, Galder Zamarreño wrote: >> @dean-long @rwestrel I've added the assert. > > The assert doesn't hold, e.g.
> > > === Output from failing command(s) repeated here === > * For target buildtools_create_symbols_javac__the.COMPILE_CREATE_SYMBOLS_batch: > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/home/runner/work/jdk/jdk/src/hotspot/share/c1/c1_GraphBuilder.cpp:2031), pid=75212, tid=75244 > # Error: assert(!!type->as_instance_klass()->is_interface()) failed > # > # JRE version: OpenJDK Runtime Environment (23.0) (fastdebug build 23-internal-galderz-306db7459b1316251e36d0eccc3035d11db44889) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 23-internal-galderz-306db7459b1316251e36d0eccc3035d11db44889, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0x777cd0] GraphBuilder::invoke(Bytecodes::Code)+0x1200 > > > Thoughts @rwestrel @dean-long? Hmmm, the double `!!`... let me fix that and see. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1593369973 From epeter at openjdk.org Wed May 8 04:40:57 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 May 2024 04:40:57 GMT Subject: RFR: 8331764: C2 SuperWord: refactor _align_to_ref/_mem_ref_for_main_loop_alignment In-Reply-To: <0MHAO65YiEDNeD0RXunmaHh2sg14Czg16r19fxPm7Os=.77c88012-9d29-4857-8505-4bd6f8516dc4@github.com> References: <0MHAO65YiEDNeD0RXunmaHh2sg14Czg16r19fxPm7Os=.77c88012-9d29-4857-8505-4bd6f8516dc4@github.com> Message-ID: On Tue, 7 May 2024 18:00:16 GMT, Vladimir Kozlov wrote: >> This PR accomplishes these things: >> - Rename `_align_to_ref` -> `_mem_ref_for_main_loop_alignment`. >> - Move the `mem_ref` finding for alignment out of `SuperWord::find_adjacent_refs`. This is too early, and we don't even know if the relevant `mem_ref` is going to be vectorized. It makes more sense to pick a `mem_ref` directly in `SuperWord::adjust_pre_loop_limit_to_align_main_loop_vectors`, where we already know what packs are going to be vectorized. 
>> - For the alignment width (aw), we can use the `vector_width` of the pack to which the `mem_ref` belongs, rather than the potentially much larger `vector_width_in_bytes`. I track this with `_aw_for_main_loop_alignment` now. >> >> I need this for https://github.com/openjdk/jdk/pull/18822, and decided to split it out into an independent change. > > src/hotspot/share/opto/superword.cpp line 3407: > >> 3405: if (first == nullptr) { continue; } >> 3406: >> 3407: int vw = first->memory_size() * pack->size(); > > I assume `first` is verified already and `first->memory_size()` is reasonable (size of primitive type). Yes, it is. All of this code is run in `SuperWord::output`, and at this point we are committed to vectorization - everything is verified. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19115#discussion_r1593373458 From epeter at openjdk.org Wed May 8 04:46:52 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 May 2024 04:46:52 GMT Subject: RFR: 8331764: C2 SuperWord: refactor _align_to_ref/_mem_ref_for_main_loop_alignment In-Reply-To: References: <0MHAO65YiEDNeD0RXunmaHh2sg14Czg16r19fxPm7Os=.77c88012-9d29-4857-8505-4bd6f8516dc4@github.com> Message-ID: On Wed, 8 May 2024 04:38:06 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/superword.cpp line 3407: >> >>> 3405: if (first == nullptr) { continue; } >>> 3406: >>> 3407: int vw = first->memory_size() * pack->size(); >> >> I assume `first` is verified already and `first->memory_size()` is reasonable (size of primitive type). > > Yes, it is. All of this code is run in `SuperWord::output`, and at this point we are committed to vectorization - everything is verified. That is what I tried to say in the PR description: > It makes more sense to pick a mem_ref directly in SuperWord::adjust_pre_loop_limit_to_align_main_loop_vectors, where we already know what packs are going to be vectorized. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19115#discussion_r1593376793 From epeter at openjdk.org Wed May 8 04:46:53 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 May 2024 04:46:53 GMT Subject: RFR: 8331764: C2 SuperWord: refactor _align_to_ref/_mem_ref_for_main_loop_alignment In-Reply-To: References: <0MHAO65YiEDNeD0RXunmaHh2sg14Czg16r19fxPm7Os=.77c88012-9d29-4857-8505-4bd6f8516dc4@github.com> Message-ID: On Wed, 8 May 2024 04:43:46 GMT, Emanuel Peter wrote: >> Yes, it is. All of this code is run in `SuperWord::output`, and at this point we are committed to vectorization - everything is verified. > > That is what I tried to say in the PR description: >> It makes more sense to pick a mem_ref directly in SuperWord::adjust_pre_loop_limit_to_align_main_loop_vectors, where we already know what packs are going to be vectorized. Yes, `first->memory_size()` knows the size in bytes of the load/store. It is used many places in SuperWord. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19115#discussion_r1593377294 From aboldtch at openjdk.org Wed May 8 05:05:58 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 8 May 2024 05:05:58 GMT Subject: RFR: 8331863: DUIterator_Fast used before it is constructed In-Reply-To: <5FEXRCspKpxNj3FJW1_2fqvdzuC40gTT8-SAG_pEflU=.69f4b39b-b7ed-4e45-87ac-2245ec75c789@github.com> References: <5FEXRCspKpxNj3FJW1_2fqvdzuC40gTT8-SAG_pEflU=.69f4b39b-b7ed-4e45-87ac-2245ec75c789@github.com> Message-ID: On Tue, 7 May 2024 16:13:38 GMT, Axel Boldt-Christmas wrote: > `SimpleDUIterator` constructs two `DUIterator_Fast` but passes a reference to the second when constructing the first. In debug values are read from this not yet constructed object. 
> > Found when building a debug build with UBSAN > > /src/hotspot/share/opto/node.cpp:124:8: runtime error: load of value 200, which is not a valid value for type 'bool' > #0 0x14619f4e6476 in DUIterator_Common::reset(DUIterator_Common const&) /src/hotspot/share/opto/node.cpp:124 > #1 0x1461a32556a5 in DUIterator_Fast::operator=(DUIterator_Fast const&) /src/hotspot/share/opto/node.hpp:1486 > #2 0x1461a32556a5 in Node::fast_outs(DUIterator_Fast&) const /src/hotspot/share/opto/node.hpp:1491 > #3 0x1461a32556a5 in SimpleDUIterator::SimpleDUIterator(Node*) /src/hotspot/share/opto/node.hpp:1575 > #4 0x1461a32556a5 in G1BarrierSetC2::has_cas_in_use_chain(Node*) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:855 > #5 0x1461a3256cf1 in G1BarrierSetC2::verify_pre_load(Node*, Unique_Node_List&) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:881 > #6 0x1461a325eec3 in G1BarrierSetC2::verify_gc_barriers(Compile*, BarrierSetC2::CompilePhase) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:1019 > #7 0x1461a325eec3 in G1BarrierSetC2::verify_gc_barriers(Compile*, BarrierSetC2::CompilePhase) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:963 > #8 0x1461a23160ed in Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*) /src/hotspot/share/opto/compile.cpp:875 > #9 0x1461a1845fd0 in C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) /src/hotspot/share/opto/c2compiler.cpp:142 > #10 0x1461a235ac39 in CompileBroker::invoke_compiler_on_method(CompileTask*) /src/hotspot/share/compiler/compileBroker.cpp:2305 > #11 0x1461a235ee4e in CompileBroker::compiler_thread_loop() /src/hotspot/share/compiler/compileBroker.cpp:1963 > #12 0x1461a4076f8d in JavaThread::thread_main_inner() /src/hotspot/share/runtime/javaThread.cpp:760 > #13 0x1461a409da23 in JavaThread::run() /src/hotspot/share/runtime/javaThread.cpp:745 > #14 0x1461a7b6d2bc in Thread::call_run() /src/hotspot/share/runtime/thread.cpp:221 > #15 0x1461a62a8105 in thread_native_entry 
/src/hotspot/os/linux/os_linux.cpp:846 > #16 0x1461c29801d9 in start_thread (/lib64/libpthread.so.0+0x81d9) > #17 0x1461c18cae72 in __clone (/lib64/libc.so.6+0x39e72) Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19125#issuecomment-2099745751 From aboldtch at openjdk.org Wed May 8 05:05:58 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 8 May 2024 05:05:58 GMT Subject: Integrated: 8331863: DUIterator_Fast used before it is constructed In-Reply-To: <5FEXRCspKpxNj3FJW1_2fqvdzuC40gTT8-SAG_pEflU=.69f4b39b-b7ed-4e45-87ac-2245ec75c789@github.com> References: <5FEXRCspKpxNj3FJW1_2fqvdzuC40gTT8-SAG_pEflU=.69f4b39b-b7ed-4e45-87ac-2245ec75c789@github.com> Message-ID: On Tue, 7 May 2024 16:13:38 GMT, Axel Boldt-Christmas wrote: > `SimpleDUIterator` constructs two `DUIterator_Fast` but passes a reference to the second when constructing the first. In debug values are read from this not yet constructed object. > > Found when building a debug build with UBSAN > > /src/hotspot/share/opto/node.cpp:124:8: runtime error: load of value 200, which is not a valid value for type 'bool' > #0 0x14619f4e6476 in DUIterator_Common::reset(DUIterator_Common const&) /src/hotspot/share/opto/node.cpp:124 > #1 0x1461a32556a5 in DUIterator_Fast::operator=(DUIterator_Fast const&) /src/hotspot/share/opto/node.hpp:1486 > #2 0x1461a32556a5 in Node::fast_outs(DUIterator_Fast&) const /src/hotspot/share/opto/node.hpp:1491 > #3 0x1461a32556a5 in SimpleDUIterator::SimpleDUIterator(Node*) /src/hotspot/share/opto/node.hpp:1575 > #4 0x1461a32556a5 in G1BarrierSetC2::has_cas_in_use_chain(Node*) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:855 > #5 0x1461a3256cf1 in G1BarrierSetC2::verify_pre_load(Node*, Unique_Node_List&) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:881 > #6 0x1461a325eec3 in G1BarrierSetC2::verify_gc_barriers(Compile*, BarrierSetC2::CompilePhase) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:1019 > #7 
0x1461a325eec3 in G1BarrierSetC2::verify_gc_barriers(Compile*, BarrierSetC2::CompilePhase) const /src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp:963 > #8 0x1461a23160ed in Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*) /src/hotspot/share/opto/compile.cpp:875 > #9 0x1461a1845fd0 in C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) /src/hotspot/share/opto/c2compiler.cpp:142 > #10 0x1461a235ac39 in CompileBroker::invoke_compiler_on_method(CompileTask*) /src/hotspot/share/compiler/compileBroker.cpp:2305 > #11 0x1461a235ee4e in CompileBroker::compiler_thread_loop() /src/hotspot/share/compiler/compileBroker.cpp:1963 > #12 0x1461a4076f8d in JavaThread::thread_main_inner() /src/hotspot/share/runtime/javaThread.cpp:760 > #13 0x1461a409da23 in JavaThread::run() /src/hotspot/share/runtime/javaThread.cpp:745 > #14 0x1461a7b6d2bc in Thread::call_run() /src/hotspot/share/runtime/thread.cpp:221 > #15 0x1461a62a8105 in thread_native_entry /src/hotspot/os/linux/os_linux.cpp:846 > #16 0x1461c29801d9 in start_thread (/lib64/libpthread.so.0+0x81d9) > #17 0x1461c18cae72 in __clone (/lib64/libc.so.6+0x39e72) This pull request has now been integrated. 
Changeset: 466a21d8 Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/466a21d8646c05d91f29d607c6347afd34c75629 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod 8331863: DUIterator_Fast used before it is constructed Reviewed-by: kvn, shade ------------- PR: https://git.openjdk.org/jdk/pull/19125 From chagedorn at openjdk.org Wed May 8 07:12:52 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 8 May 2024 07:12:52 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop In-Reply-To: <2lreoMy7UKtgM_m8RCU68rp3FFkoU8zj3ckuTKzXqf0=.dc02a0d4-2671-4c70-a470-a64f28e38f2d@github.com> References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> <2lreoMy7UKtgM_m8RCU68rp3FFkoU8zj3ckuTKzXqf0=.dc02a0d4-2671-4c70-a470-a64f28e38f2d@github.com> Message-ID: On Tue, 7 May 2024 16:47:47 GMT, Emanuel Peter wrote: >> In the test case: >> >> >> long i; >> for (; i > 0; i--) { >> res += 42 / ((int) i); >> >> >> The long counted loop phi has type `[1..100]`. As a consequence, the >> `ConvL2I` also has type `[1..100]`. The `DivI` node that follows can't >> fault: it is not guarded by a zero check and has no control set. >> >> The `ConvL2I` is split through phi and so is the `DivI` node: >> `PhaseIdealLoop::cannot_split_division()` returns true because the >> value coming from the backedge into the `DivI` (when it is about to be >> split thru phi) is the result of the `ConvL2I` which has type >> `[1..100]` so is not zero as far as the compiler can tell. >> >> On the last iteration of the loop, i is 1. Because the DivI was split >> thru Phi, it computes the value for the following iteration, so for i >> = 0. This causes a crash when the compiled code runs. >> >> The same problem can't happen with an int counted loop because logic >> in `PhaseIdealLoop::split_thru_phi()` prevents a `ConvI2L` from being >> split thru phi.
I propose to fix this the same way: in the test case, >> it's not true that once the `ConvL2I` is split thru phi it keeps type >> `[1..100]`. The fix is fairly conservative because it's based on the >> existing logic for `ConvI2L`: we would want to not split a `ConvL2I` >> only at a counted loop, but I suppose the same is true for the `ConvI2L` >> and I thought it would be best to revisit both together. > > test/hotspot/jtreg/compiler/splitif/TestLongCountedLoopConvL2I.java line 31: > >> 29: * -XX:+StressGCM -XX:StressSeed=92643864 TestLongCountedLoopConvL2I >> 30: * @run main/othervm -XX:-BackgroundCompilation -XX:-TieredCompilation -XX:-UseOnStackReplacement >> 31: * -XX:+StressGCM TestLongCountedLoopConvL2I > > Would it make sense to have a run that allows OSR? You should also add `-XX:+UnlockDiagnosticVMOptions` for the stress flag. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19086#discussion_r1593501246 From mli at openjdk.org Wed May 8 08:46:01 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 8 May 2024 08:46:01 GMT Subject: RFR: 8331908: Simplify log code in vectorintrinsics.cpp Message-ID: Hi, Can you help to review this simple patch? Currently, the log code in vectorintrinsics.cpp is a bit redundant and could be simplified. Thanks.
## Test sanity test, jdk/incubator/vector ------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/19135/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19135&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331908 Stats: 497 lines in 1 file changed: 14 ins; 322 del; 161 mod Patch: https://git.openjdk.org/jdk/pull/19135.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19135/head:pull/19135 PR: https://git.openjdk.org/jdk/pull/19135 From galder at openjdk.org Wed May 8 09:23:57 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 8 May 2024 09:23:57 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v13] In-Reply-To: References: <9Eoh8hOSSVvAtf9iVQ6hflQyceUtt4dpZdqm61zg5XI=.358a4d79-70d9-4b54-85d5-37c6817f0fae@github.com> <_x-OSownzQQZ8fmlsbvQ42MLf9BGZskECTNncOE0s4E=.8381a076-0cc4-4339-924f-fa22ca780573@github.com> Message-ID: On Wed, 8 May 2024 04:31:05 GMT, Galder Zamarreño wrote: >> The assert doesn't hold, e.g. >> >> >> === Output from failing command(s) repeated here === >> * For target buildtools_create_symbols_javac__the.COMPILE_CREATE_SYMBOLS_batch: >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (/home/runner/work/jdk/jdk/src/hotspot/share/c1/c1_GraphBuilder.cpp:2031), pid=75212, tid=75244 >> # Error: assert(!!type->as_instance_klass()->is_interface()) failed >> # >> # JRE version: OpenJDK Runtime Environment (23.0) (fastdebug build 23-internal-galderz-306db7459b1316251e36d0eccc3035d11db44889) >> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 23-internal-galderz-306db7459b1316251e36d0eccc3035d11db44889, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) >> # Problematic frame: >> # V [libjvm.so+0x777cd0] GraphBuilder::invoke(Bytecodes::Code)+0x1200 >> >> >> Thoughts @rwestrel @dean-long? > > Hmmm, the double `!!`... let me fix that and see.
Hmmm, something else is failing now. That's odd, maybe master has updated and is causing this PR to fail now?

# Internal Error (/Users/runner/work/jdk/jdk/src/hotspot/share/ci/ciMetadata.hpp:88), pid=79328, tid=27395
# assert(is_instance_klass()) failed: bad cast

I will look into it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1593714548 From galder at openjdk.org Wed May 8 09:23:57 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 8 May 2024 09:23:57 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v13] In-Reply-To: References: <9Eoh8hOSSVvAtf9iVQ6hflQyceUtt4dpZdqm61zg5XI=.358a4d79-70d9-4b54-85d5-37c6817f0fae@github.com> <_x-OSownzQQZ8fmlsbvQ42MLf9BGZskECTNncOE0s4E=.8381a076-0cc4-4339-924f-fa22ca780573@github.com> Message-ID: On Wed, 8 May 2024 09:18:59 GMT, Galder Zamarreño wrote: >> Hmmm, the double `!!`... let me fix that and see. > > Hmmm, something else is failing now. That's odd, maybe master has updated and is causing this PR to fail now? > > > # Internal Error (/Users/runner/work/jdk/jdk/src/hotspot/share/ci/ciMetadata.hpp:88), pid=79328, tid=27395 > # assert(is_instance_klass()) failed: bad cast > > > I will look into it. Ah no, that assert comes from the `type->as_instance_klass()` call:

  ciInstanceKlass* as_instance_klass() {
    assert(is_instance_klass(), "bad cast");
    return (ciInstanceKlass*)this;
  }

@rwestrel @dean-long what shall we do here? Do we remove the assert altogether? Does the code need to change for the assert to pass? Any other ideas?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1593717682 From dlong at openjdk.org Wed May 8 10:06:58 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 8 May 2024 10:06:58 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v15] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 04:34:18 GMT, Galder Zamarreño wrote: >> Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures. >> >> The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64:
>>
>> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
>> Benchmark                 (size)  Mode  Cnt    Score    Error  Units
>> ArrayClone.byteArraycopy       0  avgt   15    3.476 ±  0.018  ns/op
>> ArrayClone.byteArraycopy      10  avgt   15    3.740 ±  0.017  ns/op
>> ArrayClone.byteArraycopy     100  avgt   15    7.124 ±  0.010  ns/op
>> ArrayClone.byteArraycopy    1000  avgt   15   39.301 ±  0.106  ns/op
>> ArrayClone.byteClone           0  avgt   15    3.478 ±  0.008  ns/op
>> ArrayClone.byteClone          10  avgt   15    3.562 ±  0.007  ns/op
>> ArrayClone.byteClone         100  avgt   15    5.888 ±  0.206  ns/op
>> ArrayClone.byteClone        1000  avgt   15   25.762 ±  0.203  ns/op
>> ArrayClone.intArraycopy        0  avgt   15    3.199 ±  0.016  ns/op
>> ArrayClone.intArraycopy       10  avgt   15    4.521 ±  0.008  ns/op
>> ArrayClone.intArraycopy      100  avgt   15   17.429 ±  0.039  ns/op
>> ArrayClone.intArraycopy     1000  avgt   15  178.432 ±  0.777  ns/op
>> ArrayClone.intClone            0  avgt   15    3.406 ±  0.016  ns/op
>> ArrayClone.intClone           10  avgt   15    4.272 ±  0.006  ns/op
>> ArrayClone.intClone          100  avgt   15   13.110 ±  0.122  ns/op
>> ArrayClone.intClone         1000  avgt   15  113.196 ± 13.400  ns/op
>>
>> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this.
>>
>> I ran the hotspot compiler tests successfully, limiting them to C1 compilation, on darwin/aarch64, linux/x86_64 and linux/686. E.g.
>>
>> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
>> ...
>> TEST                                        TOTAL  PASS  FAIL ERROR
>> jtreg:test/hotspot/jtreg:hotspot_compiler    1234  1234     0     0
>>
>> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts?
>>
>>...
>
> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix assert to only have a single !

src/hotspot/share/c1/c1_GraphBuilder.cpp line 2031:

> 2029: ciType* type = receiver->exact_type();
> 2030: if (type != nullptr && type->is_loaded()) {
> 2031: assert(!type->as_instance_klass()->is_interface(), "");

Suggestion:

  assert(!type->is_instance_klass() || !type->as_instance_klass()->is_interface(), "");

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1593776586 From stuefe at openjdk.org Wed May 8 10:40:59 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 8 May 2024 10:40:59 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v10] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 18:29:20 GMT, Vladimir Kozlov wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> fix compiler.c2.TestFindNode again > > `-XX:CompileCommand=memstat,compiler.c2.TestFindNode::*,print` - leftover from debugging? Many thanks, @vnkozlov !
------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2100279602 From stuefe at openjdk.org Wed May 8 10:41:00 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 8 May 2024 10:41:00 GMT Subject: Integrated: 8331185: Enable compiler memory limits in debug builds In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 10:04:28 GMT, Thomas Stuefe wrote: > See [1] for previous discussions. > > We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. > > The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. > > Examples: > > This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` > > This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` > > > --- > > The patch: > > 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. > 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. > 3) Adapted and extended tests > > I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. > > > Tested: > > - manually on Mac m1 (debug and release) > - GHAs are running > - but Oracle will do more testing before this goes in > > [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html This pull request has now been integrated. 
Changeset: ad78b7fa Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/ad78b7fa67ba30cab2e8f496e4c765be15deeca6 Stats: 166 lines in 7 files changed: 115 ins; 12 del; 39 mod 8331185: Enable compiler memory limits in debug builds Reviewed-by: asmehra, kvn ------------- PR: https://git.openjdk.org/jdk/pull/18969 From aph at openjdk.org Wed May 8 11:25:03 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 8 May 2024 11:25:03 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8] In-Reply-To: References: Message-ID: <_7zY3gHXEP48uBLgwzxz8wYqv_97zMuIgqcxKBTGDCg=.5e185cd6-22c4-4922-a00c-afeb35799e6b@github.com> On Fri, 26 Apr 2024 12:52:15 GMT, Bhavana Kilambi wrote: >> Floating-point addition is non-associative, that is adding floating-point elements in arbitrary order may get different value. Specially, Vector API does not define the order of reduction intentionally, which allows platforms to generate more efficient codes [1]. So that needs a node to represent non strictly-ordered add-reduction for floating-point type in C2. >> >> To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` have a fixed value. >> >> With this patch, Vector API would always generate non strictly-ordered `AddReductionVF/D' on SVE machines with vector length <= 16B as it is more beneficial to generate non-strictly ordered instructions on such machines compared to strictly ordered ones. >> >> [AArch64] >> On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2]. >> >> This patch adds matching rules for non strictly-ordered `AddReductionVF/D`. >> >> No effects on other platforms. 
>> >> [Performance] >> FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point type. With this patch, it improves ~3x on my SVE machine (128-bit). >> >> ADDLanes >> >> Benchmark Before After Unit >> FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms >> >> >> Final code is as below: >> >> Before: >> ` fadda z17.s, p7/m, z17.s, z16.s >> ` >> After: >> >> faddp v17.4s, v21.4s, v21.4s >> faddp s18, v17.2s >> fadd s18, s18, s19 >> >> >> >> >> [Test] >> Full jtreg passed on AArch64 and x86. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529 >> [2] https://bugs.openjdk.org/browse/JDK-8275275 >> [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316 > > Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge master > - Adjust format for the backend rules changed in previous commit > - Address some more review comments > - Revert to previous indentation > - Add comments, revert to requires_strict_order and other minor changes > - Naming changes: replace strict/non-strict with more technical terms > - Addressed review comments for changes in backend rules and code style > - 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction > > Floating-point addition is non-associative, that is adding > floating-point elements in arbitrary order may get different value. > Specially, Vector API does not define the order of reduction > intentionally, which allows platforms to generate more efficient codes > [1]. So that needs a node to represent non strictly-ordered > add-reduction for floating-point type in C2. 
> > To avoid introducing new nodes, this patch adds a bool field in > `AddReductionVF/D` to distinguish whether they require strict order. It > also removes `UnorderedReductionNode` and adds a virtual function > `bool requires_strict_order()` in `ReductionNode`. Besides > `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` > have a fixed value. > > With this patch, Vector API would always generate non strictly-ordered > `AddReductionVF/D' on SVE machines with vector length <= 16B as it is > more beneficial to generate non-strictly ordered instructions on such > machines compared to strictly ordered ones. > > [AArch64] > On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. > Auto-vectorization has already banned these nodes in JDK-8275275 [2]. > > This patch adds matching rules for non strictly-ordered > `AddReductionVF/D`. > > No effects on other platforms. > > [Performance] > FloatMaxVector.ADDLanes [3] measures the performance of add reduction > for floating-point type. With this patch, it improves ~3x on my SVE > machine (128-bit). > > ADDLanes > Benchmark Before After Unit > FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms > > Final code is as below: > > ``` > Before: > fadda z17.s, p7/m, z17.s, z16.s > > After: > faddp v17.4s, v21.4s,... src/hotspot/cpu/aarch64/aarch64_vector.ad line 140: > 138: // The implementations of Op_AddReductionVD/F in Neon are for the Vector API only. > 139: // They are not suitable for auto-vectorization because the implementations cannot > 140: // guarantee strict ordering. Suggestion: // These implementations of Op_AddReductionVD/F in Neon are for the Vector API only. // They are not suitable for auto-vectorization because the result would not conform to the // JLS, Section Evaluation Order. src/hotspot/cpu/aarch64/aarch64_vector.ad line 2865: > 2863: // Non-strictly ordered floating-point add reduction for vector length of 64-bit. 
As an > 2864: // example, this rule can be reached from the VectorAPI (which allows for non-strictly ordered > 2865: // add reduction). Suggestion: // Non-strictly ordered floating-point add reduction for a 64-bits-long vector. This rule // is intended for the VectorAPI (which allows for non-strictly ordered add reduction). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1593863910 PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1593866102 From aph at openjdk.org Wed May 8 11:25:04 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 8 May 2024 11:25:04 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8] In-Reply-To: <_7zY3gHXEP48uBLgwzxz8wYqv_97zMuIgqcxKBTGDCg=.5e185cd6-22c4-4922-a00c-afeb35799e6b@github.com> References: <_7zY3gHXEP48uBLgwzxz8wYqv_97zMuIgqcxKBTGDCg=.5e185cd6-22c4-4922-a00c-afeb35799e6b@github.com> Message-ID: On Wed, 8 May 2024 11:20:50 GMT, Andrew Haley wrote: >> Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: >> >> - Merge master >> - Adjust format for the backend rules changed in previous commit >> - Address some more review comments >> - Revert to previous indentation >> - Add comments, revert to requires_strict_order and other minor changes >> - Naming changes: replace strict/non-strict with more technical terms >> - Addressed review comments for changes in backend rules and code style >> - 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction >> >> Floating-point addition is non-associative, that is adding >> floating-point elements in arbitrary order may get different value. 
>> Specially, Vector API does not define the order of reduction >> intentionally, which allows platforms to generate more efficient codes >> [1]. So that needs a node to represent non strictly-ordered >> add-reduction for floating-point type in C2. >> >> To avoid introducing new nodes, this patch adds a bool field in >> `AddReductionVF/D` to distinguish whether they require strict order. It >> also removes `UnorderedReductionNode` and adds a virtual function >> `bool requires_strict_order()` in `ReductionNode`. Besides >> `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` >> have a fixed value. >> >> With this patch, Vector API would always generate non strictly-ordered >> `AddReductionVF/D' on SVE machines with vector length <= 16B as it is >> more beneficial to generate non-strictly ordered instructions on such >> machines compared to strictly ordered ones. >> >> [AArch64] >> On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. >> Auto-vectorization has already banned these nodes in JDK-8275275 [2]. >> >> This patch adds matching rules for non strictly-ordered >> `AddReductionVF/D`. >> >> No effects on other platforms. >> >> [Performance] >> FloatMaxVector.ADDLanes [3] measures the performance of add reduction >> for floating-point type. With this patch, it improves ~3x on my SVE >> machine (128-bit). >> >> ADDLanes >> Benchmark Before After Unit >> FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms >> >> Final code is as below: >> >> ``` >> Before:... > > src/hotspot/cpu/aarch64/aarch64_vector.ad line 2865: > >> 2863: // Non-strictly ordered floating-point add reduction for vector length of 64-bit. As an >> 2864: // example, this rule can be reached from the VectorAPI (which allows for non-strictly ordered >> 2865: // add reduction). > > Suggestion: > > // Non-strictly ordered floating-point add reduction for a 64-bits-long vector. This rule > // is intended for the VectorAPI (which allows for non-strictly ordered add reduction). 
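A minimal illustration (not code from the PR) of the evaluation-order concern in the suggestion above: the hypothetical `sequentialSum` follows Java's strict left-to-right evaluation, while the hypothetical `pairwiseSum4` groups four lanes as `(a0 + a1) + (a2 + a3)`, roughly the shape of the `faddp` sequence quoted in the PR description. With inputs chosen so that rounding matters, the two orders disagree.

```java
// Illustration only (not code from the PR): the same four floats summed in two
// different orders give two different results.
final class FloatReductionOrder {

    // Strict left-to-right order, as Java evaluates a scalar loop:
    // (((0 + a0) + a1) + a2) + a3
    static float sequentialSum(float[] a) {
        float sum = 0.0f;
        for (float x : a) {
            sum += x;
        }
        return sum;
    }

    // Pairwise order for exactly four lanes: (a0 + a1) + (a2 + a3),
    // similar in shape to a faddp-based reduction.
    static float pairwiseSum4(float[] a) {
        return (a[0] + a[1]) + (a[2] + a[3]);
    }

    public static void main(String[] args) {
        float[] a = {1.0e8f, 1.0f, -1.0e8f, 1.0f};
        System.out.println("sequential = " + sequentialSum(a)); // sequential = 1.0
        System.out.println("pairwise   = " + pairwiseSum4(a));  // pairwise   = 0.0
    }
}
```

Near 1e8 the spacing between adjacent floats is 8, so adding `1.0f` to `±1.0e8f` rounds back to `±1.0e8f`. The sequential order loses only the first `1.0f`, while the pairwise order loses both, which is why the unordered rule is only acceptable where the Vector API permits any reduction order.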
Please repeat this change everywhere. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1593867651 From aph at openjdk.org Wed May 8 11:28:00 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 8 May 2024 11:28:00 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8] In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 12:52:15 GMT, Bhavana Kilambi wrote: >> Floating-point addition is non-associative, that is adding floating-point elements in arbitrary order may get different value. Specially, Vector API does not define the order of reduction intentionally, which allows platforms to generate more efficient codes [1]. So that needs a node to represent non strictly-ordered add-reduction for floating-point type in C2. >> >> To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` have a fixed value. >> >> With this patch, Vector API would always generate non strictly-ordered `AddReductionVF/D' on SVE machines with vector length <= 16B as it is more beneficial to generate non-strictly ordered instructions on such machines compared to strictly ordered ones. >> >> [AArch64] >> On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2]. >> >> This patch adds matching rules for non strictly-ordered `AddReductionVF/D`. >> >> No effects on other platforms. >> >> [Performance] >> FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point type. With this patch, it improves ~3x on my SVE machine (128-bit). 
>> >> ADDLanes >> >> Benchmark Before After Unit >> FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms >> >> >> Final code is as below: >> >> Before: >> ` fadda z17.s, p7/m, z17.s, z16.s >> ` >> After: >> >> faddp v17.4s, v21.4s, v21.4s >> faddp s18, v17.2s >> fadd s18, s18, s19 >> >> >> >> >> [Test] >> Full jtreg passed on AArch64 and x86. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529 >> [2] https://bugs.openjdk.org/browse/JDK-8275275 >> [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316 > > Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge master > - Adjust format for the backend rules changed in previous commit > - Address some more review comments > - Revert to previous indentation > - Add comments, revert to requires_strict_order and other minor changes > - Naming changes: replace strict/non-strict with more technical terms > - Addressed review comments for changes in backend rules and code style > - 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction > > Floating-point addition is non-associative, that is adding > floating-point elements in arbitrary order may get different value. > Specially, Vector API does not define the order of reduction > intentionally, which allows platforms to generate more efficient codes > [1]. So that needs a node to represent non strictly-ordered > add-reduction for floating-point type in C2. > > To avoid introducing new nodes, this patch adds a bool field in > `AddReductionVF/D` to distinguish whether they require strict order. 
It > also removes `UnorderedReductionNode` and adds a virtual function > `bool requires_strict_order()` in `ReductionNode`. Besides > `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` > have a fixed value. > > With this patch, Vector API would always generate non strictly-ordered > `AddReductionVF/D' on SVE machines with vector length <= 16B as it is > more beneficial to generate non-strictly ordered instructions on such > machines compared to strictly ordered ones. > > [AArch64] > On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. > Auto-vectorization has already banned these nodes in JDK-8275275 [2]. > > This patch adds matching rules for non strictly-ordered > `AddReductionVF/D`. > > No effects on other platforms. > > [Performance] > FloatMaxVector.ADDLanes [3] measures the performance of add reduction > for floating-point type. With this patch, it improves ~3x on my SVE > machine (128-bit). > > ADDLanes > Benchmark Before After Unit > FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms > > Final code is as below: > > ``` > Before: > fadda z17.s, p7/m, z17.s, z16.s > > After: > faddp v17.4s, v21.4s,... I have no further objections, but please wait for a C2 specialist to review this. ------------- Marked as reviewed by aph (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18034#pullrequestreview-2045384661 From rcastanedalo at openjdk.org Wed May 8 11:59:52 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 8 May 2024 11:59:52 GMT Subject: RFR: 8330584: IGV: XML does not save all node properties In-Reply-To: References: Message-ID: <_jl-HL9mMXMA4tjKEuA8qdsHOjkng2QQ541aeHjmmT8=.5c7348ae-cdde-4b12-94d1-cf2bb181d862@github.com> On Mon, 6 May 2024 12:06:20 GMT, Tobias Holenstein wrote: > When C2 sends graphs over the network to IGV, each graph is sent separately.
The same applies if C2 saves graphs to XML: each graph is saved with all its nodes as a separate `...` in the XML > > To save space, graphs that are saved from IGV only contain the incremental difference for each graph. This saves a lot of space (~5-10x). The logic happens in Printer.java -> `exportInputGraph(.., difference=true, ...)` Unfortunately, there is a bug in this logic: the properties of the nodes are not saved correctly. > > [graphs.zip](https://github.com/openjdk/jdk/files/15220940/graphs.zip) contains 4 graphs: > > `graph_c2.xml` (230KB) - an XML saved from C2 > `graph_igv_bug.xml` (73KB) - opened `graph_c2.xml` in IGV (without this fix) and saved as `graph_igv_bug.xml`. > `graph_igv_fixed.xml` (123KB) - opened `graph_c2.xml` in IGV (with this fix) and saved as `graph_igv_fixed.xml`. > > As you can see `graph_igv_fixed.xml` is twice as large as `graph_igv_bug.xml` because it contains the missing properties. But now the memory saving from the original `graph_c2.xml` is only ~2x. > Therefore a new format for saving is added: graphs can now be saved and opened from IGV as `.igv`. This uses a compressed (ZIP) format. > > `graph.igv` (10KB) is the same graph as `graph_c2.xml` (230KB). But it uses difference graph compression and ZIP compression and is in total 23x smaller in memory footprint. > > > > E.g. the root in the last graph of difference_true.xml has far fewer properties than in difference_false.xml. Good catch and nice feature! src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/OutlineTopComponent.java line 578: > 576: SwingUtilities.invokeLater(() -> { > 577: for (Node child : manager.getRootContext().getChildren().getNodes(true)) { > 578: // Nodes a lazily created. By expanding and collapsing they are all initialized Suggestion: // Nodes are lazily created. By expanding and collapsing they are all initialized ------------- Marked as reviewed by rcastanedalo (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19104#pullrequestreview-2045436896 PR Review Comment: https://git.openjdk.org/jdk/pull/19104#discussion_r1593900816 From tholenstein at openjdk.org Wed May 8 12:10:23 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 8 May 2024 12:10:23 GMT Subject: RFR: 8330584: IGV: XML does not save all node properties [v2] In-Reply-To: References: Message-ID: > When C2 sends graphs over the network to IGV, each graph is sent separately. The same applies if C2 saves graphs to XML: each graph is saved with all its nodes as a separate `...` in the XML > > To save space, graphs that are saved from IGV only contain the incremental difference for each graph. This saves a lot of space (~5-10x). The logic happens in Printer.java -> `exportInputGraph(.., difference=true, ...)` Unfortunately, there is a bug in this logic: the properties of the nodes are not saved correctly. > > [graphs.zip](https://github.com/openjdk/jdk/files/15220940/graphs.zip) contains 4 graphs: > > `graph_c2.xml` (230KB) - an XML saved from C2 > `graph_igv_bug.xml` (73KB) - opened `graph_c2.xml` in IGV (without this fix) and saved as `graph_igv_bug.xml`. > `graph_igv_fixed.xml` (123KB) - opened `graph_c2.xml` in IGV (with this fix) and saved as `graph_igv_fixed.xml`. > > As you can see `graph_igv_fixed.xml` is twice as large as `graph_igv_bug.xml` because it contains the missing properties. But now the memory saving from the original `graph_c2.xml` is only ~2x. > Therefore a new format for saving is added: graphs can now be saved and opened from IGV as `.igv`. This uses a compressed (ZIP) format. > > `graph.igv` (10KB) is the same graph as `graph_c2.xml` (230KB). But it uses difference graph compression and ZIP compression and is in total 23x smaller in memory footprint. > > > > E.g. the root in the last graph of difference_true.xml has far fewer properties than in difference_false.xml.
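The ZIP container idea behind the new `.igv` format described above can be sketched with `java.util.zip` alone. This is a hypothetical sketch, not IGV's implementation: the single entry name `graph.xml` and the XML shape are assumptions made for illustration, and the real format also applies difference-graph compression before zipping.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch, not IGV's implementation: store one XML document as a
// single entry of a ZIP archive and read it back. The entry name "graph.xml"
// and the XML shape below are assumptions made for illustration.
final class ZippedGraphDump {

    static byte[] zip(String xml) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("graph.xml"));
            zos.write(xml.getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        } // closing the stream writes the ZIP central directory
        return bytes.toByteArray();
    }

    static String unzip(byte[] archive) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(archive))) {
            if (zis.getNextEntry() == null) {
                throw new IOException("archive has no entries");
            }
            // readAllBytes() stops at the end of the current entry.
            return new String(zis.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Repetitive markup, loosely resembling per-node property lists.
        StringBuilder sb = new StringBuilder("<graph>");
        for (int i = 0; i < 1000; i++) {
            sb.append("<node id=\"").append(i).append("\">")
              .append("<p name=\"name\">Phi</p></node>");
        }
        sb.append("</graph>");
        String xml = sb.toString();

        byte[] archive = zip(xml);
        System.out.println(xml.length() + " chars -> " + archive.length + " bytes");
        System.out.println("round trip ok: " + xml.equals(unzip(archive)));
    }
}
```

Graph dumps repeat the same element and property names many times, which is the kind of input DEFLATE compresses well. That is consistent with the large size reduction reported above, although the exact ratio depends on the data.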
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: Update src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/OutlineTopComponent.java Co-authored-by: Roberto Castañeda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19104/files - new: https://git.openjdk.org/jdk/pull/19104/files/eabd53cd..632b4baa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19104&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19104&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19104.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19104/head:pull/19104 PR: https://git.openjdk.org/jdk/pull/19104 From epeter at openjdk.org Wed May 8 12:11:02 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 May 2024 12:11:02 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8] In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 12:52:15 GMT, Bhavana Kilambi wrote: >> Floating-point addition is non-associative, that is adding floating-point elements in arbitrary order may get different value. Specially, Vector API does not define the order of reduction intentionally, which allows platforms to generate more efficient codes [1]. So that needs a node to represent non strictly-ordered add-reduction for floating-point type in C2. >> >> To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` have a fixed value.
>> >> With this patch, Vector API would always generate non strictly-ordered `AddReductionVF/D' on SVE machines with vector length <= 16B as it is more beneficial to generate non-strictly ordered instructions on such machines compared to strictly ordered ones. >> >> [AArch64] >> On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2]. >> >> This patch adds matching rules for non strictly-ordered `AddReductionVF/D`. >> >> No effects on other platforms. >> >> [Performance] >> FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point type. With this patch, it improves ~3x on my SVE machine (128-bit). >> >> ADDLanes >> >> Benchmark Before After Unit >> FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms >> >> >> Final code is as below: >> >> Before: >> ` fadda z17.s, p7/m, z17.s, z16.s >> ` >> After: >> >> faddp v17.4s, v21.4s, v21.4s >> faddp s18, v17.2s >> fadd s18, s18, s19 >> >> >> >> >> [Test] >> Full jtreg passed on AArch64 and x86. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529 >> [2] https://bugs.openjdk.org/browse/JDK-8275275 >> [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316 > > Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: > > - Merge master > - Adjust format for the backend rules changed in previous commit > - Address some more review comments > - Revert to previous indentation > - Add comments, revert to requires_strict_order and other minor changes > - Naming changes: replace strict/non-strict with more technical terms > - Addressed review comments for changes in backend rules and code style > - 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction > > Floating-point addition is non-associative, that is adding > floating-point elements in arbitrary order may get different value. > Specially, Vector API does not define the order of reduction > intentionally, which allows platforms to generate more efficient codes > [1]. So that needs a node to represent non strictly-ordered > add-reduction for floating-point type in C2. > > To avoid introducing new nodes, this patch adds a bool field in > `AddReductionVF/D` to distinguish whether they require strict order. It > also removes `UnorderedReductionNode` and adds a virtual function > `bool requires_strict_order()` in `ReductionNode`. Besides > `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` > have a fixed value. > > With this patch, Vector API would always generate non strictly-ordered > `AddReductionVF/D' on SVE machines with vector length <= 16B as it is > more beneficial to generate non-strictly ordered instructions on such > machines compared to strictly ordered ones. > > [AArch64] > On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. > Auto-vectorization has already banned these nodes in JDK-8275275 [2]. > > This patch adds matching rules for non strictly-ordered > `AddReductionVF/D`. > > No effects on other platforms. > > [Performance] > FloatMaxVector.ADDLanes [3] measures the performance of add reduction > for floating-point type. 
With this patch, it improves ~3x on my SVE > machine (128-bit). > > ADDLanes > Benchmark Before After Unit > FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms > > Final code is as below: > > ``` > Before: > fadda z17.s, p7/m, z17.s, z16.s > > After: > faddp v17.4s, v21.4s,... I'll look at it again, once my concerns are all addressed. @Bhavana-Kilambi feel free to ping me again for that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18034#issuecomment-2100431939 From dfenacci at openjdk.org Wed May 8 13:47:08 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 8 May 2024 13:47:08 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v9] In-Reply-To: References: Message-ID: > # Issue > When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0)` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. > > # Causes > On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. > The same is true for `StoreVector`s. > When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store of the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`). Here we call > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 > > where we do an extra check for types if we deal with vectors but we don't make sure that neither masks nor offsets interfere.
> Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > but we don't make sure that there are no masks or offsets. > A few lines below, we check if there are 2 stores for the same value in a row. We need to check for masks and offsets here too but in this case we can include these cases if the masks and offsets of the vector stores are equivalent. > > # Solution > To avoid folding `Load`- and `StoreVector`s with masks and offsets we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns its own (to avoid ever being the same as a store node). In this way, the checks in `MemNode::can_see_stored_value` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 > > and `StoreNode::Identity` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > will fail if masks or offsets are used. > For 2 stores of the same value we instead check for mask and offset equality. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been add...
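The folding hazard described above can be modelled with plain arrays. In this scalar sketch, `long[]` arrays stand in for memory and for vector lanes. The method names are hypothetical and are neither Vector API calls nor C2 node names. Both cases read from the same base address that was just stored to, yet neither returns the stored vector, so replacing the load with the store's input would be wrong.

```java
import java.util.Arrays;

// Scalar model (not HotSpot or Vector API code): plain long[] arrays stand in
// for memory and for vector lanes, to show when "load after store of the same
// address returns the stored vector" is NOT a valid folding.
final class MaskedForwardingModel {

    // Unmasked contiguous store: mem[base + i] = v[i] for every lane.
    static void store(long[] mem, int base, long[] v) {
        System.arraycopy(v, 0, mem, base, v.length);
    }

    // Masked store: only lanes whose mask bit is set are written.
    static void maskedStore(long[] mem, int base, long[] v, boolean[] mask) {
        for (int i = 0; i < v.length; i++) {
            if (mask[i]) {
                mem[base + i] = v[i];
            }
        }
    }

    // Unmasked contiguous load of n lanes starting at base.
    static long[] load(long[] mem, int base, int n) {
        return Arrays.copyOfRange(mem, base, base + n);
    }

    // Gather load: lane i reads mem[base + offsets[i]].
    static long[] gatherLoad(long[] mem, int base, int[] offsets) {
        long[] r = new long[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            r[i] = mem[base + offsets[i]];
        }
        return r;
    }

    public static void main(String[] args) {
        long[] v = {1, 2, 3, 4};

        // Case 1: masked store, then a plain load from the same base address.
        long[] mem1 = {10, 20, 30, 40};
        maskedStore(mem1, 0, v, new boolean[] {true, false, true, false});
        System.out.println(Arrays.toString(load(mem1, 0, 4))); // [1, 20, 3, 40]

        // Case 2: plain store, then a gather load from the same base address.
        long[] mem2 = new long[4];
        store(mem2, 0, v);
        System.out.println(Arrays.toString(gatherLoad(mem2, 0, new int[] {3, 2, 1, 0}))); // [4, 3, 2, 1]
    }
}
```

Two masked stores with equal masks (or two gathers with equal offsets) writing the same value do touch exactly the same lanes. This is why, per the description, the store-after-store case can still be folded when masks and offsets are equivalent.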
Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8325520: simplify check for offsets and masks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18347/files - new: https://git.openjdk.org/jdk/pull/18347/files/a2cb6a58..9b742109 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=07-08 Stats: 38 lines in 2 files changed: 6 ins; 21 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/18347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18347/head:pull/18347 PR: https://git.openjdk.org/jdk/pull/18347 From aph at openjdk.org Wed May 8 14:26:04 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 8 May 2024 14:26:04 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v18] In-Reply-To: <-Q8XJ3BT26WE6vPUNR7-_Wi7iw7QKTi9O5HsvdeGh4M=.e35dc82b-326f-4207-a3f3-bacfb20032f4@github.com> References: <-Q8XJ3BT26WE6vPUNR7-_Wi7iw7QKTi9O5HsvdeGh4M=.e35dc82b-326f-4207-a3f3-bacfb20032f4@github.com> Message-ID: <54Lj3Z2JBIzBXLKm579qiAzQQXnNN3BrTPXBNXpCC7A=.2f3b353a-b97c-43f4-af95-de55c72e3fb7@github.com> On Tue, 7 May 2024 16:53:21 GMT, Vladimir Kozlov wrote: > I want to see performance numbers on x64 and aarch64 before starting looking on it. It would be nice to have data for all micros `test/micro/org/openjdk/bench/java/lang/ScopedValues*.java` > > Put results into JBS and post short summary here. > > You can compare by disable/enable new intrinsics. I'm on it. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2100706963 From sgibbons at openjdk.org Wed May 8 14:30:54 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 8 May 2024 14:30:54 GMT Subject: RFR: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 [v3] In-Reply-To: References: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> Message-ID: On Tue, 7 May 2024 21:17:23 GMT, Scott Gibbons wrote: >> Added a strcmp for unsafe_setmemory in process_call_arguments() so the assert would not trigger. >> >> I believe this is the correct fix as I do not think the arguments for setMemory need special handling like arraycopy. >> >> I would like suggestions on how to generate a testcase to catch this type of error in mainline. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Review comments - change copyright, add @requires, change @run Awesome! Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19032#issuecomment-2100720273 From kvn at openjdk.org Wed May 8 14:30:54 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 8 May 2024 14:30:54 GMT Subject: RFR: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 [v3] In-Reply-To: References: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> Message-ID: On Tue, 7 May 2024 21:17:23 GMT, Scott Gibbons wrote: >> Added a strcmp for unsafe_setmemory in process_call_arguments() so the assert would not trigger. >> >> I believe this is the correct fix as I do not think the arguments for setMemory need special handling like arraycopy. >> >> I would like suggestions on how to generate a testcase to catch this type of error in mainline. 
> > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Review comments - change copyright, add @requires, change @run My testing passed. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19032#pullrequestreview-2045845632 From kvn at openjdk.org Wed May 8 14:35:52 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 8 May 2024 14:35:52 GMT Subject: RFR: 8331764: C2 SuperWord: refactor _align_to_ref/_mem_ref_for_main_loop_alignment In-Reply-To: References: Message-ID: On Tue, 7 May 2024 09:26:11 GMT, Emanuel Peter wrote: > This PR accomplishes these things: > - Rename `_align_to_ref` -> `_mem_ref_for_main_loop_alignment`. > - Move the `mem_ref` finding for alignment out of `SuperWord::find_adjacent_refs`. This is too early, and we don't even know if the relevant `mem_ref` is going to be vectorized. It makes more sense to pick a `mem_ref` directly in `SuperWord::adjust_pre_loop_limit_to_align_main_loop_vectors`, where we already know what packs are going to be vectorized. > - For the alignment width (aw), we can use the `vector_width` of the pack to which the `mem_ref` belongs, rather than the potentially much larger `vector_width_in_bytes`. I track this with `_aw_for_main_loop_alignment` now. > > I need this for https://github.com/openjdk/jdk/pull/18822, and decided to split it out into an independent change. Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19115#pullrequestreview-2045861945 From jbhateja at openjdk.org Wed May 8 16:43:04 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 8 May 2024 16:43:04 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v15] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 16:58:09 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > parameter and local renames, update comment src/hotspot/cpu/x86/assembler_x86.cpp line 1971: > 1969: void Assembler::crc32(Register crc, Register v, int8_t sizeInBytes) { > 1970: assert(VM_Version::supports_sse4_2(), ""); > 1971: if (needs_rex2(crc, v)) { This being a map2 instruction should check for needs eevex, rex2 nomenclature looks misleading here. 
src/hotspot/cpu/x86/assembler_x86.cpp line 11902: > 11900: vex_x = (src_enc >= 16) && !src_is_gpr; > 11901: attributes->set_is_evex_instruction(); > 11902: evex_prefix(vex_r, vex_b, vex_x, evex_r, evex_b, evex_v, false /*eevex_x*/, nds_enc, pre, opc); Hi @steveatgh , UseAVX is set to level 3 only when target support AVX512F feature, entire encoding support for EVEX encoding is guarded by UseAVX > 2. Legacy map 2 and 3 instruction using EGPR register mandates Extended EVEX encoding and user may explicitly set UseAVX to level 2. What are your thoughts on extending the guarding check with UseAPX ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1593759162 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1593785935 From kvn at openjdk.org Wed May 8 16:48:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 8 May 2024 16:48:02 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v2] In-Reply-To: References: Message-ID: > [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. None of the current OpenJDK platforms use it. > > Tested tier1-3,stress,xcomp.
Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: clean up comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19126/files - new: https://git.openjdk.org/jdk/pull/19126/files/a9fc1df8..64c9e66b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19126&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19126&range=00-01 Stats: 13 lines in 1 file changed: 0 ins; 6 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/19126.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19126/head:pull/19126 PR: https://git.openjdk.org/jdk/pull/19126 From dlong at openjdk.org Wed May 8 18:57:54 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 8 May 2024 18:57:54 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v2] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 16:48:02 GMT, Vladimir Kozlov wrote: >> [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. None of the current OpenJDK platforms use it. >> >> Tested tier1-3,stress,xcomp. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > clean up comments src/hotspot/cpu/s390/assembler_s390.cpp line 2: > 1: /* > 2: * Copyright (c) 2016, 2021, Oracle and/or its affiliates. All rights reserved. No changes to this file?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19126#discussion_r1594499001 From dlong at openjdk.org Wed May 8 19:14:56 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 8 May 2024 19:14:56 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v2] In-Reply-To: References: Message-ID: <8JPE8NAiPPoaDoHnGHp-tiaaHSa9K7XIXLFkZDXFlEw=.99bbd30e-79e3-4ed7-baf2-4b8460f09415@github.com> On Wed, 8 May 2024 16:48:02 GMT, Vladimir Kozlov wrote: >> [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. None of the current OpenJDK platforms use it. >> >> Tested tier1-3,stress,xcomp. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > clean up comments src/hotspot/share/code/relocInfo.hpp line 133: > 131: // Data: [] an oop stored in 4 bytes of instruction > 132: // [n] n is the index of an oop in the CodeBlob's oop pool > 133: // [Nn] index may be 32 bits if necessary Lines 132 and 133 could be combined into something like: // [[N]n] index of an oop in the CodeBlob's oop pool which seems consistent with other descriptions. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19126#discussion_r1594515706 From dlong at openjdk.org Wed May 8 19:19:52 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 8 May 2024 19:19:52 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v2] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 16:48:02 GMT, Vladimir Kozlov wrote: >> [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. None of the current OpenJDK platforms use it. >> >> Tested tier1-3,stress,xcomp. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > clean up comments Marked as reviewed by dlong (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/19126#pullrequestreview-2046464449 From kvn at openjdk.org Wed May 8 19:34:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 8 May 2024 19:34:10 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v3] In-Reply-To: References: Message-ID: > [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. None of the current OpenJDK platforms use it. > > Tested tier1-3,stress,xcomp. Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: address comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19126/files - new: https://git.openjdk.org/jdk/pull/19126/files/64c9e66b..0e3ac42b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19126&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19126&range=01-02 Stats: 3 lines in 2 files changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19126.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19126/head:pull/19126 PR: https://git.openjdk.org/jdk/pull/19126 From kvn at openjdk.org Wed May 8 19:34:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 8 May 2024 19:34:10 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v2] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 16:48:02 GMT, Vladimir Kozlov wrote: >> [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. None of the current OpenJDK platforms use it. >> >> Tested tier1-3,stress,xcomp. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > clean up comments Thank you, @dean-long, for the review. I addressed all your comments.
------------- PR Review: https://git.openjdk.org/jdk/pull/19126#pullrequestreview-2046480668 From kvn at openjdk.org Wed May 8 19:34:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 8 May 2024 19:34:10 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v3] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 18:55:02 GMT, Dean Long wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> address comments > > src/hotspot/cpu/s390/assembler_s390.cpp line 2: > >> 1: /* >> 2: * Copyright (c) 2016, 2024, Oracle and/or its affiliates. All rights reserved. > > No changes to this file? Accidental change. Reverted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19126#discussion_r1594530168 From kvn at openjdk.org Wed May 8 19:34:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 8 May 2024 19:34:10 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v2] In-Reply-To: <8JPE8NAiPPoaDoHnGHp-tiaaHSa9K7XIXLFkZDXFlEw=.99bbd30e-79e3-4ed7-baf2-4b8460f09415@github.com> References: <8JPE8NAiPPoaDoHnGHp-tiaaHSa9K7XIXLFkZDXFlEw=.99bbd30e-79e3-4ed7-baf2-4b8460f09415@github.com> Message-ID: On Wed, 8 May 2024 19:12:16 GMT, Dean Long wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> clean up comments > > src/hotspot/share/code/relocInfo.hpp line 133: > >> 131: // Data: [] an oop stored in 4 bytes of instruction >> 132: // [n] n is the index of an oop in the CodeBlob's oop pool >> 133: // [Nn] index may be 32 bits if necessary > > Lines 132 and 133 could be combined into something like: > > // [[N]n] index of an oop in the CodeBlob's oop pool > > which seems consistent with other descriptions. Okay. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19126#discussion_r1594530833 From kvn at openjdk.org Wed May 8 19:55:54 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 8 May 2024 19:55:54 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v3] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 19:34:10 GMT, Vladimir Kozlov wrote: >> [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. None of the current OpenJDK platforms use it. >> >> Tested tier1-3,stress,xcomp. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > address comments @TheRealMDoerr, @RealFYang, @offamitkumar, @bulasevich I need your help with testing this on your platforms, at least tier1. GHA does some cross compilation but not testing. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19126#issuecomment-2101318735 From bkilambi at openjdk.org Wed May 8 20:26:04 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 8 May 2024 20:26:04 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8] In-Reply-To: References: Message-ID: <8-_t7nWbR9gZ2_QkfFNuf5M0Q4PMkKJKgwS3ZbHcCxI=.32dc4f11-dec5-468d-afc8-3b4dae285dcb@github.com> On Tue, 7 May 2024 13:20:37 GMT, Emanuel Peter wrote: > I just realized that there is no regression test. And I think it would be nice to have one. > > Also, we should add some sort of message to the `dump` if the `ReductionNode` has the `requires_strict_order` on or off. I think that could be done in `dump_spec`. > > You could do it similar to: > > ``` > #ifndef PRODUCT > void VectorMaskCmpNode::dump_spec(outputStream *st) const { > st->print(" %d #", _predicate); _type->dump_on(st); > } > #endif // PRODUCT > ``` > > This would actually allow you to create an IR test!
> > You would check that the AddReductionVNode is annotated correctly. You need some VectorAPI tests, and some SuperWord auto-vectorization tests. > > How does that sound? That would ensure that nobody can easily destroy your RFE, at least not in the IR. Hi @eme64 , thanks for the suggestion. I can add the `dump_spec` as suggested (which would print if the `_requires_strict_order` flag is enabled/disabled) but I am not sure if I fully understand what's expected in the JTREG tests. Should I be verifying the `-XX:+PrintIdeal` output to make sure the correct message is being printed for the `ReductionV*` nodes? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18034#issuecomment-2101362464 From duke at openjdk.org Wed May 8 20:30:00 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Wed, 8 May 2024 20:30:00 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v15] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 10:12:29 GMT, Jatin Bhateja wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> parameter and local renames, update comment > > src/hotspot/cpu/x86/assembler_x86.cpp line 11902: > >> 11900: vex_x = (src_enc >= 16) && !src_is_gpr; >> 11901: attributes->set_is_evex_instruction(); >> 11902: evex_prefix(vex_r, vex_b, vex_x, evex_r, evex_b, evex_v, false /*eevex_x*/, nds_enc, pre, opc); > > Hi @steveatgh , UseAVX is set to level 3 only when target support AVX512F feature, entire encoding support for EVEX encoding is guarded by UseAVX > 2. Legacy map 2 and 3 instruction using EGPR register mandates Extended EVEX encoding and user may explicitly set UseAVX to level 2. > What are your thoughts on extending the guarding check with UseAPX ? Thanks @jatin-bhateja . Do you mean a check such as: `if ((UseAVX > 2 || UseAPX) && !attributes->is_legacy_mode())` ? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1594587834 From duke at openjdk.org Wed May 8 23:40:20 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Wed, 8 May 2024 23:40:20 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v16] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: add ::needs_eevex for use with promoted map2 instructions (e.g. 
crc32) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/2a63a159..52628798 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=14-15 Stats: 8 lines in 2 files changed: 6 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From duke at openjdk.org Wed May 8 23:40:21 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Wed, 8 May 2024 23:40:21 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v15] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 09:52:04 GMT, Jatin Bhateja wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> parameter and local renames, update comment > > src/hotspot/cpu/x86/assembler_x86.cpp line 1971: > >> 1969: void Assembler::crc32(Register crc, Register v, int8_t sizeInBytes) { >> 1970: assert(VM_Version::supports_sse4_2(), ""); >> 1971: if (needs_rex2(crc, v)) { > > This being a map2 instruction should check for needs eevex, rex2 nomenclature looks misleading here. Thanks for the comment. Although crc32 is the only promoted map2 instruction (currently) implemented in the assembler, additional map2 instructions may be added later. I added ::needs_eevex and used as you suggest in the crc32 instr. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1594808629 From cslucas at openjdk.org Wed May 8 23:49:19 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 8 May 2024 23:49:19 GMT Subject: RFR: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers Message-ID: The `assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type` failure was caused by the fact that we didn't have a "zero value" for the type T_METADATA. The RAM patch uses that data when it creates a Phi node merging Klass loads and UseCompressedClassPointers is disabled. Tested with JTREG tier1-4 on Linux x86_64 & ARM64. ------------- Commit messages: - Add null and zero types. Changes: https://git.openjdk.org/jdk/pull/19148/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19148&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330795 Stats: 65 lines in 2 files changed: 65 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19148.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19148/head:pull/19148 PR: https://git.openjdk.org/jdk/pull/19148 From cslucas at openjdk.org Wed May 8 23:50:22 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 8 May 2024 23:50:22 GMT Subject: RFR: JDK-8330565 : C2: Multiple crashes with CTW after JDK-8316991 Message-ID: The `# assert(false) failed: Bad graph detected in build_loop_late` failure had two causes: a string concatenation optimization using [this method](https://github.com/openjdk/jdk/blob/819f3d6fc70ff6fe54ac5f9033c17c3dd4326aa5/src/hotspot/share/opto/graphKit.cpp#L4115) adds AddP and LoadN nodes to the IR graph as NotNull, _and_ RAM did not give phis that merge nullable pointers a nullable type. I was only able to reproduce this problem using a classfile/jar compiled with an "old" version of the JDK, because newer versions use invokedynamic for string concatenation.
Tested with JTREG tier1-4 on Linux x86_64 & ARM64. ------------- Commit messages: - Make phi merging pointer loads nullable & add test. Changes: https://git.openjdk.org/jdk/pull/19147/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19147&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330565 Stats: 83 lines in 2 files changed: 83 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19147.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19147/head:pull/19147 PR: https://git.openjdk.org/jdk/pull/19147 From kvn at openjdk.org Thu May 9 01:06:00 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 9 May 2024 01:06:00 GMT Subject: RFR: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers In-Reply-To: References: Message-ID: On Wed, 8 May 2024 23:44:26 GMT, Cesar Soares Lucas wrote: > The `assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type` failure was caused by the fact that we didn't have a "zero value" for the type T_METADATA. The RAM patch uses that data when it creates a Phi node merging Klass loads and UseCompressedClassPointers is disabled. > > Tested with JTREG tier1-4 on Linux x86_64 & ARM64. New test failed in GHA with 32-bit VM because: Unrecognized VM option 'UseCompressedClassPointers' You can add `-XX:+IgnoreUnrecognizedVMOptions` to run test on all platforms. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19148#issuecomment-2101739736 From kvn at openjdk.org Thu May 9 01:28:51 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 9 May 2024 01:28:51 GMT Subject: RFR: JDK-8330565 : C2: Multiple crashes with CTW after JDK-8316991 In-Reply-To: References: Message-ID: On Wed, 8 May 2024 23:44:23 GMT, Cesar Soares Lucas wrote: > The `# assert(false) failed: Bad graph detected in build_loop_late` failure had two causes: a string concatenation optimization using [this method](https://github.com/openjdk/jdk/blob/819f3d6fc70ff6fe54ac5f9033c17c3dd4326aa5/src/hotspot/share/opto/graphKit.cpp#L4115) adds AddP and LoadN nodes to the IR graph as NotNull, _and_ RAM did not give phis that merge nullable pointers a nullable type. I was only able to reproduce this problem using a classfile/jar compiled with an "old" version of the JDK, because newer versions use invokedynamic for string concatenation. > > Tested with JTREG tier1-4 on Linux x86_64 & ARM64. src/hotspot/share/opto/escape.cpp line 779: > 777: _igvn->set_type(data_phi, new_t); > 778: data_phi->raise_bottom_type(new_t); > 779: } Do you intentionally execute `_igvn->transform(` for `data_phi` before you set its inputs and new type? Usually we call transform after the node is fully constructed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19147#discussion_r1594859343 From kvn at openjdk.org Thu May 9 01:40:51 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 9 May 2024 01:40:51 GMT Subject: RFR: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers In-Reply-To: References: Message-ID: On Wed, 8 May 2024 23:44:26 GMT, Cesar Soares Lucas wrote: > The `assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type` failure was caused by the fact that we didn't have a "zero value" for the type T_METADATA.
The RAM patch uses that data when it creates a Phi node merging Klass loads and UseCompressedClassPointers is disabled. > > Tested with JTREG tier1-4 on Linux x86_64 & ARM64. @JohnTortugo, thank you for adding new test. But it would be nice also add additional run with `-XX:+IgnoreUnrecognizedVMOptions -XX:-UseCompressedClassPointers` to failed test `test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java` Also, why does the test require compressed oops to be enabled? * @requires vm.debug == true & vm.bits == 64 & vm.compiler2.enabled & vm.opt.final.UseCompressedOops & vm.opt.final.EliminateAllocations ------------- PR Comment: https://git.openjdk.org/jdk/pull/19148#issuecomment-2101773136 From kvn at openjdk.org Thu May 9 01:48:51 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 9 May 2024 01:48:51 GMT Subject: RFR: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers In-Reply-To: References: Message-ID: On Thu, 9 May 2024 01:38:44 GMT, Vladimir Kozlov wrote: > @JohnTortugo, thank you for adding new test. But it would be nice also add additional run with `-XX:+IgnoreUnrecognizedVMOptions -XX:-UseCompressedClassPointers` to failed test `test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java` Actually `-XX:+IgnoreUnrecognizedVMOptions` is not needed because you require `vm.bits == 64` in the test.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19148#issuecomment-2101781070 From cslucas at openjdk.org Thu May 9 03:09:11 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 9 May 2024 03:09:11 GMT Subject: RFR: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers [v2] In-Reply-To: References: Message-ID: > The `assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type` failure was caused by the fact that we didn't have a "zero value" for the type T_METADATA. The RAM patch uses that data when it creates a Phi node merging Klass loads and UseCompressedClassPointers is disabled. > > Tested with JTREG tier1-4 on Linux x86_64 & ARM64. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Require vm.bits == 64 on new test. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19148/files - new: https://git.openjdk.org/jdk/pull/19148/files/ea64c880..91fc61de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19148&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19148&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19148.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19148/head:pull/19148 PR: https://git.openjdk.org/jdk/pull/19148 From cslucas at openjdk.org Thu May 9 03:11:53 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 9 May 2024 03:11:53 GMT Subject: RFR: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers In-Reply-To: References: Message-ID: On Thu, 9 May 2024 01:46:45 GMT, Vladimir Kozlov wrote: > @JohnTortugo, thank you for adding new test. 
But it would be nice also add additional run with -XX:+IgnoreUnrecognizedVMOptions -XX:-UseCompressedClassPointers to failed test test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java Thank you @vnkozlov , I'll work on that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19148#issuecomment-2101853175 From duke at openjdk.org Thu May 9 03:35:58 2024 From: duke at openjdk.org (duke) Date: Thu, 9 May 2024 03:35:58 GMT Subject: Withdrawn: 8325681: C2 inliner rejects to inline a deeper callee because the methoddata of caller is immature. In-Reply-To: References: Message-ID: <2-DiL7OaUdt4ncWDCNxGK2DJerNN4mqmJDPSEEvIFBQ=.0b5704fc-aa96-491c-80b1-734b01b3863a@github.com> On Thu, 22 Feb 2024 05:37:26 GMT, Xin Liu wrote: > This patch uses the methoddata of a method no matter it is mature or not to initialize `ciCallProfile`. Previously, C2 drops premature methoddata and leaves _count field of ciCallProfile -1. This leads C2 refuses to inline the callsite because its frequency is too low(-1 < MinInlineFrequencyRatio). > > In the given example, we observes that baz was not inlined because of 'low call site frequency'. This is wrong because its real frequency is 10% > MinInlineFrequencyRatio. > > > 60 13 b 4 UnderProfiledSubprocedure::foo (9 bytes) > @ 5 UnderProfiledSubprocedure::bar (6 bytes) inline (hot) > @ 1 UnderProfiledSubprocedure::baz (19 bytes) failed to inline: low call site frequency This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/17957 From duke at openjdk.org Thu May 9 03:56:59 2024 From: duke at openjdk.org (duke) Date: Thu, 9 May 2024 03:56:59 GMT Subject: Withdrawn: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 08:10:54 GMT, Quan Anh Mai wrote: > Hi, > > This patch introduces `JitCompiler::isConstantExpression` which can be used to statically determine whether an expression has been constant-folded by the Jit compiler, leading to more constant-folding opportunities. For example, it can be used in `MemorySessionImpl::checkValidStateRaw` to eliminate the lifetime check on global sessions without imposing additional branches on other non-global sessions. This is similar to `__builtin_constant_p` in GCC and clang. > > Please kindly give your opinion as well as your reviews, thanks very much. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/17527 From duke at openjdk.org Thu May 9 04:28:00 2024 From: duke at openjdk.org (duke) Date: Thu, 9 May 2024 04:28:00 GMT Subject: Withdrawn: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 17:24:22 GMT, Emanuel Peter wrote: > This is a refactoring of `SuperWord`. > > **Goals** > > 1. Clean up `SuperWord`: disentangle different components, make them more **modular**. > 2. Make the loop analysis parts a **shared facility**, not just for SuperWord but also the post-loop-vectorizer ([JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994)). > 3. It is also a necessary step on my bigger plans for improvement with the C2 Auto-Vectorizer ([see my blog post](https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html)). > 4. Improve tracing in the auto-vectorization by making it more systematic. 
> > **Summary** > > - I wrote a summary of how C2 auto-vectorization with SuperWord works (please read!): > https://github.com/openjdk/jdk/blob/95fd361e60fc66eb91edad321662e508b2d1bdde/src/hotspot/share/opto/superword.hpp#L32-L177 > - I moved many `Superword` components out to `VLoop` and to `VLoopAnalyzer`. The idea is that any vectorizer can use these facilities in the future. They are therefore made more modular, which should hopefully make future changes easier. These components are: > - Checking the pre-conditions for vectorization (e.g. no unwanted ctrl-flow). > - `VLoop::check_preconditions_helper` replaces code from old `SuperWord::transform_loop`. > - Running all submodules of `VLoopAnalyzer`: `VLoopAnalyzer::analyze_helper`. Replaces analysis part of `SuperWord::SLP_extract`. > - Finding and marking reductions -> `VLoopReductions` > - Detecting memory slices -> `VLoopMemorySlices` > - Analyzing the body -> `VLoopBody` (renamed `in_bb` -> `in_body`) > - Determining vector element types, and functions to determine the `vector_width` of a node -> `VLoopTypes` > - Constructing the dependence graph -> `VLoopDependenceGraph`. Replaces old `DepGraph` with all its components. > - New: CompileCommand option `TraceAutovectorization` > - Run with `-XX:CompileCommand=traceAutovectorization,*::*,help` to get a usage description. > - Replaced all printing with flags `TraceSuperWord` (and `Verbose`) and of `VectorizeDebug`. > - The advantage of a CompileCommand is that tracing can be applied selectively for only a limited set of java classes / methods. > - It uses tags, which are more readable than the `VectorizeDebug` bit-flags. These tags can be used for all parts of the vectorizer, but one can also target SuperWord specifically. > - I systematically added tracing at every point where vectorization (partially) fails (use tag `SW_REJECTIONS`). > - `TraceSuperWord` still works, and performs the sa... This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/16620 From mli at openjdk.org Thu May 9 08:46:17 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 9 May 2024 08:46:17 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV Message-ID: Hi, Can you help to review this patch adding CountLeadingZerosV and CountTrailingZerosV intrinsics? Thanks. ------------- Commit messages: - typo - Initial commit Changes: https://git.openjdk.org/jdk/pull/19153/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19153&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331577 Stats: 63 lines in 3 files changed: 62 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19153.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19153/head:pull/19153 PR: https://git.openjdk.org/jdk/pull/19153 From amitkumar at openjdk.org Thu May 9 09:12:54 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 9 May 2024 09:12:54 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v3] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 19:34:10 GMT, Vladimir Kozlov wrote: >> [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. Non of current OpenJDK platforms use it. >> >> Tested tier1-3,stress,xcomp. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > address comments The result looks good on s390x. I ran `tier1` tests on `fastdebug-vm`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19126#issuecomment-2102269058 From mli at openjdk.org Thu May 9 09:48:09 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 9 May 2024 09:48:09 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v2] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch adding CountLeadingZerosV and CountTrailingZerosV instrinsics? > Thanks.
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: fix masked issue ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19153/files - new: https://git.openjdk.org/jdk/pull/19153/files/9c38914a..1d5d17fe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19153&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19153&range=00-01 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/19153.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19153/head:pull/19153 PR: https://git.openjdk.org/jdk/pull/19153 From mli at openjdk.org Thu May 9 10:28:54 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 9 May 2024 10:28:54 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v2] In-Reply-To: References: Message-ID: <8PHpGLNxXgcc-oM9IAc9UnnJNCaF34NnEHFP2R2nSvs=.383399b1-e70e-460c-8efe-9d88ab6a34ba@github.com> On Thu, 9 May 2024 09:48:09 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch adding CountLeadingZerosV and CountTrailingZerosV instrinsics? >> Thanks. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > fix masked issue NOTE: the reason dst and src share one register (i.e. `(vReg dst_src, vRegMask_V0 v0)`) in the masked version is that inactive elements must keep their original values. With separate dst and src registers, neither `mu` (which leaves the old dst contents in inactive elements) nor `ma` (which may overwrite them with all 1s) would produce that. BTW, I will also revisit all existing masked instructions to make sure they work as expected. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19153#issuecomment-2102392793 From mli at openjdk.org Thu May 9 11:14:14 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 9 May 2024 11:14:14 GMT Subject: RFR: 8331993: Add counting leading/trailing zero tests for Integer Message-ID: <7a7fXkgF6v-sSFHCk-GT0DbHr9t8AO7bGh1X1JaF-gg=.19a655eb-cb68-446c-8207-270a2ee87492@github.com> Hi, Can you help to review the patch adding some tests?
Currently, in hotspot/jtreg/compiler/vectorization/TestNumberOfContinuousZeros.java, there are only tests for Long, not for Integer. Thanks. ------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/19154/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19154&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331993 Stats: 59 lines in 2 files changed: 44 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/19154.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19154/head:pull/19154 PR: https://git.openjdk.org/jdk/pull/19154 From jbhateja at openjdk.org Thu May 9 11:23:56 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 9 May 2024 11:23:56 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v15] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 20:27:29 GMT, Steve Dohrmann wrote: > UseAPX Yes. Attaching a test that demonstrates incorrect behavior with UseAVX=2 for SHLX, a legacy map 2 instruction that can be promoted to extended EVEX when it has EGPR operands. [shift_left_APX.txt](https://github.com/openjdk/jdk/files/15261495/shift_left_APX.txt) It would not be appropriate to modify VM_Version::supports_evex for the APX feature, since it is used to constrain the dynamic register classes associated with vector operands.
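The prefix-selection rule this thread keeps returning to (REX2 vs. extended EVEX once an EGPR appears) can be summarized in a minimal C++ sketch. It is a simplified model under stated assumptions: the enums and `choose_prefix` are hypothetical names, not HotSpot's `Assembler` code, and the map classification reflects our reading of the APX specification together with the examples above (SHLX, and PINSRQ as a legacy map 3 instruction).

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical model for illustration only; these names are not HotSpot's.
enum class OpcodeMap { Map0, Map1, Map2, Map3, Vex };
enum class Prefix { Unchanged, Rex2, ExtendedEvex };

// APX adds GPRs r16..r31; encodings 0..15 are the original registers.
constexpr bool is_egpr(int encoding) { return encoding >= 16 && encoding < 32; }

Prefix choose_prefix(OpcodeMap map, std::initializer_list<int> gpr_encodings) {
  bool uses_egpr = false;
  for (int enc : gpr_encodings) {
    uses_egpr = uses_egpr || is_egpr(enc);
  }
  if (!uses_egpr) {
    return Prefix::Unchanged;  // lower 16 GPRs only: encoding stays as before
  }
  switch (map) {
    case OpcodeMap::Map0:
    case OpcodeMap::Map1:
      return Prefix::Rex2;  // REX2 can only select legacy opcode maps 0 and 1
    case OpcodeMap::Map2:
    case OpcodeMap::Map3:
    case OpcodeMap::Vex:
      break;
  }
  // Legacy map 2/3 (e.g. PINSRQ) and VEX-encoded (e.g. SHLX) instructions
  // cannot use REX2, so they must be promoted to extended EVEX.
  return Prefix::ExtendedEvex;
}
```

The sketch deliberately ignores whether the CPU flags (UseAVX, UseAPX) permit EVEX at all. That is exactly where the `legacy_mode` handling discussed in this thread decides whether the promotion is actually emitted.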
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1595320541 From fyang at openjdk.org Thu May 9 11:27:50 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 9 May 2024 11:27:50 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v2] In-Reply-To: <8PHpGLNxXgcc-oM9IAc9UnnJNCaF34NnEHFP2R2nSvs=.383399b1-e70e-460c-8efe-9d88ab6a34ba@github.com> References: <8PHpGLNxXgcc-oM9IAc9UnnJNCaF34NnEHFP2R2nSvs=.383399b1-e70e-460c-8efe-9d88ab6a34ba@github.com> Message-ID: <47txZsG98U3vKdhefoQGDYz5g6IPFFWWzQFI9P6pA0A=.1a396733-46e7-4eb5-9c56-d6293196056f@github.com> On Thu, 9 May 2024 10:26:16 GMT, Hamlin Li wrote: > NOTE: the reason why let dst and src share one register (i.e. `(vReg dst_src, vRegMask_V0 v0)`) in masked version is that for inactive elements, we should keep the origin value, neither `mu` or `ma` will do it. Interesting. Is it specified anywhere? > BTW, I will also re-visit all existing masked version instructions to make sure it works as expected. tracked by https://bugs.openjdk.org/browse/JDK-8331992 I think this issue was considered before when we were adding support for vector api. What about the recently added ones like ReverseBytesV, PopCountVI/L? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19153#issuecomment-2102480131 From sgibbons at openjdk.org Thu May 9 12:01:03 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 9 May 2024 12:01:03 GMT Subject: Integrated: 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 In-Reply-To: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> References: <1300HHaoZzuiFlEm-fCV7E6E1un4RSQ6Aj2vFyh9uLQ=.c4acc67e-8c78-49cb-ad32-abf1a3ad2106@github.com> Message-ID: On Wed, 1 May 2024 14:01:38 GMT, Scott Gibbons wrote: > Added a strcmp for unsafe_setmemory in process_call_arguments() so the assert would not trigger. 
> > I believe this is the correct fix as I do not think the arguments for setMemory need special handling like arraycopy. > > I would like suggestions on how to generate a testcase to catch this type of error in mainline. This pull request has now been integrated. Changeset: 0a4eeeaa Author: Scott Gibbons Committer: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/0a4eeeaa3c63585244be959386dd94882398e87f Stats: 108 lines in 2 files changed: 107 ins; 0 del; 1 mod 8331033: EA fails with "EA unexpected CallLeaf unsafe_setmemory" after JDK-8329331 Co-authored-by: Jatin Bhateja Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/19032 From mli at openjdk.org Thu May 9 12:02:56 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 9 May 2024 12:02:56 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v2] In-Reply-To: <47txZsG98U3vKdhefoQGDYz5g6IPFFWWzQFI9P6pA0A=.1a396733-46e7-4eb5-9c56-d6293196056f@github.com> References: <8PHpGLNxXgcc-oM9IAc9UnnJNCaF34NnEHFP2R2nSvs=.383399b1-e70e-460c-8efe-9d88ab6a34ba@github.com> <47txZsG98U3vKdhefoQGDYz5g6IPFFWWzQFI9P6pA0A=.1a396733-46e7-4eb5-9c56-d6293196056f@github.com> Message-ID: On Thu, 9 May 2024 11:24:47 GMT, Fei Yang wrote: > > NOTE: the reason why let dst and src share one register (i.e. `(vReg dst_src, vRegMask_V0 v0)`) in masked version is that for inactive elements, we should keep the origin value, neither `mu` or `ma` will do it. > > Interesting. Is it specified anywhere? For the semantics of `mu` and `ma`, see https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#343-vector-tail-agnostic-and-vector-mask-agnostic-vta-and-vma. Based on this, we can deduce that there is a hidden bug here. > > > BTW, I will also re-visit all existing masked version instructions to make sure it works as expected. tracked by https://bugs.openjdk.org/browse/JDK-8331992 > > I think this issue was considered before when we were adding support for vector api.
What about the recently added ones like ReverseBytesV, PopCountVI/L? Yeh, I'm testing with a fix including those 2 intrinsics. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19153#issuecomment-2102527828 From fyang at openjdk.org Thu May 9 12:19:53 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 9 May 2024 12:19:53 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v2] In-Reply-To: References: <8PHpGLNxXgcc-oM9IAc9UnnJNCaF34NnEHFP2R2nSvs=.383399b1-e70e-460c-8efe-9d88ab6a34ba@github.com> <47txZsG98U3vKdhefoQGDYz5g6IPFFWWzQFI9P6pA0A=.1a396733-46e7-4eb5-9c56-d6293196056f@github.com> Message-ID: On Thu, 9 May 2024 12:00:22 GMT, Hamlin Li wrote: > > > BTW, I will also re-visit all existing masked version instructions to make sure it works as expected. tracked by https://bugs.openjdk.org/browse/JDK-8331992 Sorry for not being accurate. In fact, I mean requirement at the Java level. Why should we keep the origin value of inactive elements from the input vector? I didn't notice such a requirement. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19153#issuecomment-2102549043 From mli at openjdk.org Thu May 9 12:45:52 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 9 May 2024 12:45:52 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v2] In-Reply-To: References: <8PHpGLNxXgcc-oM9IAc9UnnJNCaF34NnEHFP2R2nSvs=.383399b1-e70e-460c-8efe-9d88ab6a34ba@github.com> <47txZsG98U3vKdhefoQGDYz5g6IPFFWWzQFI9P6pA0A=.1a396733-46e7-4eb5-9c56-d6293196056f@github.com> Message-ID: On Thu, 9 May 2024 12:14:28 GMT, Fei Yang wrote: > > > > BTW, I will also re-visit all existing masked version instructions to make sure it works as expected. tracked by https://bugs.openjdk.org/browse/JDK-8331992 > > Sorry for not being accurate. In fact, I mean requirement at the Java level. Why should we keep the origin value of inactive elements from the input vector? I didn't notice such a requirement before. 
I'm not sure about other places, but in the Vector API, if you perform an operation with a mask, the untouched elements (the inactive elements in RISC-V vector instructions) should be left unmodified, i.e. keep their original values. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19153#issuecomment-2102590188 From fyang at openjdk.org Thu May 9 13:20:51 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 9 May 2024 13:20:51 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v2] In-Reply-To: References: <8PHpGLNxXgcc-oM9IAc9UnnJNCaF34NnEHFP2R2nSvs=.383399b1-e70e-460c-8efe-9d88ab6a34ba@github.com> <47txZsG98U3vKdhefoQGDYz5g6IPFFWWzQFI9P6pA0A=.1a396733-46e7-4eb5-9c56-d6293196056f@github.com> Message-ID: On Thu, 9 May 2024 12:43:21 GMT, Hamlin Li wrote: > > Sorry for not being accurate. In fact, I mean requirement at the Java level. Why should we keep the origin value of inactive elements from the input vector? I didn't notice such a requirement before. > > I'm not sure about other places, but in vector APi, if you do operations with a mask, then the untouched (inactive in riscv vector insts) elements should be unmodified, i.e. same as original values. It would be helpful if you could point to the specific code or examples. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19153#issuecomment-2102645274 From imyers at openjdk.org Thu May 9 13:21:55 2024 From: imyers at openjdk.org (Ian Myers) Date: Thu, 9 May 2024 13:21:55 GMT Subject: RFR: 8324756: Test vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize is too slow due to dependency verification [v2] In-Reply-To: <-ig7Zj830qvQ91e_kbIRRfOn_8Pm23qxFOxUdGsSSWk=.9a40c696-9c91-4729-916d-61965099e0ae@github.com> References: <-ig7Zj830qvQ91e_kbIRRfOn_8Pm23qxFOxUdGsSSWk=.9a40c696-9c91-4729-916d-61965099e0ae@github.com> Message-ID: On Thu, 2 May 2024 12:57:20 GMT, Aleksey Shipilev wrote: >> Ian Myers has refreshed the contents of this pull request, and previous commits have been removed.
The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> [8324756] Remove dependency verification from vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java > > I think you want to add the reversal of https://github.com/openjdk/jdk/commit/2564f0f99866c33d14947609c276a421ce8cc0a2 to this PR as well. > > I am not sure we want to run the test with disabled dependency verification, though. It is a compiler test, so we would like to have compiler checking code online as much as possible. Have you explored if this is an issue with Sweeper removal, and if so, if adding GCs help? @shipilev I have experimented with adding a periodic GC (every 5 seconds) in a new thread, and it did not affect the run time of the test. It still timed out at `CONF=linux-x86_64-server-fastdebug make test 1371.53s user 14.98s system 112% cpu 20:31.41 total` with the removal of the `-XX:-VerifyDependencies` flag. I will submit an amended commit with this test removed from the ProblemList.txt. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19040#issuecomment-2102649525 From fyang at openjdk.org Thu May 9 13:24:54 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 9 May 2024 13:24:54 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v3] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 19:34:10 GMT, Vladimir Kozlov wrote: >> [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. Non of current OpenJDK platforms use it. >> >> Tested tier1-3,stress,xcomp. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > address comments Also performed `tier1` tests on linux-riscv64 platform. Result looks good. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19126#issuecomment-2102653877 From mli at openjdk.org Thu May 9 14:09:52 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 9 May 2024 14:09:52 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v2] In-Reply-To: References: <8PHpGLNxXgcc-oM9IAc9UnnJNCaF34NnEHFP2R2nSvs=.383399b1-e70e-460c-8efe-9d88ab6a34ba@github.com> <47txZsG98U3vKdhefoQGDYz5g6IPFFWWzQFI9P6pA0A=.1a396733-46e7-4eb5-9c56-d6293196056f@github.com> Message-ID: <8FNJwg59AJZc59jms3X0vBA2LG4d6oEexzqJUq7cT1A=.4bbd1d52-7cd5-437b-9e25-77e1d0e245c3@github.com> On Thu, 9 May 2024 13:18:13 GMT, Fei Yang wrote: > > > Sorry for not being accurate. In fact, I mean requirement at the Java level. Why should we keep the origin value of inactive elements from the input vector? I didn't notice such a requirement before. > > > > > > I'm not sure about other places, but in vector APi, if you do operations with a mask, then the untouched (inactive in riscv vector insts) elements should be unmodified, i.e. same as original values. > > > > It will be helpful if you could point to the specific code or examples. For example usage, please check the test code, e.g. https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Byte64VectorTests.java#L5458 For the counterpart of this intrinsic on AArch64, please check https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L6391 I hope this information is helpful. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19153#issuecomment-2102732804 From kvn at openjdk.org Thu May 9 15:29:58 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 9 May 2024 15:29:58 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v3] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 19:34:10 GMT, Vladimir Kozlov wrote: >> [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC.
Non of current OpenJDK platforms use it. >> >> Tested tier1-3,stress,xcomp. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > address comments Thank you, Amit and Fei, for testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19126#issuecomment-2102885628 From duke at openjdk.org Thu May 9 16:56:24 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Thu, 9 May 2024 16:56:24 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v17] In-Reply-To: References: Message-ID: <3nP3cGJZXnHXo2XZDKxZGj1aNIsKW8D1lQUl_nNwDuQ=.1404a4f7-6213-4f3b-a975-291202849538@github.com> > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: enable EEVEX encoding of vex map2 instructions when UseAVX=2 if UseAPX=true ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/52628798..d4ecb31c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=15-16 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From duke at openjdk.org Thu May 9 16:56:24 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Thu, 9 May 2024 16:56:24 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v15] In-Reply-To: References: Message-ID: <44Vx5Qdjf_VFHK0outx5pqCShAaEWqxz_vKurFE0WgE=.4b2bd04a-9422-4e9d-9f10-26fb59e8cba0@github.com> On Thu, 9 May 2024 11:21:28 GMT, Jatin Bhateja wrote: >> Thanks @jatin-bhateja . Do you mean a check such as: >> >> `if ((UseAVX > 2 || UseAPX) && !attributes->is_legacy_mode())` ? > >> UseAPX > > Yes, attaching a test depicting incorrectness with UseAVX=2 for SHLX which is a legacy map 2 instruction promotable to extended EVEX with EGPR operands. > [shift_left_APX.txt](https://github.com/openjdk/jdk/files/15261495/shift_left_APX.txt) > > It will not be appropriate to modify VM_Version::supports_evex for APX feature since its used for constraining dynamic register classes associated with vector operands. Ok, thanks. I've added the above change to the conditionals in the vex_prefix and vex_prefix_and_encode functions. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1595713834 From jbhateja at openjdk.org Thu May 9 19:34:57 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 9 May 2024 19:34:57 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v17] In-Reply-To: <3nP3cGJZXnHXo2XZDKxZGj1aNIsKW8D1lQUl_nNwDuQ=.1404a4f7-6213-4f3b-a975-291202849538@github.com> References: <3nP3cGJZXnHXo2XZDKxZGj1aNIsKW8D1lQUl_nNwDuQ=.1404a4f7-6213-4f3b-a975-291202849538@github.com> Message-ID: <9Q2ix7wJTkRyivF1JND_cjcoI6vn1O2przOrcnpJBXM=.8fca1722-ec94-4066-ab8d-8f5f5673e430@github.com> On Thu, 9 May 2024 16:56:24 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
> > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > enable EEVEX encoding of vex map2 instructions when UseAVX=2 if UseAPX=true src/hotspot/cpu/x86/assembler_x86.cpp line 4914: > 4912: assert(VM_Version::supports_sse4_1(), ""); > 4913: InstructionAttr attributes(AVX_128bit, /* rex_w */ true, /* legacy_mode */ _legacy_mode_dq, /* no_mask_reg */ true, /* uses_vl */ false); > 4914: int encode = simd_prefix_and_encode(dst, dst, as_XMMRegister(src->encoding()), VEX_SIMD_66, VEX_OPCODE_0F_3A, &attributes, true); _legacy_mode_dq and _legacy_mode_bw will be true for non-AVX512DQ/BW targets. This will lead to incorrect encodings, since our scheme has been to treat those as non-legacy instructions upfront and only perform legacy demotion in leaf-level routines if none of the register operands is an EGPR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1595886429 From duke at openjdk.org Thu May 9 21:47:35 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Thu, 9 May 2024 21:47:35 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v17] In-Reply-To: <9Q2ix7wJTkRyivF1JND_cjcoI6vn1O2przOrcnpJBXM=.8fca1722-ec94-4066-ab8d-8f5f5673e430@github.com> References: <3nP3cGJZXnHXo2XZDKxZGj1aNIsKW8D1lQUl_nNwDuQ=.1404a4f7-6213-4f3b-a975-291202849538@github.com> <9Q2ix7wJTkRyivF1JND_cjcoI6vn1O2przOrcnpJBXM=.8fca1722-ec94-4066-ab8d-8f5f5673e430@github.com> Message-ID: On Thu, 9 May 2024 19:32:24 GMT, Jatin Bhateja wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> enable EEVEX encoding of vex map2 instructions when UseAVX=2 if UseAPX=true > > src/hotspot/cpu/x86/assembler_x86.cpp line 4914: > >> 4912: assert(VM_Version::supports_sse4_1(), ""); >> 4913: InstructionAttr attributes(AVX_128bit, /* rex_w */ true, /* legacy_mode */ _legacy_mode_dq, /* no_mask_reg */ true, /* uses_vl */
false); >> 4914: int encode = simd_prefix_and_encode(dst, dst, as_XMMRegister(src->encoding()), VEX_SIMD_66, VEX_OPCODE_0F_3A, &attributes, true); > > _legacy_mode_dq and _legacy_mode_bw will be true for non AVX512DQ/BW targets, this will cause incorrectness since our scheme has been to treat those as non-legacy instructions upfront and only perform legacy demotions in leaf level routines if non of the register operand is an EGPR. In general, the legacy mode will be set to true whenever UseAVX < 3, due to logic in the InstructionAttr ctor. `_legacy_mode(legacy_mode || UseAVX < 3` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1596017447 From mdoerr at openjdk.org Thu May 9 21:59:33 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 9 May 2024 21:59:33 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v3] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 19:34:10 GMT, Vladimir Kozlov wrote: >> [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. Non of current OpenJDK platforms use it. >> >> Tested tier1-3,stress,xcomp. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > address comments tier1 and many more tests have also passed on PPC64. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19126#issuecomment-2103466537 From kvn at openjdk.org Thu May 9 23:04:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 9 May 2024 23:04:05 GMT Subject: RFR: 8331862: Remove split relocation info implementation [v3] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 19:34:10 GMT, Vladimir Kozlov wrote: >> [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. Non of current OpenJDK platforms use it. >> >> Tested tier1-3,stress,xcomp. 
> > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > address comments Thank you, Martin ------------- PR Comment: https://git.openjdk.org/jdk/pull/19126#issuecomment-2103575606 From kvn at openjdk.org Thu May 9 23:46:08 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 9 May 2024 23:46:08 GMT Subject: Integrated: 8331862: Remove split relocation info implementation In-Reply-To: References: Message-ID: On Tue, 7 May 2024 16:16:33 GMT, Vladimir Kozlov wrote: > [Split relocation info](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L225) was used only for SPARC. Non of current OpenJDK platforms use it. > > Tested tier1-3,stress,xcomp. This pull request has now been integrated. Changeset: a643d6c7 Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/a643d6c7ac8a7bc0d3a288c1ef3f07876cf70590 Stats: 127 lines in 10 files changed: 2 ins; 65 del; 60 mod 8331862: Remove split relocation info implementation Reviewed-by: dlong ------------- PR: https://git.openjdk.org/jdk/pull/19126 From jwaters at openjdk.org Fri May 10 01:00:04 2024 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 10 May 2024 01:00:04 GMT Subject: RFR: 8331908: Simplify log code in vectorintrinsics.cpp In-Reply-To: References: Message-ID: <7baXLapkFPMESg7GfO26_rP-ADGub_eN6TfTOx6Th2c=.0f17c78e-7b01-423b-bf41-47b47ebc2b7c@github.com> On Wed, 8 May 2024 08:41:31 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Curretly, log code in vectorintrinsics.cpp is a bit redundant, could be simplified a bit. > Thanks. > > ## Test > sanity test, jdk/incubator/vector Marked as reviewed by jwaters (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/19135#pullrequestreview-2049066571 From kvn at openjdk.org Fri May 10 01:32:17 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 10 May 2024 01:32:17 GMT Subject: RFR: 8331908: Simplify log code in vectorintrinsics.cpp In-Reply-To: References: Message-ID: On Wed, 8 May 2024 08:41:31 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Curretly, log code in vectorintrinsics.cpp is a bit redundant, could be simplified a bit. > Thanks. > > ## Test > sanity test, jdk/incubator/vector Good. I will run our testing before approval. ------------- PR Review: https://git.openjdk.org/jdk/pull/19135#pullrequestreview-2049097766 From kvn at openjdk.org Fri May 10 02:34:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 10 May 2024 02:34:02 GMT Subject: RFR: 8331908: Simplify log code in vectorintrinsics.cpp In-Reply-To: References: Message-ID: On Wed, 8 May 2024 08:41:31 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Curretly, log code in vectorintrinsics.cpp is a bit redundant, could be simplified a bit. > Thanks. > > ## Test > sanity test, jdk/incubator/vector My testing passed. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19135#pullrequestreview-2049144736 From jbhateja at openjdk.org Fri May 10 05:10:07 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 10 May 2024 05:10:07 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v17] In-Reply-To: References: <3nP3cGJZXnHXo2XZDKxZGj1aNIsKW8D1lQUl_nNwDuQ=.1404a4f7-6213-4f3b-a975-291202849538@github.com> <9Q2ix7wJTkRyivF1JND_cjcoI6vn1O2przOrcnpJBXM=.8fca1722-ec94-4066-ab8d-8f5f5673e430@github.com> Message-ID: On Thu, 9 May 2024 21:42:34 GMT, Steve Dohrmann wrote: >> src/hotspot/cpu/x86/assembler_x86.cpp line 4914: >> >>> 4912: assert(VM_Version::supports_sse4_1(), ""); >>> 4913: InstructionAttr attributes(AVX_128bit, /* rex_w */ true, /* legacy_mode */ _legacy_mode_dq, /* no_mask_reg */ true, /* uses_vl */ false); >>> 4914: int encode = simd_prefix_and_encode(dst, dst, as_XMMRegister(src->encoding()), VEX_SIMD_66, VEX_OPCODE_0F_3A, &attributes, true); >> >> _legacy_mode_dq and _legacy_mode_bw will be true for non AVX512DQ/BW targets, this will cause incorrectness since our scheme has been to treat those as non-legacy instructions upfront and only perform legacy demotions in leaf level routines if non of the register operand is an EGPR. > > In general, the legacy mode will be set to true whenever UseAVX < 3, due to logic in the InstructionAttr ctor. > > `_legacy_mode(legacy_mode || UseAVX < 3` PFA a test point depicting the problem. 
[insertQ_map3_eevex.txt](https://github.com/openjdk/jdk/files/15270533/insertQ_map3_eevex.txt) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1596259907 From jbhateja at openjdk.org Fri May 10 05:26:07 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 10 May 2024 05:26:07 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v17] In-Reply-To: References: <3nP3cGJZXnHXo2XZDKxZGj1aNIsKW8D1lQUl_nNwDuQ=.1404a4f7-6213-4f3b-a975-291202849538@github.com> <9Q2ix7wJTkRyivF1JND_cjcoI6vn1O2przOrcnpJBXM=.8fca1722-ec94-4066-ab8d-8f5f5673e430@github.com> Message-ID: On Fri, 10 May 2024 05:07:21 GMT, Jatin Bhateja wrote: >> In general, the legacy mode will be set to true whenever UseAVX < 3, due to logic in the InstructionAttr ctor. >> >> `_legacy_mode(legacy_mode || UseAVX < 3` > > PFA a test point depicting the problem. > [insertQ_map3_eevex.txt](https://github.com/openjdk/jdk/files/15270533/insertQ_map3_eevex.txt) For the previously attached test point, we see an illegal instruction encoding with UseAVX=0: Illegal instruction at address = 7f147a64af08: 66 d5 18 0f 3a 22 c0 01 f3 0f 7f 46 10 d5 10 Image name: not from an image If you believe your application should attempt to execute this illegal instruction (and others that may be present), Then use this knob: -emit-illegal-insts 0 and this error message will be avoided. SDE ERROR: Illegal instruction at address = 7f147a64af08: 66 d5 18 0f 3a 22 c0 01 f3 0f 7f 46 10 d5 10 Since PINSRQ is a legacy MAP3 instruction, it should be promoted to extended EVEX encoding; instead, an incorrect REX2 prefix (0xD5) is being emitted. `Command line: sde -dmr -- java -XX:-TieredCompilation -Xbatch --add-modules=jdk.incubator.vector -XX:UseAVX=0 -XX:CompileCommand=Print,insertQ::micro -cp .
insertQ` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1596272481 From duke at openjdk.org Fri May 10 06:04:10 2024 From: duke at openjdk.org (duke) Date: Fri, 10 May 2024 06:04:10 GMT Subject: Withdrawn: 8315066: Add unsigned bounds and known bits to TypeInt/Long In-Reply-To: References: Message-ID: <50TYSexOVLaUyHAI7tCmZP7RtfCJ4xKi2i-joOCUI8M=.c701de97-4813-4f55-8f64-6811db0694a7@github.com> On Sat, 20 Jan 2024 19:23:23 GMT, Quan Anh Mai wrote: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/17508 From mli at openjdk.org Fri May 10 06:28:07 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 10 May 2024 06:28:07 GMT Subject: RFR: 8331908: Simplify log code in vectorintrinsics.cpp In-Reply-To: References: Message-ID: On Fri, 10 May 2024 02:31:13 GMT, Vladimir Kozlov wrote: >> Hi, >> Can you help to review this simple patch? >> Curretly, log code in vectorintrinsics.cpp is a bit redundant, could be simplified a bit. 
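The normalization described in the withdrawn 8315066 message above (an int in signed [0, 3] must also lie in unsigned [0, 3] and have all but the last two bits unset) can be illustrated with a small C++ sketch of one step, assuming 32-bit values. `KnownBits`, `known_bits_from_urange`, and `same_sign_range` are hypothetical names for illustration, not code from that PR. The key observation is that every value in an unsigned range shares the bits above the highest bit where the two bounds differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration; not the TypeInt/TypeLong code from the PR.
struct KnownBits {
  std::uint32_t zeros;  // bits known to be 0: (x & zeros) == 0
  std::uint32_t ones;   // bits known to be 1: (~x & ones) == 0
};

// Every x in [ulo, uhi] (unsigned, ulo <= uhi) shares the bits above the
// highest bit position where ulo and uhi differ, so those bits are known.
KnownBits known_bits_from_urange(std::uint32_t ulo, std::uint32_t uhi) {
  std::uint32_t diff = ulo ^ uhi;
  std::uint32_t low_mask = 0;  // all positions at or below the highest differing bit
  while (diff != 0) {
    low_mask = (low_mask << 1) | 1u;
    diff >>= 1;
  }
  std::uint32_t common = ~low_mask;  // positions shared by every value in the range
  return KnownBits{common & ~ulo, common & ulo};
}

// A signed range whose bounds have the same sign covers the same values as
// the unsigned range [uint32_t(lo), uint32_t(hi)], so the bounds carry over.
bool same_sign_range(std::int32_t lo, std::int32_t hi) {
  return (lo >= 0) == (hi >= 0);
}
```

For [0, 3] this yields `zeros == 0xFFFFFFFC` and `ones == 0`, matching the example in the message. A full normalization would also iterate in the other direction (known bits tightening the bounds) until a fixed point is reached. This sketch covers only one direction.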
>> Thanks. >> >> ## Test >> sanity test, jdk/incubator/vector > > My testing passed. Thanks @vnkozlov @TheShermanTanker for reviewing and testing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19135#issuecomment-2103949861 From mli at openjdk.org Fri May 10 06:28:08 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 10 May 2024 06:28:08 GMT Subject: Integrated: 8331908: Simplify log code in vectorintrinsics.cpp In-Reply-To: References: Message-ID: On Wed, 8 May 2024 08:41:31 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Currently, the log code in vectorintrinsics.cpp is a bit redundant and could be simplified. > Thanks. > > ## Test > sanity test, jdk/incubator/vector This pull request has now been integrated. Changeset: f47fc867 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/f47fc867b3518cb285d39f7b157bf7fde87b2083 Stats: 497 lines in 1 file changed: 14 ins; 322 del; 161 mod 8331908: Simplify log code in vectorintrinsics.cpp Reviewed-by: jwaters, kvn ------------- PR: https://git.openjdk.org/jdk/pull/19135 From fyang at openjdk.org Fri May 10 06:33:03 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 10 May 2024 06:33:03 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v2] In-Reply-To: <8FNJwg59AJZc59jms3X0vBA2LG4d6oEexzqJUq7cT1A=.4bbd1d52-7cd5-437b-9e25-77e1d0e245c3@github.com> References: <8PHpGLNxXgcc-oM9IAc9UnnJNCaF34NnEHFP2R2nSvs=.383399b1-e70e-460c-8efe-9d88ab6a34ba@github.com> <47txZsG98U3vKdhefoQGDYz5g6IPFFWWzQFI9P6pA0A=.1a396733-46e7-4eb5-9c56-d6293196056f@github.com> <8FNJwg59AJZc59jms3X0vBA2LG4d6oEexzqJUq7cT1A=.4bbd1d52-7cd5-437b-9e25-77e1d0e245c3@github.com> Message-ID: On Thu, 9 May 2024 14:07:00 GMT, Hamlin Li wrote: > > > > Sorry for not being accurate. In fact, I mean the requirement at the Java level. Why should we keep the original value of inactive elements from the input vector? I didn't notice such a requirement before.
> > > > > > > > > I'm not sure about other places, but in the Vector API, if you do operations with a mask, then the untouched (inactive in RISC-V vector instructions) elements should be unmodified, i.e. the same as their original values. > > > > > > It will be helpful if you could point to the specific code or examples. > > For example usage, please check the test code, e.g. https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Byte64VectorTests.java#L5458 For the counterpart of this intrinsic on AArch64, please check https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L6391 Hope this information is helpful. Yeah, I think you are right. This is also reflected in the Vector API source code, e.g. [1] [2]. [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java#L184 [2] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java#L231 ------------- PR Comment: https://git.openjdk.org/jdk/pull/19153#issuecomment-2103955774 From mli at openjdk.org Fri May 10 07:10:15 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 10 May 2024 07:10:15 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v3] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch adding CountLeadingZerosV and CountTrailingZerosV intrinsics? > Thanks.
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: fix masked ReverseBytesV & PopCountV by sharing dst&src regs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19153/files - new: https://git.openjdk.org/jdk/pull/19153/files/1d5d17fe..0aaa0834 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19153&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19153&range=01-02 Stats: 9 lines in 1 file changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/19153.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19153/head:pull/19153 PR: https://git.openjdk.org/jdk/pull/19153 From fyang at openjdk.org Fri May 10 07:21:03 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 10 May 2024 07:21:03 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v3] In-Reply-To: References: Message-ID: On Fri, 10 May 2024 07:10:15 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch adding CountLeadingZerosV and CountTrailingZerosV intrinsics? >> Thanks. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > fix masked ReverseBytesV & PopCountV by sharing dst&src regs src/hotspot/cpu/riscv/riscv_v.ad line 3766: > 3764: instruct vreverse_bytes_masked(vReg dst_src, vRegMask_V0 v0) %{ > 3765: match(Set dst_src (ReverseBytesV dst_src v0)); > 3766: format %{ "vreverse_bytes_masked $dst_src, $dst_src, v0" %} Nit: I think we can use something more accurate like `vrev8.v` as the opcode name in format. That will be consistent with the RVV spec. Also I suggest `v0.t` instead of `v0` or `$v0` as the mask for predicated instructs (Might deserve a separate PR for cleaning up other existing predicated instructs). Similar for other newly added instructs.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19153#discussion_r1596355741 From mli at openjdk.org Fri May 10 07:39:05 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 10 May 2024 07:39:05 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v3] In-Reply-To: References: Message-ID: On Fri, 10 May 2024 07:10:36 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> fix masked ReverseBytesV & PopCountV by sharing dst&src regs > > src/hotspot/cpu/riscv/riscv_v.ad line 3766: > >> 3764: instruct vreverse_bytes_masked(vReg dst_src, vRegMask_V0 v0) %{ >> 3765: match(Set dst_src (ReverseBytesV dst_src v0)); >> 3766: format %{ "vreverse_bytes_masked $dst_src, $dst_src, v0" %} > > Nit: I think we can use something more accurate like `vrev8.v` as the opcode name in format. That will be consistent with the RVV spec. Also I suggest `v0.t` instead of `v0` or `$v0` as the mask for predicated instructs (Might deserve a separate PR for cleaning up other existing predicated instructs). Similar for other newly added instructs. Sure, let me do it later, tracked by https://bugs.openjdk.org/browse/JDK-8332030. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19153#discussion_r1596384248 From fyang at openjdk.org Fri May 10 08:22:02 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 10 May 2024 08:22:02 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v3] In-Reply-To: References: Message-ID: On Fri, 10 May 2024 07:10:15 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch adding CountLeadingZerosV and CountTrailingZerosV intrinsics? >> Thanks. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > fix masked ReverseBytesV & PopCountV by sharing dst&src regs Marked as reviewed by fyang (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/19153#pullrequestreview-2049557417 From chagedorn at openjdk.org Fri May 10 09:20:20 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 10 May 2024 09:20:20 GMT Subject: RFR: 8331764: C2 SuperWord: refactor _align_to_ref/_mem_ref_for_main_loop_alignment In-Reply-To: References: Message-ID: <8GxZQOQcBkihzzemSKUg3umrWvN3-qH16jxlSoKWoe8=.d537bffd-7cf4-4a20-887f-316e066387ca@github.com> On Tue, 7 May 2024 09:26:11 GMT, Emanuel Peter wrote: > This PR accomplishes these things: > - Rename `_align_to_ref` -> `_mem_ref_for_main_loop_alignment`. > - Move the `mem_ref` finding for alignment out of `SuperWord::find_adjacent_refs`. This is too early, and we don't even know if the relevant `mem_ref` is going to be vectorized. It makes more sense to pick a `mem_ref` directly in `SuperWord::adjust_pre_loop_limit_to_align_main_loop_vectors`, where we already know what packs are going to be vectorized. > - For the alignment width (aw), we can use the `vector_width` of the pack to which the `mem_ref` belongs, rather than the potentially much larger `vector_width_in_bytes`. I track this with `_aw_for_main_loop_alignment` now. > > I need this for https://github.com/openjdk/jdk/pull/18822, and decided to split it out into an independent change. Looks good to me, too! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19115#pullrequestreview-2049677165 From chagedorn at openjdk.org Fri May 10 09:27:14 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 10 May 2024 09:27:14 GMT Subject: RFR: 8330584: IGV: XML does not save all node properties [v2] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 12:10:23 GMT, Tobias Holenstein wrote: >> When C2 sends graphs over the network to IGV, each graph is sent separately. 
The same applies if C2 saves graphs to XML: each graph is saved with all its nodes as a separate `...` in the XML >> >> To save space, graphs that are saved from IGV only contain the incremental difference for each graph. This saves a lot of space (~5-10x). The logic happens in Printer.java -> `exportInputGraph(.., difference=true, ...)` Unfortunately, there is a bug in this logic: the properties of the nodes are not saved correctly. >> >> [graphs.zip](https://github.com/openjdk/jdk/files/15220940/graphs.zip) contains 4 graphs: >> >> `graph_c2.xml` (230KB) - an XML file saved from C2 >> `graph_igv_bug.xml` (73KB) - `graph_c2.xml` opened in IGV (without this fix) and saved as `graph_igv_bug.xml`. >> `graph_igv_fixed.xml` (123KB) - `graph_c2.xml` opened in IGV (with this fix) and saved as `graph_igv_fixed.xml`. >> >> As you can see, `graph_igv_fixed.xml` is almost twice as large as `graph_igv_bug.xml` because it contains the missing properties. But now the memory saving from the original `graph_c2.xml` is only ~2x. >> Therefore a new format for saving is added: graphs can now be saved and opened from IGV as `.igv`. This uses a compressed (ZIP) format. >> >> `graph.igv` (10KB) is the same graph as `graph_c2.xml` (230KB). But it uses difference graph compression and ZIP compression and is in total 23x smaller in memory footprint. >> >> >> >> E.g., the root in the last graph of difference_true.xml has far fewer properties than in difference_false.xml. > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Update src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/OutlineTopComponent.java > > Co-authored-by: Roberto Castañeda Lozano Looks good to me, too! ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19104#pullrequestreview-2049689287 From chagedorn at openjdk.org Fri May 10 09:40:31 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 10 May 2024 09:40:31 GMT Subject: RFR: 8330584: IGV: XML does not save all node properties [v2] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 12:10:23 GMT, Tobias Holenstein wrote: >> When C2 sends graphs over the network to IGV, each graph is sent separately. The same applies if C2 saves graphs to XML: each graph is saved with all its nodes as a separate `...` in the XML >> >> To save space, graphs that are saved from IGV only contain the incremental difference for each graph. This saves a lot of space (~5-10x). The logic happens in Printer.java -> `exportInputGraph(.., difference=true, ...)` Unfortunately, there is a bug in this logic: the properties of the nodes are not saved correctly. >> >> [graphs.zip](https://github.com/openjdk/jdk/files/15220940/graphs.zip) contains 4 graphs: >> >> `graph_c2.xml` (230KB) - an XML file saved from C2 >> `graph_igv_bug.xml` (73KB) - `graph_c2.xml` opened in IGV (without this fix) and saved as `graph_igv_bug.xml`. >> `graph_igv_fixed.xml` (123KB) - `graph_c2.xml` opened in IGV (with this fix) and saved as `graph_igv_fixed.xml`. >> >> As you can see, `graph_igv_fixed.xml` is almost twice as large as `graph_igv_bug.xml` because it contains the missing properties. But now the memory saving from the original `graph_c2.xml` is only ~2x. >> Therefore a new format for saving is added: graphs can now be saved and opened from IGV as `.igv`. This uses a compressed (ZIP) format. >> >> `graph.igv` (10KB) is the same graph as `graph_c2.xml` (230KB). But it uses difference graph compression and ZIP compression and is in total 23x smaller in memory footprint. >> >> >> >> E.g., the root in the last graph of difference_true.xml has far fewer properties than in difference_false.xml.
> > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Update src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/OutlineTopComponent.java > > Co-authored-by: Roberto Castañeda Lozano Just a general thought: Should we generally only save in `.igv` format and drop saving in XML format, or is there any benefit to being able to store in both formats? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19104#issuecomment-2104287392 From chagedorn at openjdk.org Fri May 10 09:49:28 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 10 May 2024 09:49:28 GMT Subject: RFR: 8331993: Add counting leading/trailing zero tests for Integer In-Reply-To: <7a7fXkgF6v-sSFHCk-GT0DbHr9t8AO7bGh1X1JaF-gg=.19a655eb-cb68-446c-8207-270a2ee87492@github.com> References: <7a7fXkgF6v-sSFHCk-GT0DbHr9t8AO7bGh1X1JaF-gg=.19a655eb-cb68-446c-8207-270a2ee87492@github.com> Message-ID: On Thu, 9 May 2024 11:09:39 GMT, Hamlin Li wrote: > Hi, > Can you help to review the patch adding some tests? > Currently, in hotspot/jtreg/compiler/vectorization/TestNumberOfContinuousZeros.java, there are only tests for Long, not for Integer. > Thanks. Otherwise, looks good! test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 1177: > 1175: } > 1176: > 1177: public static final String COUNTLEADINGZEROS_VI = VECTOR_PREFIX + "COUNTLEADINGZEROS_VI" + POSTFIX; It would have been better to add `_` like this: `COUNT_LEADING_ZEROS_VI`. But the existing `IRNode` strings for the long versions already lack that. If you want to also fix this here, feel free to do so. ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19154#pullrequestreview-2049723864 PR Review Comment: https://git.openjdk.org/jdk/pull/19154#discussion_r1596535655 From rcastanedalo at openjdk.org Fri May 10 09:51:07 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 10 May 2024 09:51:07 GMT Subject: RFR: 8330584: IGV: XML does not save all node properties [v2] In-Reply-To: References: Message-ID: <6Jqe1Ue2PTI-xcu4MyTlQLs6S5T_tK8dJC8RjY3aXBs=.d6adb162-e5b6-46a3-b6ce-65d2a9b3a3db@github.com> On Fri, 10 May 2024 09:37:44 GMT, Christian Hagedorn wrote: > Just a general thought: Should we generally only save in .igv format and drop (explicit) saving in XML format, or is there any benefit to being able to store in both formats? I find the explicit XML format convenient sometimes for debugging something or doing a quick plain-text search. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19104#issuecomment-2104303354 From duke at openjdk.org Fri May 10 11:15:18 2024 From: duke at openjdk.org (Yuri Gaevsky) Date: Fri, 10 May 2024 11:15:18 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v2] In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 14:47:47 GMT, Yuri Gaevsky wrote: >> The patch adds the possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0-capable hardware. >> >> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. > > Yuri Gaevsky has updated the pull request incrementally with two additional commits since the last revision: > > - num_8b_elems_in_vec --> nof_vec_elems > - Removed checks for (MaxVectorSize >= 16) per @RealFYang suggestion. .
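The vectorizedHashCode intrinsic computes Java's polynomial hash `h = 31*h + a[i]`. A common way to vectorize this recurrence is to consume a chunk of `width` elements at a time using precomputed powers of 31. The C++ below is only a scalar model of that decomposition, not code from the RVV patch: `hash_scalar`, `hash_chunked`, and `width` are hypothetical names, and the elements are assumed to be 32-bit ints.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference: Java-style polynomial hash, h = 31 * h + a[i].
// uint32_t arithmetic wraps modulo 2^32, matching Java int overflow.
std::int32_t hash_scalar(const std::vector<std::int32_t>& a, std::int32_t initial) {
  std::uint32_t h = static_cast<std::uint32_t>(initial);
  for (std::int32_t x : a) {
    h = 31u * h + static_cast<std::uint32_t>(x);
  }
  return static_cast<std::int32_t>(h);
}

// Chunked model: consume `width` elements per step, as a vector loop might.
// For one chunk a[i .. i+width-1]:
//   h' = h * 31^width + sum_j a[i+j] * 31^(width-1-j)
// The per-element products are independent (a lane-wise multiply) and the
// sum is a reduction; leftover elements fall back to the scalar recurrence.
std::int32_t hash_chunked(const std::vector<std::int32_t>& a, std::int32_t initial,
                          std::size_t width) {
  assert(width > 0 && "chunk width must be positive");

  // pow31[k] = 31^k modulo 2^32, for k = 0 .. width.
  std::vector<std::uint32_t> pow31(width + 1);
  pow31[0] = 1u;
  for (std::size_t k = 1; k <= width; ++k) {
    pow31[k] = pow31[k - 1] * 31u;
  }

  std::uint32_t h = static_cast<std::uint32_t>(initial);
  std::size_t i = 0;
  for (; i + width <= a.size(); i += width) {
    std::uint32_t chunk = 0;
    for (std::size_t j = 0; j < width; ++j) {
      chunk += static_cast<std::uint32_t>(a[i + j]) * pow31[width - 1 - j];
    }
    h = h * pow31[width] + chunk;
  }
  for (; i < a.size(); ++i) {  // scalar tail
    h = 31u * h + static_cast<std::uint32_t>(a[i]);
  }
  return static_cast<std::int32_t>(h);
}
```

For example, `hash_scalar({1, 2, 3}, 1)` gives 30817, the same value as `Arrays.hashCode(new int[]{1, 2, 3})`. Because arithmetic modulo 2^32 is a ring, regrouping the recurrence this way gives the same result as the scalar loop for every `width`, including when intermediate values overflow.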
------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-2104423343 From yzheng at openjdk.org Fri May 10 13:13:26 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Fri, 10 May 2024 13:13:26 GMT Subject: RFR: 8331429: [JVMCI] Cleanup JVMCIRuntime allocation routines Message-ID: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> This PR removes allocation routines that may throw exceptions from JVMCIRuntime. It also exports various symbols related to the hashed secondary supers table. ------------- Commit messages: - [JVMCI] Cleanup JVMCIRuntime allocation routines Changes: https://git.openjdk.org/jdk/pull/19176/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19176&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331429 Stats: 99 lines in 3 files changed: 3 ins; 41 del; 55 mod Patch: https://git.openjdk.org/jdk/pull/19176.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19176/head:pull/19176 PR: https://git.openjdk.org/jdk/pull/19176 From mli at openjdk.org Fri May 10 14:04:01 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 10 May 2024 14:04:01 GMT Subject: RFR: 8331577: RISC-V: C2 CountLeadingZerosV [v2] In-Reply-To: References: <8PHpGLNxXgcc-oM9IAc9UnnJNCaF34NnEHFP2R2nSvs=.383399b1-e70e-460c-8efe-9d88ab6a34ba@github.com> <47txZsG98U3vKdhefoQGDYz5g6IPFFWWzQFI9P6pA0A=.1a396733-46e7-4eb5-9c56-d6293196056f@github.com> <8FNJwg59AJZc59jms3X0vBA2LG4d6oEexzqJUq7cT1A=.4bbd1d52-7cd5-437b-9e25-77e1d0e245c3@github.com> Message-ID: On Fri, 10 May 2024 06:30:54 GMT, Fei Yang wrote: >>> > > Sorry for not being accurate. In fact, I mean the requirement at the Java level. Why should we keep the original value of inactive elements from the input vector? I didn't notice such a requirement before. >>> > >>> > >>> > I'm not sure about other places, but in the Vector API, if you do operations with a mask, then the untouched (inactive in RISC-V vector instructions) elements should be unmodified, i.e.
the same as their original values. >>> >>> It will be helpful if you could point to the specific code or examples. >> >> For example usage, please check the test code, e.g. https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Byte64VectorTests.java#L5458 >> For the counterpart of this intrinsic on AArch64, please check https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L6391 >> Hope this information is helpful. > >> > > > Sorry for not being accurate. In fact, I mean the requirement at the Java level. Why should we keep the original value of inactive elements from the input vector? I didn't notice such a requirement before. >> > > >> > > >> > > I'm not sure about other places, but in the Vector API, if you do operations with a mask, then the untouched (inactive in RISC-V vector instructions) elements should be unmodified, i.e. the same as their original values. >> > >> > >> > It will be helpful if you could point to the specific code or examples. >> >> For example usage, please check the test code, e.g. https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Byte64VectorTests.java#L5458 For the counterpart of this intrinsic on AArch64, please check https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L6391 Hope this information is helpful. > > Yeah, I think you are right. This is also reflected in the Vector API source code for the unary and binary operators [1] [2]. > > [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java#L184 > [2] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java#L231 Thanks @RealFYang for reviewing.
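The requirement agreed on above, that inactive lanes of a masked operation keep their original values, can be modeled in scalar C++. This is a sketch, not the RVV `.ad` rule: `masked_clz` and `clz32` are hypothetical helpers, and `clz32(0) == 32` mirrors `Integer.numberOfLeadingZeros(0)`. Starting `dst` as a copy of `src` corresponds to sharing one `dst_src` register: assuming a mask-undisturbed policy, a masked RVV instruction leaves inactive destination elements untouched, so they already hold the source values.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Portable 32-bit count-leading-zeros. Returns 32 for x == 0, matching
// Java's Integer.numberOfLeadingZeros.
std::int32_t clz32(std::uint32_t x) {
  std::int32_t n = 0;
  for (std::uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1) {
    ++n;
  }
  return n;
}

// Scalar model of a masked lane-wise CountLeadingZeros: active lanes get
// clz(src[i]); inactive lanes keep the original src[i] (merge semantics).
template <std::size_t N>
std::array<std::int32_t, N> masked_clz(const std::array<std::int32_t, N>& src,
                                       const std::array<bool, N>& mask) {
  std::array<std::int32_t, N> dst = src;  // like a shared dst_src register
  for (std::size_t i = 0; i < N; ++i) {
    if (mask[i]) {
      dst[i] = clz32(static_cast<std::uint32_t>(src[i]));
    }
  }
  return dst;
}
```

With `src = {1, 0, -1, 256}` and `mask = {true, false, true, false}`, the result is `{31, 0, 0, 256}`: lanes 1 and 3 are inactive and keep their inputs. If the result were instead written to a separate destination register, presumably the inactive lanes would hold that register's previous contents rather than the source values.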
------------- PR Comment: https://git.openjdk.org/jdk/pull/19153#issuecomment-2104660027 From mli at openjdk.org Fri May 10 14:04:02 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 10 May 2024 14:04:02 GMT Subject: Integrated: 8331577: RISC-V: C2 CountLeadingZerosV In-Reply-To: References: Message-ID: On Thu, 9 May 2024 08:41:05 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch adding CountLeadingZerosV and CountTrailingZerosV intrinsics? > Thanks. This pull request has now been integrated. Changeset: f95c9374 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/f95c93740538e5e508407ec6750ed9f126fdc3c3 Stats: 72 lines in 3 files changed: 62 ins; 0 del; 10 mod 8331577: RISC-V: C2 CountLeadingZerosV 8331578: RISC-V: C2 CountTrailingZerosV Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/19153 From mli at openjdk.org Fri May 10 14:04:37 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 10 May 2024 14:04:37 GMT Subject: RFR: 8331993: Add counting leading/trailing zero tests for Integer [v2] In-Reply-To: <7a7fXkgF6v-sSFHCk-GT0DbHr9t8AO7bGh1X1JaF-gg=.19a655eb-cb68-446c-8207-270a2ee87492@github.com> References: <7a7fXkgF6v-sSFHCk-GT0DbHr9t8AO7bGh1X1JaF-gg=.19a655eb-cb68-446c-8207-270a2ee87492@github.com> Message-ID: > Hi, > Can you help to review the patch adding some tests? > Currently, in hotspot/jtreg/compiler/vectorization/TestNumberOfContinuousZeros.java, there are only tests for Long, not for Integer. > Thanks.
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: rename ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19154/files - new: https://git.openjdk.org/jdk/pull/19154/files/c0aaa35d..c8a543d8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19154&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19154&range=00-01 Stats: 14 lines in 3 files changed: 0 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/19154.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19154/head:pull/19154 PR: https://git.openjdk.org/jdk/pull/19154 From mli at openjdk.org Fri May 10 14:04:37 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 10 May 2024 14:04:37 GMT Subject: RFR: 8331993: Add counting leading/trailing zero tests for Integer [v2] In-Reply-To: References: <7a7fXkgF6v-sSFHCk-GT0DbHr9t8AO7bGh1X1JaF-gg=.19a655eb-cb68-446c-8207-270a2ee87492@github.com> Message-ID: On Fri, 10 May 2024 09:45:18 GMT, Christian Hagedorn wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> rename > > test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 1177: > >> 1175: } >> 1176: >> 1177: public static final String COUNTLEADINGZEROS_VI = VECTOR_PREFIX + "COUNTLEADINGZEROS_VI" + POSTFIX; > > It would have been better to add `_` like this: `COUNT_LEADING_ZEROS_VI`. > > But the existing `IRNode` strings for the long versions already lack that. If you want to also fix this here, feel free to do so. Yes, it's more readable. Fixed. Thanks for reviewing.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19154#discussion_r1596788370 From mli at openjdk.org Fri May 10 14:04:37 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 10 May 2024 14:04:37 GMT Subject: Integrated: 8331993: Add counting leading/trailing zero tests for Integer In-Reply-To: <7a7fXkgF6v-sSFHCk-GT0DbHr9t8AO7bGh1X1JaF-gg=.19a655eb-cb68-446c-8207-270a2ee87492@github.com> References: <7a7fXkgF6v-sSFHCk-GT0DbHr9t8AO7bGh1X1JaF-gg=.19a655eb-cb68-446c-8207-270a2ee87492@github.com> Message-ID: On Thu, 9 May 2024 11:09:39 GMT, Hamlin Li wrote: > Hi, > Can you help to review the patch adding some tests? > Currently, in hotspot/jtreg/compiler/vectorization/TestNumberOfContinuousZeros.java, there are only tests for Long, not for Integer. > Thanks. This pull request has now been integrated. Changeset: 675fbe69 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/675fbe699ed1aad37f34429cbe1f4f3e029be03f Stats: 67 lines in 3 files changed: 44 ins; 0 del; 23 mod 8331993: Add counting leading/trailing zero tests for Integer Reviewed-by: chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/19154 From aph at openjdk.org Fri May 10 14:28:59 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 May 2024 14:28:59 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v7] In-Reply-To: References: Message-ID: > At the present time, `assert_different_registers()` uses an O(N**2) algorithm in assert_different_registers(). We can utilize RegSet to do it in O(N) time.
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/asm/register.hpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16617/files - new: https://git.openjdk.org/jdk/pull/16617/files/a945d094..36f48ad0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16617&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16617&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16617.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16617/head:pull/16617 PR: https://git.openjdk.org/jdk/pull/16617 From aph at openjdk.org Fri May 10 14:29:01 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 May 2024 14:29:01 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v6] In-Reply-To: References: Message-ID: <3xKYuTm22oA-SeoXK20LuPypVkTVuQNM7C9kY_tKlgs=.04a0c1cf-691d-428c-9c12-78bc02cab6d0@github.com> On Wed, 17 Jan 2024 07:32:44 GMT, Kim Barrett wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Review feedback. > > src/hotspot/share/asm/register.hpp line 257: > >> 255: >> 256: template >> 257: inline constexpr bool different_registers(AbstractRegSet allocated_regs, R first_register) { > > different_registers is only used by debug-only code in assert_different_registers. Shouldn't all the overloads > for different_registers be within an `#ifdef ASSERT` block? I could do so, but that would lose the ability to do `static_assert(different_registers(...`. I don't think that `static_assert` depends on `ASSERT`. I'm happy to make this patch debug-only, though, if you prefer. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1596823414 From dnsimon at openjdk.org Fri May 10 14:30:31 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 10 May 2024 14:30:31 GMT Subject: RFR: 8331429: [JVMCI] Cleanup JVMCIRuntime allocation routines In-Reply-To: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> References: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> Message-ID: <4CEZIVjNESdAI-WNOY0akY7kd8DXgTQRy2fm1NHO-G8=.187d70c5-41a3-4af0-b6fb-06731da7403f@github.com> On Fri, 10 May 2024 13:06:21 GMT, Yudi Zheng wrote: > This PR removes allocation routines that may throw exceptions from JVMCIRuntime. It also exports various symbols related to the hashed secondary supers table. Please also update `InternalOOMEMark` to remove support for `thread` being `nullptr`. src/hotspot/share/jvmci/jvmciRuntime.hpp line 509: > 507: // The following routines are called from compiled JVMCI code > 508: > 509: // When allocation fails, these stubs return null and have no pending exception. Compiled code "and have no pending OutOfMemoryError exception" It's still possible for an async exception to be pending. For Graal, that's ok as it unconditionally clears any pending exception when calling these stubs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19176#issuecomment-2104705327 PR Review Comment: https://git.openjdk.org/jdk/pull/19176#discussion_r1596830028 From aph at openjdk.org Fri May 10 14:58:54 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 May 2024 14:58:54 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v8] In-Reply-To: References: Message-ID: > At the present time, `assert_different_registers()` uses an O(N**2) algorithm in assert_different_registers(). We can utilize RegSet to do it in O(N) time.
This would be a useful optimization for all builds with assertions enabled. > > In addition, it would be useful to be able to static_assert different registers. > > Also, I've taken the opportunity to expand the maximum size of a RegSet to 64 on 64-bit platforms. > > I also fixed a bug: sometimes `noreg` is passed to `assert_different_registers()`, but it may only be passed once or a spurious assertion is triggered. Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: - Review feedback - Merge branch 'different-regs' of https://github.com/theRealAph/jdk into different-regs - Update src/hotspot/share/asm/register.hpp Co-authored-by: Emanuel Peter - Merge branch 'clean' into different-regs - Review feedback. - 8319822: Use a linear-time algorithm for assert_different_registers() - 8319822: Use a linear-time algorithm for assert_different_registers() - Cleanup, fix warning on Windows. - Fix x86 - Bleurgh - ... 
and 3 more: https://git.openjdk.org/jdk/compare/211fe58c...0037dd29 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16617/files - new: https://git.openjdk.org/jdk/pull/16617/files/36f48ad0..0037dd29 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16617&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16617&range=06-07 Stats: 1504428 lines in 12564 files changed: 341226 ins; 719204 del; 443998 mod Patch: https://git.openjdk.org/jdk/pull/16617.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16617/head:pull/16617 PR: https://git.openjdk.org/jdk/pull/16617 From aph at openjdk.org Fri May 10 14:58:54 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 May 2024 14:58:54 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v7] In-Reply-To: References: Message-ID: On Fri, 10 May 2024 14:28:59 GMT, Andrew Haley wrote: >> At the present time, `assert_different_registers()` uses an O(N**2) algorithm in assert_different_registers(). We can utilize RegSet to do it in O(N) time. This would be a useful optimization for all builds with assertions enabled. >> >> In addition, it would be useful to be able to static_assert different registers. >> >> Also, I've taken the opportunity to expand the maximum size of a RegSet to 64 on 64-bit platforms. >> >> I also fixed a bug: sometimes `noreg` is passed to `assert_different_registers()`, but it may only be passed once or a spurious assertion is triggered. 
> > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/asm/register.hpp > > Co-authored-by: Emanuel Peter > I started to review the patch and was wondering if this could be simplify to something like this?: [stefank at f38c791](https://github.com/stefank/jdk/commit/f38c791793440b899ce6c4c9723470a5d4b18050) > > I tested this with this small section of temporary static_asserts: [stefank at 30da4d6](https://github.com/stefank/jdk/commit/30da4d6abeee14e4e4f44034295f1bb0ad2e3016) > > Unfortunately, that didn't compile and I had make this change to get it to work: [stefank at d6bda1a](https://github.com/stefank/jdk/commit/d6bda1a25e297865fd6b5da21184273d8825b922) OK, so I'm not going to do that, then. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16617#issuecomment-2104755838 From aph at openjdk.org Fri May 10 15:05:23 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 May 2024 15:05:23 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v9] In-Reply-To: References: Message-ID: > At the present time, `assert_different_registers()` uses an O(N**2) algorithm in assert_different_registers(). We can utilize RegSet to do it in O(N) time. This would be a useful optimization for all builds with assertions enabled. > > In addition, it would be useful to be able to static_assert different registers. > > Also, I've taken the opportunity to expand the maximum size of a RegSet to 64 on 64-bit platforms. > > I also fixed a bug: sometimes `noreg` is passed to `assert_different_registers()`, but it may only be passed once or a spurious assertion is triggered. 
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Review feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16617/files - new: https://git.openjdk.org/jdk/pull/16617/files/0037dd29..857152f6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16617&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16617&range=07-08 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16617.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16617/head:pull/16617 PR: https://git.openjdk.org/jdk/pull/16617 From aph at openjdk.org Fri May 10 15:05:23 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 May 2024 15:05:23 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v6] In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 07:10:13 GMT, Kim Barrett wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Review feedback. > > src/hotspot/cpu/aarch64/register_aarch64.hpp line 73: > >> 71: >> 72: constexpr bool operator==(const Register r) const { return _encoding == r._encoding; } >> 73: constexpr bool operator!=(const Register r) const { return _encoding != r._encoding; } > > This seems unrelated to the rest of this change. It also seems like something that should be done for all > of the register_ variants. It was related to another reviewer's comments, but we don't need it ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1596871037 From aph at openjdk.org Fri May 10 15:26:29 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 May 2024 15:26:29 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v10] In-Reply-To: References: Message-ID: > At the present time, `assert_different_registers()` uses an O(N**2) algorithm in assert_different_registers(). 
We can utilize RegSet to do it in O(N) time. This would be a useful optimization for all builds with assertions enabled. > > In addition, it would be useful to be able to static_assert different registers. > > Also, I've taken the opportunity to expand the maximum size of a RegSet to 64 on 64-bit platforms. > > I also fixed a bug: sometimes `noreg` is passed to `assert_different_registers()`, but it may only be passed once or a spurious assertion is triggered. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Review feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16617/files - new: https://git.openjdk.org/jdk/pull/16617/files/857152f6..693df766 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16617&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16617&range=08-09 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16617.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16617/head:pull/16617 PR: https://git.openjdk.org/jdk/pull/16617 From duke at openjdk.org Fri May 10 16:09:01 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Fri, 10 May 2024 16:09:01 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v18] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. 
These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: fix entry condition for EEVEX encoding when UseAVX=2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/d4ecb31c..aee89e7c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=16-17 Stats: 7 lines in 2 files changed: 5 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From aph at openjdk.org Fri May 10 16:16:07 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 May 2024 16:16:07 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v10] In-Reply-To: References: Message-ID: <8WxseMzSimvdzZMUP4VI_l6uFKcy49mMpRrLe-zgI74=.861ed0f1-a4bb-4476-9ce9-1fe7f3b2cc6c@github.com> On Fri, 10 May 2024 15:26:29 GMT, Andrew Haley wrote: >> At the present time, `assert_different_registers()` uses an O(N**2) algorithm in assert_different_registers(). We can utilize RegSet to do it in O(N) time. This would be a useful optimization for all builds with assertions enabled. >> >> In addition, it would be useful to be able to static_assert different registers. >> >> Also, I've taken the opportunity to expand the maximum size of a RegSet to 64 on 64-bit platforms. >> >> I also fixed a bug: sometimes `noreg` is passed to `assert_different_registers()`, but it may only be passed once or a spurious assertion is triggered. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Review feedback > From the summary: > > > In addition, it would be useful to be able to static_assert different registers. 
> > As mentioned in [#16617 (comment)](https://github.com/openjdk/jdk/pull/16617#issuecomment-1807933886) this doesn't work unless we make the proposed small tweak. Do you want to make it in this PR, or should I propose that in a separate PR? Let's do it separately. I would, but GCC has a very relaxed attitude to `static_assert` which means I can't test anything here. Everything to do with `static_assert` just seems to work. Exhuming this one after a long time. Please review, thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16617#issuecomment-2104872804 PR Comment: https://git.openjdk.org/jdk/pull/16617#issuecomment-2104873418 From jbhateja at openjdk.org Fri May 10 18:14:07 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 10 May 2024 18:14:07 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v17] In-Reply-To: References: <3nP3cGJZXnHXo2XZDKxZGj1aNIsKW8D1lQUl_nNwDuQ=.1404a4f7-6213-4f3b-a975-291202849538@github.com> <9Q2ix7wJTkRyivF1JND_cjcoI6vn1O2przOrcnpJBXM=.8fca1722-ec94-4066-ab8d-8f5f5673e430@github.com> Message-ID: On Thu, 9 May 2024 21:42:34 GMT, Steve Dohrmann wrote: >> src/hotspot/cpu/x86/assembler_x86.cpp line 4914: >> >>> 4912: assert(VM_Version::supports_sse4_1(), ""); >>> 4913: InstructionAttr attributes(AVX_128bit, /* rex_w */ true, /* legacy_mode */ _legacy_mode_dq, /* no_mask_reg */ true, /* uses_vl */ false); >>> 4914: int encode = simd_prefix_and_encode(dst, dst, as_XMMRegister(src->encoding()), VEX_SIMD_66, VEX_OPCODE_0F_3A, &attributes, true); >> >> _legacy_mode_dq and _legacy_mode_bw will be true for non-AVX512DQ/BW targets; this will cause incorrectness, since our scheme has been to treat those as non-legacy instructions upfront and only perform legacy demotions in leaf-level routines if none of the register operands is an EGPR. > > In general, the legacy mode will be set to true whenever UseAVX < 3, due to logic in the InstructionAttr ctor.
> > `_legacy_mode(legacy_mode || UseAVX < 3` Hi @steveatgh, Still getting incorrect encoding for PINSRQ at UseAVX=0 with the latest patch. This is a legacy map3 instruction which should be promoted to Extended EVEX encoding; there is no route in `Assembler::simd_prefix_and_encode` that can lead to EVEX encoding at UseAVX=0. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1597065361 From jbhateja at openjdk.org Fri May 10 18:14:08 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 10 May 2024 18:14:08 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v17] In-Reply-To: References: <3nP3cGJZXnHXo2XZDKxZGj1aNIsKW8D1lQUl_nNwDuQ=.1404a4f7-6213-4f3b-a975-291202849538@github.com> <9Q2ix7wJTkRyivF1JND_cjcoI6vn1O2przOrcnpJBXM=.8fca1722-ec94-4066-ab8d-8f5f5673e430@github.com> Message-ID: On Fri, 10 May 2024 18:08:58 GMT, Jatin Bhateja wrote: >> In general, the legacy mode will be set to true whenever UseAVX < 3, due to logic in the InstructionAttr ctor. >> >> `_legacy_mode(legacy_mode || UseAVX < 3` > > Hi @steveatgh, > > Still getting incorrect encoding for PINSRQ at UseAVX=0 with the latest patch. > > This is a legacy map3 instruction which should be promoted to Extended EVEX encoding; there is no route in `Assembler::simd_prefix_and_encode` that can lead to EVEX encoding at UseAVX=0.
Similar problems with PINSRB/D/W and PEXTRB/W/D/Q ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1597067481 From dlong at openjdk.org Fri May 10 21:40:17 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 10 May 2024 21:40:17 GMT Subject: RFR: 8331429: [JVMCI] Cleanup JVMCIRuntime allocation routines In-Reply-To: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> References: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> Message-ID: <9wsX9310p38cnuPHGU4xKirWfyfYR6cICO6iPhnDk5Y=.55d9503f-2cc8-4c26-b24f-2ced7f8f72f5@github.com> On Fri, 10 May 2024 13:06:21 GMT, Yudi Zheng wrote: > This PR removes allocation routines that may throw exception from JVMCIRuntime. It also exports various symbols related to the hashed secondary supers table. src/hotspot/share/jvmci/jvmciRuntime.cpp line 131: > 129: // Cannot re-execute class initialization without side effects > 130: // so return without attempting the initialization > 131: return; Do we need to call `current->set_vm_result(nullptr)` on these bailout paths? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19176#discussion_r1597249765 From duke at openjdk.org Fri May 10 21:48:35 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Fri, 10 May 2024 21:48:35 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v19] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. 
For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: conditionally allow EEVEX encoding when UseAVX=0 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/aee89e7c..826fa2bb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=17-18 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From duke at openjdk.org Fri May 10 21:48:35 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Fri, 10 May 2024 21:48:35 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v17] In-Reply-To: References: <3nP3cGJZXnHXo2XZDKxZGj1aNIsKW8D1lQUl_nNwDuQ=.1404a4f7-6213-4f3b-a975-291202849538@github.com> <9Q2ix7wJTkRyivF1JND_cjcoI6vn1O2przOrcnpJBXM=.8fca1722-ec94-4066-ab8d-8f5f5673e430@github.com> Message-ID: On Fri, 10 May 2024 18:11:11 GMT, Jatin Bhateja wrote: >> Hi @steveatgh , >> >> Still getting incorrect encoding for PINSRQ at UseAVX=0 with latest patch. >> >> This is a legacy map3 instruction which should be promoted to Extended EVEX, encoding, there is no route in _Assembler::simd_prefix_and_encode_ which can lead to EVEX encoding at UseAVX=0. > > Similar problems with PINSRB/D/W and PEXTRB/W/D/Q Thanks @jatin-bhateja. I added logic to ::simd_prefix_and_encode and ::simd_prefix to conditionally allow EEVEX encoding even when UseAVX=0. 
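As background for the REX2 discussion, the APX specification defines REX2 as the byte 0xD5 followed by a payload byte whose bits, from bit 7 down to bit 0, are M0, R4, X4, B4, W, R3, X3, B3. The sketch below computes that payload for 5-bit register encodings. `rex2_payload` and `needs_rex2` are hypothetical helpers, not HotSpot's Assembler API, and the model covers only legacy-map (map 0 and map 1) instructions, where the existing encoding is kept unless an operand is one of r16 to r31. Special SIB cases such as "no index" are ignored.

```cpp
#include <cassert>
#include <cstdint>

// REX2 = 0xD5 followed by a payload byte laid out as
// M0 R4 X4 B4 W R3 X3 B3 (bit 7 down to bit 0).
constexpr uint8_t REX2_PREFIX = 0xD5;

// Hypothetical helper: reg, index and base are 5-bit GPR encodings (0..31).
// Their low 3 bits go into ModRM/SIB; bits 3 and 4 go into the payload.
// map1 selects legacy map 1 (0F-escaped opcodes) via the M0 bit.
constexpr uint8_t rex2_payload(int reg, int index, int base, bool w, bool map1) {
  return static_cast<uint8_t>(
      (map1 ? 0x80 : 0) |               // M0
      (((reg   >> 4) & 1) << 6) |       // R4
      (((index >> 4) & 1) << 5) |       // X4
      (((base  >> 4) & 1) << 4) |       // B4
      (w ? 0x08 : 0) |                  // W
      (((reg   >> 3) & 1) << 2) |       // R3
      (((index >> 3) & 1) << 1) |       // X3
      ((base  >> 3) & 1));              // B3
}

// For legacy-map instructions, a prefix beyond plain REX is only needed
// when some operand is an extended GPR (encodings 16..31).
constexpr bool needs_rex2(int reg, int index, int base) {
  return reg >= 16 || index >= 16 || base >= 16;
}
```

For example, a 64-bit operation with r16 in the ModRM.reg field yields the payload 0x48 (R4 and W set), so the prefix bytes are D5 48.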
Tested with PINSR* and PEXTR* ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1597253753 From duke at openjdk.org Sat May 11 01:59:29 2024 From: duke at openjdk.org (xiaotaonan) Date: Sat, 11 May 2024 01:59:29 GMT Subject: RFR: 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag Message-ID: C2: Remove ExpandSubTypeCheckAtParseTime flag ------------- Commit messages: - C2: Remove ExpandSubTypeCheckAtParseTime flag Changes: https://git.openjdk.org/jdk/pull/19187/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19187&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332032 Stats: 8 lines in 4 files changed: 0 ins; 4 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19187.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19187/head:pull/19187 PR: https://git.openjdk.org/jdk/pull/19187 From ddong at openjdk.org Sat May 11 06:24:09 2024 From: ddong at openjdk.org (Denghui Dong) Date: Sat, 11 May 2024 06:24:09 GMT Subject: RFR: 8327661: C1: Make RBP allocatable on x64 when PreserveFramePointer is disabled [v3] In-Reply-To: References: Message-ID: On Wed, 13 Mar 2024 06:49:30 GMT, Denghui Dong wrote: >> Hi, >> >> Could I have a review of this change that makes RBP allocatable in c1 register allocation when PreserveFramePointer is not enabled. >> >> There seems to be no reason that RBP cannot be used. Although the performance of c1 jit code is not very critical, in my opinion, this change will not add compilation overhead. So maybe it is acceptable. >> >> I am not very sure if I have changed all the places that should be. >> >> Testing: fastdebug tier1-4 on Linux x64 > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > delete jmh Gentle ping. Since the benefits are not obvious, I'll close this PR if there are no reviews for one more week.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18167#issuecomment-2105590018 From dnsimon at openjdk.org Sat May 11 07:44:09 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Sat, 11 May 2024 07:44:09 GMT Subject: RFR: 8331429: [JVMCI] Cleanup JVMCIRuntime allocation routines In-Reply-To: <9wsX9310p38cnuPHGU4xKirWfyfYR6cICO6iPhnDk5Y=.55d9503f-2cc8-4c26-b24f-2ced7f8f72f5@github.com> References: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> <9wsX9310p38cnuPHGU4xKirWfyfYR6cICO6iPhnDk5Y=.55d9503f-2cc8-4c26-b24f-2ced7f8f72f5@github.com> Message-ID: On Fri, 10 May 2024 21:37:39 GMT, Dean Long wrote: >> This PR removes allocation routines that may throw exception from JVMCIRuntime. It also exports various symbols related to the hashed secondary supers table. > > src/hotspot/share/jvmci/jvmciRuntime.cpp line 131: > >> 129: // Cannot re-execute class initialization without side effects >> 130: // so return without attempting the initialization >> 131: return; > > Do we need to call `current->set_vm_result(nullptr)` on these bailout paths? That's done in `~RetryableAllocationMark`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19176#discussion_r1597386283 From fjiang at openjdk.org Sat May 11 07:51:03 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Sat, 11 May 2024 07:51:03 GMT Subject: RFR: 8331281: RISC-V: C2: Support vector-scalar and vector-immediate bitwise logic instructions In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 12:17:58 GMT, Gui Cao wrote: > Hi, We want to support vector-scalar and vector-immediate bitwise logic instructions. It was implemented by referring to RVV v1.0 [1]. Please take a look and review. Thanks a lot. > We can use the Int256VectorTests.java[2] to print the compilation log, verify and observe the generation of nodes. > > For example, we can use the following command to print the compilation log of a jtreg test case:
> > For example, we can use the following command to print the compilation log of a jtreg test case: > > > /home/zifeihan/jdk-tools/jtreg/bin/jtreg \ > -v:default \ > -concurrency:16 -timeout:50 \ > -javaoption:-XX:+UnlockExperimentalVMOptions \ > -javaoption:-XX:+UseRVV \ > -javaoption:-XX:+PrintOptoAssembly \ > -javaoption:-XX:LogFile=/home/zifeihan/jdk/Int256VectorTests_PrintOptoAssembly.log \ > -jdk:/home/zifeihan/jdk/build/linux-riscv64-server-fastdebug/jdk \ > /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/Int256VectorTests.java > > > > we can observe the specified compilation log `Int256VectorTests_PrintOptoAssembly.log`, which contains the vector-scalar and vector-immediate bitwise logic node for the PR implementation. > > vand_immI Node > > > 0b4 vloadcon V3 # generate iota indices > 0bc vmla V2, V2, V3, V1 > 0c4 vand_immI V2, V2, #7 > 0cc addi R7, R30, #16 # ptr, #@addP_reg_imm > 0d0 storeV [R7], V2 # vector (rvv) > > > vor_regI Node > > > 180 vor_regI V1, V1, R30 > 188 add R31, R14, R31 # ptr, #@addP_reg_reg > 18a addi R31, R31, #16 # ptr, #@addP_reg_imm > 18c storeV [R31], V1 # vector (rvv) > 194 addiw R11, R11, #8 #@addI_reg_imm > 196 blt R11, R13, B17 #@cmpI_loop P=0.500000 C=30564.000000 > > > vxor_regI Node > > 198 vxor_regI V1, V1, R30 > 1a0 add R14, R16, R14 # ptr, #@addP_reg_reg > 1a2 addi R14, R14, #16 # ptr, #@addP_reg_imm > 1a4 storeV [R14], V1 # vector (rvv) > 1ac addiw R11, R11, #8 #@addI_reg_imm > 1ae blt R11, R13, B21 #@cmpI_loop P=0.500000 C=30564.000000 > > > vand_regI_masked Node > > 234 B31: # out( B40 B32 ) <- in( B30 ) Freq: 78.5481 > 234 loadV V2, [R15] # vector (rvv) > 23c vand_regI_masked V2, V2, R11 > 244 storeV [R9], V2 # vector (rvv) > 24c mv R10, #8 # int, #@loadConI > 24e ble R7, R10, B40 #@cmpI_branch P=0.000001 C=-1.000000 > > > vor_regI_masked Node > > 1ee B32: # out( B38 B33 ) <- in( B31 ) Freq: 75.8475 > 1ee loadV V1, [R11] # vector (rvv) > 1f6 vor_regI_masked V1, V1, R31 > 1fe addi R11, R13, #32 # ptr, #@addP_reg_imm 
> 202 bgeu R29, R10, B38 #@cmpU_branch P=0.000001 C=-1.000000 > > vxor_regI_masked Node > > 1ee B32: # out( B38 B33 ) <- in( B31 ) Freq: 75.8475 > 1ee loadV V1, [R11]... Overall looks good, with one minor comment. src/hotspot/cpu/riscv/riscv_v.ad line 513: > 511: // vector-scalar and (unpredicated) > 512: > 513: instruct vand_regI(vReg dst_src, iRegI src) %{ Do we need `iRegIorL2I` for `RegI` related instructions? ------------- PR Review: https://git.openjdk.org/jdk/pull/18999#pullrequestreview-2051120671 PR Review Comment: https://git.openjdk.org/jdk/pull/18999#discussion_r1597383543 From dlong at openjdk.org Sat May 11 08:07:02 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 11 May 2024 08:07:02 GMT Subject: RFR: 8331429: [JVMCI] Cleanup JVMCIRuntime allocation routines In-Reply-To: References: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> <9wsX9310p38cnuPHGU4xKirWfyfYR6cICO6iPhnDk5Y=.55d9503f-2cc8-4c26-b24f-2ced7f8f72f5@github.com> Message-ID: On Sat, 11 May 2024 07:41:17 GMT, Doug Simon wrote: >> src/hotspot/share/jvmci/jvmciRuntime.cpp line 131: >> >>> 129: // Cannot re-execute class initialization without side effects >>> 130: // so return without attempting the initialization >>> 131: return; >> >> Do we need to call `current->set_vm_result(nullptr)` on these bailout paths? > > That's done in `~RetryableAllocationMark`. Only for the HAS_PENDING_EXCEPTION case. What about the !h->is_initialized() case? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19176#discussion_r1597394595 From dnsimon at openjdk.org Sat May 11 09:09:11 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Sat, 11 May 2024 09:09:11 GMT Subject: RFR: 8331429: [JVMCI] Cleanup JVMCIRuntime allocation routines In-Reply-To: References: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> <9wsX9310p38cnuPHGU4xKirWfyfYR6cICO6iPhnDk5Y=.55d9503f-2cc8-4c26-b24f-2ced7f8f72f5@github.com> Message-ID: On Sat, 11 May 2024 08:04:01 GMT, Dean Long wrote: >> That's done in `~RetryableAllocationMark`. > > Only for the HAS_PENDING_EXCEPTION case. What about the !h->is_initialized() case? Good observation - seems like this is an outstanding bug. Can you please address that Yudi. In practice, I wonder how much this matters as Graal always [clears the object result](https://github.com/oracle/graal/blob/0b61d20b08b1af76bd35cfb673c7be8d33855f51/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/hotspot/stubs/ForeignCallSnippets.java#L127) after reading it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19176#discussion_r1597405871 From jbhateja at openjdk.org Sat May 11 21:25:06 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 11 May 2024 21:25:06 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers In-Reply-To: <2ix8fZdbyXTav2FBERlzl7U6JkI3i9hPFGSNKbrDlpo=.a219b3de-7035-44d0-9bdc-3ea599800eb3@github.com> References: <2ix8fZdbyXTav2FBERlzl7U6JkI3i9hPFGSNKbrDlpo=.a219b3de-7035-44d0-9bdc-3ea599800eb3@github.com> Message-ID: On Fri, 3 May 2024 19:38:08 GMT, Steve Dohrmann wrote: >> How can we be confident that the encoding is correct? Would it be possible to write tests for this? Maybe one that disassembles it and compares the result to a 3rd party disassembler offline or in-process hsdis? 
> > In response to @dean-long, @theRealAph wrote: >> When we wrote the AArch64 port, there was no available hardware to test it on. So, we wrote a simulator to test it. However, we ran the risk that if our understanding of instruction encoding was wrong, our assembler and our simulator might appear to work correctly when used together, but the result would not run on real AArch64 hardware once it arrived. So, as well as a simulator for the architecture, we verified the internal HotSpot assembler by checking its encoding against GNU `as`. See /test/hotspot/gtest/aarch64, where a Python program generates source for both the HotSpot internal assembler and GNU `as`. I strongly suggest you do something similar. (As a matter for the historical record, this did work. The test found several encoding bugs. Once we got the first real AArch64 hardware, the port worked almost immediately.) > > Thanks for the description. It would be great to create a similar tool for x86. I tested the encoding manually using the SDE as the authoritative source. It is tedious though and very time consuming. > > A subsequent PR in [JDK-8329030](https://bugs.openjdk.org/browse/JDK-8329030), perhaps the one that adds encoding support for New Data Destination variants, should include such a tool. Hi @steveatgh, I have a few more comments. A) With the recent change, register-only flavors of cvtsi2ss / cvtsi2sd / cvttsd2si / cvttss2si, which are all legacy map 1 instructions and are encoded using REX prefixes at UseAVX=0, will now be promoted to EEVEX, which is a fixed 4-byte prefix; we should use REX2 instead. [cvtsi2ss_MAP1_with_EEVEX.txt](https://github.com/openjdk/jdk/files/15284294/cvtsi2ss_MAP1_with_EEVEX.txt) B) Memory operand flavor of paddd: EVEX tuples are missing for memory operand instructions, which will prevent applying the EVEX compressed displacement (disp8*N) encoding optimization.
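Point B can be made concrete. Under EVEX encoding, an 8-bit displacement is implicitly multiplied by a factor N that depends on the instruction's tuple type and vector length, so an assembler that does not know N for an instruction cannot safely emit the short form and has to fall back to a 32-bit displacement. The sketch below shows only the compression test itself; `compress_disp8xN` is a hypothetical helper, and deriving N from a tuple type is left out.

```cpp
#include <cassert>
#include <cstdint>

// EVEX compressed displacement (disp8*N): the encoded 8-bit value is scaled
// by N at decode time. A displacement can use the short form only if it is
// a multiple of N and disp / N fits in a signed byte; otherwise a full
// 32-bit displacement must be emitted.
bool compress_disp8xN(int32_t disp, int32_t n, int8_t* out) {
  if (n <= 0 || disp % n != 0) return false;   // not representable as k * N
  int32_t scaled = disp / n;
  if (scaled < -128 || scaled > 127) return false;
  *out = static_cast<int8_t>(scaled);
  return true;
}
```

With N = 64 (for example, a full 512-bit vector operand), a displacement of 128 encodes as the byte 2, whereas with N = 1 the same short form only reaches displacements in [-128, 127].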
FTR: These are map 1 legacy instructions which could be encoded using a SIMD + REX prefix, which adds up to a two-byte prefix. Currently we promote them to VEX encoding in order to zero the upper 128 bits; this adds a one-byte penalty since it uses the three-byte VEX prefix (c4). Now we will encode them using EEVEX if an address operand (BASE/INDEX) is an EGPR, which adds another byte since EVEX is a fixed 4-byte prefix. As mentioned above, at UseAVX=0 we should encode them using REX2. [paddd_MAP1_VEX_now_EEVEX.txt](https://github.com/openjdk/jdk/files/15284296/paddd_MAP1_VEX_now_EEVEX.txt) C) Memory operand flavors of pcmpestri, ptest and vptest: - missing address tuple - legacy mode is true but should be false. Kindly incorporate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2106034198 From duke at openjdk.org Sun May 12 02:01:26 2024 From: duke at openjdk.org (xiaotaonan) Date: Sun, 12 May 2024 02:01:26 GMT Subject: RFR: 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag [v2] In-Reply-To: References: Message-ID: <2O249IeKBpJ_BTBw_bIf5gkIn_eDjUFBXl_Q1GjQcmY=.b8c003d3-452e-4fe5-ae4c-53e0d57c4dea@github.com> > C2: Remove ExpandSubTypeCheckAtParseTime flag xiaotaonan has updated the pull request incrementally with one additional commit since the last revision: Add API to access ZipEntry.extraAttributes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19187/files - new: https://git.openjdk.org/jdk/pull/19187/files/681db95d..150ce858 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19187&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19187&range=00-01 Stats: 17 lines in 1 file changed: 17 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19187.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19187/head:pull/19187 PR: https://git.openjdk.org/jdk/pull/19187 From duke at openjdk.org Sun May 12 02:57:08 2024 From: duke at openjdk.org (xiaotaonan) Date: Sun, 12 May 2024 02:57:08
GMT Subject: Withdrawn: 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag In-Reply-To: References: Message-ID: <9q-cX7RJKLeJyXdE9v_BqerfpZOTY5yX6wTwsyVg0eE=.5a8faea6-416d-40ea-be3c-602d9841fb96@github.com> On Sat, 11 May 2024 01:55:25 GMT, xiaotaonan wrote: > C2: Remove ExpandSubTypeCheckAtParseTime flag This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/19187 From duke at openjdk.org Sun May 12 03:07:19 2024 From: duke at openjdk.org (xiaotaonan) Date: Sun, 12 May 2024 03:07:19 GMT Subject: RFR: 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag Message-ID: C2: Remove ExpandSubTypeCheckAtParseTime flag ------------- Commit messages: - C2: Remove ExpandSubTypeCheckAtParseTime flag Changes: https://git.openjdk.org/jdk/pull/19205/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19205&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332032 Stats: 10 lines in 4 files changed: 0 ins; 4 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19205.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19205/head:pull/19205 PR: https://git.openjdk.org/jdk/pull/19205 From liach at openjdk.org Sun May 12 15:14:04 2024 From: liach at openjdk.org (Chen Liang) Date: Sun, 12 May 2024 15:14:04 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v8] In-Reply-To: References: Message-ID: <3eERzqYdCd4f9qn4KpzBA9ealaUTzC67wIhzB18ETTE=.f9d17a6f-1ca5-477f-8344-40c20abe7d7e@github.com> On Mon, 6 May 2024 18:24:25 GMT, Adam Sotona wrote: >> Hi, >> During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. >> One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. 
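The proposal described here, which replaces eagerly initialized static fields with lazily initialized static accessor methods, is Java-specific, but the trade-off can be illustrated in C++. There, the counterpart of such an accessor is a function-local static, which is constructed on the first call rather than during static initialization. `AttributeMapper` and `annotationDefault()` are hypothetical names chosen for this analogy; they are not the Class-File API.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for an attribute mapper. The constructor counts
// instances so that the laziness is observable.
struct AttributeMapper {
  static int constructed;
  std::string name;
  explicit AttributeMapper(std::string n) : name(std::move(n)) { ++constructed; }
};
int AttributeMapper::constructed = 0;

// A namespace-scope AttributeMapper object would be built during static
// initialization whether or not it is ever used. A function-local static
// is built on the first call only (thread-safe since C++11), and later
// calls return the same instance.
const AttributeMapper& annotationDefault() {
  static const AttributeMapper mapper("AnnotationDefault");
  return mapper;
}
```

The cost of building a mapper is then paid only by code that actually asks for it, which is the start-up benefit the proposal is after.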
> > >> I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. >> >> Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. >> >> Thank you, >> Adam > > Adam Sotona has updated the pull request incrementally with one additional commit since the last revision: > > fixed tests src/java.base/share/classes/java/lang/classfile/Attributes.java line 153: > 151: > 152: /** > 153: * {@return Attribute mapper for the {@code AnnotationDefault} attribute} Just wondering, can we change `{@code AnnotationDefault}` to `{@value #NAME_ANNOTATION_DEFAULT}`, etc? This way, the names are still rendered as code in Javadoc HTML, but they are generated with links to the constants, and programmers will see these constants and prefer them over hardcoded values. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19006#discussion_r1597655934 From duke at openjdk.org Mon May 13 01:02:09 2024 From: duke at openjdk.org (xiaotaonan) Date: Mon, 13 May 2024 01:02:09 GMT Subject: RFR: 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag In-Reply-To: References: Message-ID: On Sun, 12 May 2024 03:02:39 GMT, xiaotaonan wrote: > C2: Remove ExpandSubTypeCheckAtParseTime flag @lgxbslgx ------------- PR Comment: https://git.openjdk.org/jdk/pull/19205#issuecomment-2106449919 From galder at openjdk.org Mon May 13 05:04:38 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Mon, 13 May 2024 05:04:38 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v16] In-Reply-To: References: Message-ID: > Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures.
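The zeroing point in the description comes down to not writing each element twice. The following is a C++ analogy, not the C1 intrinsic itself, which emits machine code: `clone_zeroed` value-initializes the new array and then copies over it, while `clone_uninitialized` skips the zero fill because the copy is known to cover every element before the array can be read. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Naive clone: "new int[n]()" zero-fills the array, then memcpy overwrites
// every element, so each byte is written twice.
std::unique_ptr<int[]> clone_zeroed(const int* src, std::size_t n) {
  std::unique_ptr<int[]> dst(new int[n]());     // value-initialized (zeroed)
  if (n != 0) std::memcpy(dst.get(), src, n * sizeof(int));
  return dst;
}

// Clone without zeroing: "new int[n]" leaves the elements uninitialized.
// This is only safe because the copy covers the whole array before any
// element is read.
std::unique_ptr<int[]> clone_uninitialized(const int* src, std::size_t n) {
  std::unique_ptr<int[]> dst(new int[n]);       // default-initialized (no zero fill)
  if (n != 0) std::memcpy(dst.get(), src, n * sizeof(int));
  return dst;
}
```

The guard on `n != 0` avoids calling `memcpy` with a possibly null source for empty arrays.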
> > The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64: > > > $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 3.476 ± 0.018 ns/op > ArrayClone.byteArraycopy 10 avgt 15 3.740 ± 0.017 ns/op > ArrayClone.byteArraycopy 100 avgt 15 7.124 ± 0.010 ns/op > ArrayClone.byteArraycopy 1000 avgt 15 39.301 ± 0.106 ns/op > ArrayClone.byteClone 0 avgt 15 3.478 ± 0.008 ns/op > ArrayClone.byteClone 10 avgt 15 3.562 ± 0.007 ns/op > ArrayClone.byteClone 100 avgt 15 5.888 ± 0.206 ns/op > ArrayClone.byteClone 1000 avgt 15 25.762 ± 0.203 ns/op > ArrayClone.intArraycopy 0 avgt 15 3.199 ± 0.016 ns/op > ArrayClone.intArraycopy 10 avgt 15 4.521 ± 0.008 ns/op > ArrayClone.intArraycopy 100 avgt 15 17.429 ± 0.039 ns/op > ArrayClone.intArraycopy 1000 avgt 15 178.432 ± 0.777 ns/op > ArrayClone.intClone 0 avgt 15 3.406 ± 0.016 ns/op > ArrayClone.intClone 10 avgt 15 4.272 ± 0.006 ns/op > ArrayClone.intClone 100 avgt 15 13.110 ± 0.122 ns/op > ArrayClone.intClone 1000 avgt 15 113.196 ± 13.400 ns/op > > > It also includes an optimization to avoid instantiating the array copy stub in scenarios like this. > > I ran the hotspot compiler tests successfully, limiting them to C1 compilation, on darwin/aarch64, linux/x86_64 and linux/686. E.g. > > > $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" > ...
> TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:hotspot_compiler 1234 1234 0 0 > > > One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts? > > Thanks @rwestrel for his help shaping this up :) Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/c1/c1_GraphBuilder.cpp Co-authored-by: Dean Long <17332032+dean-long at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17667/files - new: https://git.openjdk.org/jdk/pull/17667/files/a35cdd84..c3b7fa47 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=14-15 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17667.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17667/head:pull/17667 PR: https://git.openjdk.org/jdk/pull/17667 From chagedorn at openjdk.org Mon May 13 05:42:02 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 May 2024 05:42:02 GMT Subject: RFR: 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag In-Reply-To: References: Message-ID: <_VqdctFR4arGmdWQk9opoNMe6h1Rwa0gKDWHEcHyO9Y=.2ea3c35b-660d-4431-bc47-e0a874c386ce@github.com> On Mon, 13 May 2024 00:59:18 GMT, xiaotaonan wrote: >> C2: Remove ExpandSubTypeCheckAtParseTime flag > > @lgxbslgx Hi @xiaotaonan, please first ask in JBS if you can take over RFEs/bugs that are already assigned like this one, especially if it has just been filed. This PR misses the entire context why this flag should be removed and what the pros/cons and trade-offs are.
I planned to do some more offline discussions first before proposing the actual PR to remove this flag since it is now related to an otherwise hard-to-fix bug in Valhalla. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19205#issuecomment-2106694300 From epeter at openjdk.org Mon May 13 05:48:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 May 2024 05:48:18 GMT Subject: RFR: 8331764: C2 SuperWord: refactor _align_to_ref/_mem_ref_for_main_loop_alignment In-Reply-To: References: Message-ID: On Wed, 8 May 2024 14:33:22 GMT, Vladimir Kozlov wrote: >> This PR accomplishes these things: >> - Rename `_align_to_ref` -> `_mem_ref_for_main_loop_alignment`. >> - Move the `mem_ref` finding for alignment out of `SuperWord::find_adjacent_refs`. This is too early, and we don't even know if the relevant `mem_ref` is going to be vectorized. It makes more sense to pick a `mem_ref` directly in `SuperWord::adjust_pre_loop_limit_to_align_main_loop_vectors`, where we already know what packs are going to be vectorized. >> - For the alignment width (aw), we can use the `vector_width` of the pack to which the `mem_ref` belongs, rather than the potentially much larger `vector_width_in_bytes`. I track this with `_aw_for_main_loop_alignment` now. >> >> I need this for https://github.com/openjdk/jdk/pull/18822, and decided to split it out into an independent change. > > Good. Thanks @vnkozlov @chhagedorn for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19115#issuecomment-2106700060 From epeter at openjdk.org Mon May 13 05:48:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 May 2024 05:48:18 GMT Subject: Integrated: 8331764: C2 SuperWord: refactor _align_to_ref/_mem_ref_for_main_loop_alignment In-Reply-To: References: Message-ID: <0gCo8BOJuWlOFZndYqNlwDzkqjSpsjNvN4wHpFFpzUU=.40169e88-d51e-444a-bcab-a52877acb526@github.com> On Tue, 7 May 2024 09:26:11 GMT, Emanuel Peter wrote: > This PR accomplishes these things: > - Rename `_align_to_ref` -> `_mem_ref_for_main_loop_alignment`. > - Move the `mem_ref` finding for alignment out of `SuperWord::find_adjacent_refs`. This is too early, and we don't even know if the relevant `mem_ref` is going to be vectorized. It makes more sense to pick a `mem_ref` directly in `SuperWord::adjust_pre_loop_limit_to_align_main_loop_vectors`, where we already know what packs are going to be vectorized. > - For the alignment width (aw), we can use the `vector_width` of the pack to which the `mem_ref` belongs, rather than the potentially much larger `vector_width_in_bytes`. I track this with `_aw_for_main_loop_alignment` now. > > I need this for https://github.com/openjdk/jdk/pull/18822, and decided to split it out into an independent change. This pull request has now been integrated. 
Changeset: d517d2df Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/d517d2df451e135583083ed3684d7d3241b36f76 Stats: 67 lines in 2 files changed: 41 ins; 20 del; 6 mod 8331764: C2 SuperWord: refactor _align_to_ref/_mem_ref_for_main_loop_alignment Reviewed-by: kvn, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/19115 From epeter at openjdk.org Mon May 13 06:01:33 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 May 2024 06:01:33 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8] In-Reply-To: <8-_t7nWbR9gZ2_QkfFNuf5M0Q4PMkKJKgwS3ZbHcCxI=.32dc4f11-dec5-468d-afc8-3b4dae285dcb@github.com> References: <8-_t7nWbR9gZ2_QkfFNuf5M0Q4PMkKJKgwS3ZbHcCxI=.32dc4f11-dec5-468d-afc8-3b4dae285dcb@github.com> Message-ID: On Wed, 8 May 2024 20:22:51 GMT, Bhavana Kilambi wrote: > I am not sure if I fully understand what's expected in the JTREG tests. Should I be verifying the -XX:+PrintIdeal output to make sure the correct message is being printed for the ReductionV* nodes? Yes, the IR framework basically does regex matching against the PrintIdeal graph. For example: `counts = {IRNode.STORE_VECTOR, ">0"}` in the `@IR` rule executes the regex for the store vector, and checks if we find more than zero occurrences. Maybe you can just use a regex string directly for your special IR rule. Alternatively, you could have them in the `IRNode` class, but I'm not sure that's worth it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18034#issuecomment-2106712934 From epeter at openjdk.org Mon May 13 06:03:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 May 2024 06:03:36 GMT Subject: RFR: 8329273: C2 SuperWord: Some basic MemorySegment IR tests [v2] In-Reply-To: References: Message-ID: > I could not find any IR vectorization tests for `MemorySegment` yet.
> > I make sure to exercise different backing types: > - arrays > - buffers > - native memory > > I filed a follow-up RFE, to eventually make all cases where I have "FAILS" vectorize: > > [JDK-8331659](https://bugs.openjdk.org/browse/JDK-8331659): C2 SuperWord: investicate failed vectorization in compiler/loopopts/superword/TestMemorySegment.java Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 25 additional commits since the last revision: - Merge branch 'master' into JDK-8329273-memory-segment-ir-tests - fix tabs - speed up test - small cosmetic fix - make things static - long loop tests - handle AlignVector - int cases - int-index case - disable mixed tests - ... and 15 more: https://git.openjdk.org/jdk/compare/43da3db1...6f760dfd ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18535/files - new: https://git.openjdk.org/jdk/pull/18535/files/b6f16a58..6f760dfd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18535&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18535&range=00-01 Stats: 43101 lines in 1987 files changed: 18450 ins; 16140 del; 8511 mod Patch: https://git.openjdk.org/jdk/pull/18535.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18535/head:pull/18535 PR: https://git.openjdk.org/jdk/pull/18535 From chagedorn at openjdk.org Mon May 13 06:48:17 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 May 2024 06:48:17 GMT Subject: RFR: 8329273: C2 SuperWord: Some basic MemorySegment IR tests [v2] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 06:03:36 GMT, Emanuel Peter wrote: >> I could not find any IR vectorization tests for `MemorySegment` yet. 
>> >> I make sure to exercise different backing types: >> - arrays >> - buffers >> - native memory >> >> I filed a follow-up RFE, to eventually make all cases where I have "FAILS" vectorize: >> >> [JDK-8331659](https://bugs.openjdk.org/browse/JDK-8331659): C2 SuperWord: investicate failed vectorization in compiler/loopopts/superword/TestMemorySegment.java > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 25 additional commits since the last revision: > > - Merge branch 'master' into JDK-8329273-memory-segment-ir-tests > - fix tabs > - speed up test > - small cosmetic fix > - make things static > - long loop tests > - handle AlignVector > - int cases > - int-index case > - disable mixed tests > - ... and 15 more: https://git.openjdk.org/jdk/compare/2faa8c83...6f760dfd Good basic tests! I have a few minor comments but otherwise, looks good. test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 36: > 34: /* > 35: * @test id=byte-array > 36: * @bug 8310190 Should be updated to 8329273. Same for other runs test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 166: > 164: String providerName = System.getProperty("memorySegmentProviderNameForTestVM"); > 165: provider = switch (providerName) { > 166: case "ByteArray" -> ( () -> { return newMemorySegmentOfByteArray(); } ); You can directly use an expression lambda without return: case "ByteArray" -> (() -> newMemorySegmentOfByteArray()); But I think you can go even further and directly use a method reference: Suggestion: case "ByteArray" -> (TestMemorySegmentImpl::newMemorySegmentOfByteArray); Same for others. 
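The three provider styles discussed in this comment (a block lambda with `return`, an expression lambda, and a method reference) are interchangeable whenever the target type is a functional interface. A minimal self-contained sketch follows. It assumes a `Supplier<byte[]>` target type and uses a hypothetical `newByteArray()` factory as a stand-in for `newMemorySegmentOfByteArray()`; the real test returns a `MemorySegment`.

```java
import java.util.function.Supplier;

class ProviderStylesSketch {
    // Hypothetical stand-in for newMemorySegmentOfByteArray(); not taken from the test.
    static byte[] newByteArray() {
        return new byte[16];
    }

    static Supplier<byte[]> provider(String name) {
        return switch (name) {
            // Block lambda with an explicit return (the original style).
            case "Block" -> (() -> { return newByteArray(); });
            // Expression lambda: no braces and no return.
            case "Expression" -> (() -> newByteArray());
            // Method reference: the most compact form.
            case "MethodRef" -> (ProviderStylesSketch::newByteArray);
            default -> throw new RuntimeException("Test argument not recognized: " + name);
        };
    }

    public static void main(String[] args) {
        for (String name : new String[] {"Block", "Expression", "MethodRef"}) {
            System.out.println(name + " -> " + provider(name).get().length);
        }
    }
}
```

All three arms yield the same `Supplier`. The method reference is simply the shortest way to write it.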
test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 181: > 179: default -> throw new RuntimeException("Test argument not recognized: " + providerName); > 180: }; > 181: } As discussed offline, this is an interesting workaround. Maybe the IR framework could be extended at some point to simplify this. test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 187: > 185: > 186: // List of gold, the results from the first run before compilation > 187: Map golds = new HashMap(); You can replace these with `<>`: Suggestion: // List of tests Map tests = new HashMap<>(); // List of gold, the results from the first run before compilation Map golds = new HashMap<>(); test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 199: > 197: tests.put("testMemorySegmentBadExitCheck", () -> { > 198: return testMemorySegmentBadExitCheck(copy(a)); > 199: }); Same as above, you can replace this with an expression lambda: Suggestion: tests.put("testIntLoop_longIndex_intInvar_sameAdr_byte", () -> testIntLoop_longIndex_intInvar_sameAdr_byte(copy(a), 0)); Same for others. test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 347: > 345: > 346: static MemorySegment newMemorySegmentOfMixedBuffer() { > 347: switch(RANDOM.nextInt(2)) { Suggestion: switch (RANDOM.nextInt(2)) { test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 354: > 352: > 353: static MemorySegment newMemorySegmentOfMixed() { > 354: switch(RANDOM.nextInt(3)) { Suggestion: switch (RANDOM.nextInt(3)) { test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 447: > 445: @IR(counts = {IRNode.LOAD_VECTOR_B, "= 0", > 446: IRNode.ADD_VB, "= 0", > 447: IRNode.STORE_VECTOR, "= 0"}, You should use `failOn` instead of `= 0`. Same for other tests. 
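Earlier in this thread, the IR framework is described as doing regex matching against the `-XX:+PrintIdeal` output. Seen that way, the `= 0` versus `failOn` question is only about how a zero-count expectation is spelled. The toy model below is NOT the IR framework's API. `IrRuleToySketch`, its methods, and the sample dump lines are all made up. They illustrate that, in this model, a `counts` check with `"= 0"` and a `failOn`-style check accept the same graphs, while `"> 0"` expresses the inverted expectation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model only: this is NOT the IR framework's API.
class IrRuleToySketch {
    // Count how often the regex matches in a (made-up) PrintIdeal-like dump.
    static int countMatches(String dump, String regex) {
        Matcher m = Pattern.compile(regex).matcher(dump);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Supports only the two constraint spellings discussed in the thread: "= 0" and "> 0".
    static boolean countsHold(String dump, String regex, String constraint) {
        int n = countMatches(dump, regex);
        return switch (constraint.replace(" ", "")) {
            case "=0" -> n == 0;
            case ">0" -> n > 0;
            default -> throw new IllegalArgumentException("unsupported: " + constraint);
        };
    }

    // failOn-style check: passes only if the regex does not match at all.
    static boolean failOnHolds(String dump, String regex) {
        return countMatches(dump, regex) == 0;
    }

    public static void main(String[] args) {
        // Made-up dump of a loop that was NOT vectorized.
        String dump = " 10 LoadB === ...\n 11 AddI === ...\n 12 StoreB === ...";
        String storeVector = "\\bStoreVector\\b";
        System.out.println(countsHold(dump, storeVector, "= 0")); // true
        System.out.println(failOnHolds(dump, storeVector));        // true
        System.out.println(countsHold(dump, storeVector, "> 0")); // false
    }
}
```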
------------- PR Review: https://git.openjdk.org/jdk/pull/18535#pullrequestreview-2051802215 PR Review Comment: https://git.openjdk.org/jdk/pull/18535#discussion_r1597940804 PR Review Comment: https://git.openjdk.org/jdk/pull/18535#discussion_r1597946319 PR Review Comment: https://git.openjdk.org/jdk/pull/18535#discussion_r1597942075 PR Review Comment: https://git.openjdk.org/jdk/pull/18535#discussion_r1597947716 PR Review Comment: https://git.openjdk.org/jdk/pull/18535#discussion_r1597949915 PR Review Comment: https://git.openjdk.org/jdk/pull/18535#discussion_r1597950088 PR Review Comment: https://git.openjdk.org/jdk/pull/18535#discussion_r1597950155 PR Review Comment: https://git.openjdk.org/jdk/pull/18535#discussion_r1597942531 From duke at openjdk.org Mon May 13 07:08:15 2024 From: duke at openjdk.org (xiaotaonan) Date: Mon, 13 May 2024 07:08:15 GMT Subject: RFR: 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag In-Reply-To: <_VqdctFR4arGmdWQk9opoNMe6h1Rwa0gKDWHEcHyO9Y=.2ea3c35b-660d-4431-bc47-e0a874c386ce@github.com> References: <_VqdctFR4arGmdWQk9opoNMe6h1Rwa0gKDWHEcHyO9Y=.2ea3c35b-660d-4431-bc47-e0a874c386ce@github.com> Message-ID: On Mon, 13 May 2024 05:39:11 GMT, Christian Hagedorn wrote: > please first ask in JBS if you can take over RFEs/bugs that are already assigned like this one, especially if it has just been filed. This PR misses the entire context why this flag should be removed and what the pros/cons and trade-offs are. I planned to do some more offline discussions first before proposing the actual PR to remove this flag since it is now related to an otherwise hard-to-fix bug in Valhalla. OK. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19205#issuecomment-2106802957 From duke at openjdk.org Mon May 13 07:08:15 2024 From: duke at openjdk.org (xiaotaonan) Date: Mon, 13 May 2024 07:08:15 GMT Subject: Withdrawn: 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag In-Reply-To: References: Message-ID: On Sun, 12 May 2024 03:02:39 GMT, xiaotaonan wrote: > C2: Remove ExpandSubTypeCheckAtParseTime flag This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/19205 From epeter at openjdk.org Mon May 13 07:18:34 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 May 2024 07:18:34 GMT Subject: RFR: 8329273: C2 SuperWord: Some basic MemorySegment IR tests [v3] In-Reply-To: References: Message-ID: > I could not find any IR vectorization tests for `MemorySegment` yet. > > I make sure to exercise different backing types: > - arrays > - buffers > - native memory > > I filed a follow-up RFE, to eventually make all cases where I have "FAILS" vectorize: > > [JDK-8331659](https://bugs.openjdk.org/browse/JDK-8331659): C2 SuperWord: investicate failed vectorization in compiler/loopopts/superword/TestMemorySegment.java Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18535/files - new: https://git.openjdk.org/jdk/pull/18535/files/6f760dfd..3cbb4664 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18535&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18535&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18535.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18535/head:pull/18535 PR: https://git.openjdk.org/jdk/pull/18535 From epeter at openjdk.org Mon May 13 07:18:37 2024 From: epeter at openjdk.org (Emanuel Peter) Date: 
Mon, 13 May 2024 07:18:37 GMT Subject: RFR: 8329273: C2 SuperWord: Some basic MemorySegment IR tests [v2] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 06:34:45 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 25 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8329273-memory-segment-ir-tests >> - fix tabs >> - speed up test >> - small cosmetic fix >> - make things static >> - long loop tests >> - handle AlignVector >> - int cases >> - int-index case >> - disable mixed tests >> - ... and 15 more: https://git.openjdk.org/jdk/compare/aa5b224f...6f760dfd > > test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 447: > >> 445: @IR(counts = {IRNode.LOAD_VECTOR_B, "= 0", >> 446: IRNode.ADD_VB, "= 0", >> 447: IRNode.STORE_VECTOR, "= 0"}, > > You should use `failOn` instead of `= 0`. Same for other tests. I honestly prefer "= 0", because it is easier to flip to "> 0", and keeps the same style that way. But I guess that is really a matter of taste. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18535#discussion_r1597982303 From epeter at openjdk.org Mon May 13 07:26:09 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 May 2024 07:26:09 GMT Subject: RFR: 8329273: C2 SuperWord: Some basic MemorySegment IR tests [v2] In-Reply-To: References: Message-ID: <49UAPFqTeTFEbRuJMW_pYQ8RJAKYj3DFYVIi8WHeMgI=.f7a067ef-878a-4875-9846-cb163403ba96@github.com> On Mon, 13 May 2024 06:32:45 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 25 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8329273-memory-segment-ir-tests >> - fix tabs >> - speed up test >> - small cosmetic fix >> - make things static >> - long loop tests >> - handle AlignVector >> - int cases >> - int-index case >> - disable mixed tests >> - ... and 15 more: https://git.openjdk.org/jdk/compare/7e77b898...6f760dfd > > test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 36: > >> 34: /* >> 35: * @test id=byte-array >> 36: * @bug 8310190 > > Should be updated to 8329273. Same for other runs Nice catch! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18535#discussion_r1597992380 From epeter at openjdk.org Mon May 13 07:30:10 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 May 2024 07:30:10 GMT Subject: RFR: 8329273: C2 SuperWord: Some basic MemorySegment IR tests [v2] In-Reply-To: References: Message-ID: <4WYvsoVX9v8WsQS8-74kMas53r2Bo-TVu2_TkmGWwTA=.a64336fb-238a-4f1c-98bc-83a8079ad5ea@github.com> On Mon, 13 May 2024 06:39:05 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 25 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8329273-memory-segment-ir-tests >> - fix tabs >> - speed up test >> - small cosmetic fix >> - make things static >> - long loop tests >> - handle AlignVector >> - int cases >> - int-index case >> - disable mixed tests >> - ... 
and 15 more: https://git.openjdk.org/jdk/compare/06854a6b...6f760dfd > > test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 166: > >> 164: String providerName = System.getProperty("memorySegmentProviderNameForTestVM"); >> 165: provider = switch (providerName) { >> 166: case "ByteArray" -> ( () -> { return newMemorySegmentOfByteArray(); } ); > > You can directly use an expression lambda without return: > > case "ByteArray" -> (() -> newMemorySegmentOfByteArray()); > > But I think you can go even further and directly use a method reference: > Suggestion: > > case "ByteArray" -> (TestMemorySegmentImpl::newMemorySegmentOfByteArray); > > Same for others. Oh, great idea! > test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 181: > >> 179: default -> throw new RuntimeException("Test argument not recognized: " + providerName); >> 180: }; >> 181: } > > As discussed offline, this is an interesting workaround. Maybe the IR framework could be extended at some point to simplify this. That would be nice! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18535#discussion_r1597996513 PR Review Comment: https://git.openjdk.org/jdk/pull/18535#discussion_r1597996770 From epeter at openjdk.org Mon May 13 07:38:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 May 2024 07:38:11 GMT Subject: RFR: 8329273: C2 SuperWord: Some basic MemorySegment IR tests [v2] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 06:42:42 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 25 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8329273-memory-segment-ir-tests >> - fix tabs >> - speed up test >> - small cosmetic fix >> - make things static >> - long loop tests >> - handle AlignVector >> - int cases >> - int-index case >> - disable mixed tests >> - ... and 15 more: https://git.openjdk.org/jdk/compare/7eaa6f7c...6f760dfd > > test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 199: > >> 197: tests.put("testMemorySegmentBadExitCheck", () -> { >> 198: return testMemorySegmentBadExitCheck(copy(a)); >> 199: }); > > Same as above, you can replace this with an expression lambda: > Suggestion: > > tests.put("testIntLoop_longIndex_intInvar_sameAdr_byte", > () -> testIntLoop_longIndex_intInvar_sameAdr_byte(copy(a), 0)); > > Same for others. Nice idea! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18535#discussion_r1598006814 From epeter at openjdk.org Mon May 13 07:47:35 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 May 2024 07:47:35 GMT Subject: RFR: 8329273: C2 SuperWord: Some basic MemorySegment IR tests [v4] In-Reply-To: References: Message-ID: > I could not find any IR vectorization tests for `MemorySegment` yet. 
> > I make sure to exercise different backing types: > - arrays > - buffers > - native memory > > I filed a follow-up RFE, to eventually make all cases where I have "FAILS" vectorize: > > [JDK-8331659](https://bugs.openjdk.org/browse/JDK-8331659): C2 SuperWord: investicate failed vectorization in compiler/loopopts/superword/TestMemorySegment.java Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8329273-memory-segment-ir-tests' of https://github.com/eme64/jdk into JDK-8329273-memory-segment-ir-tests - review suggestions by Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18535/files - new: https://git.openjdk.org/jdk/pull/18535/files/3cbb4664..b6ddb4b7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18535&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18535&range=02-03 Stats: 101 lines in 1 file changed: 0 ins; 50 del; 51 mod Patch: https://git.openjdk.org/jdk/pull/18535.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18535/head:pull/18535 PR: https://git.openjdk.org/jdk/pull/18535 From asotona at openjdk.org Mon May 13 07:54:09 2024 From: asotona at openjdk.org (Adam Sotona) Date: Mon, 13 May 2024 07:54:09 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v8] In-Reply-To: <3eERzqYdCd4f9qn4KpzBA9ealaUTzC67wIhzB18ETTE=.f9d17a6f-1ca5-477f-8344-40c20abe7d7e@github.com> References: <3eERzqYdCd4f9qn4KpzBA9ealaUTzC67wIhzB18ETTE=.f9d17a6f-1ca5-477f-8344-40c20abe7d7e@github.com> Message-ID: <8bkIrXCl7OsuLoMQi43faVELq0d1R-P60pSCGkxpwpU=.fe207403-8288-4f2d-ab7d-96fec5ba212e@github.com> On Sun, 12 May 2024 15:11:17 GMT, Chen Liang wrote: >> Adam Sotona has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed tests > > src/java.base/share/classes/java/lang/classfile/Attributes.java line 153: > >> 151: >> 152: /** >> 153: * {@return 
Attribute mapper for the {@code AnnotationDefault} attribute} > > Just wondering, can we change `{@code AnnotationDefault}` to `{@value #NAME_ANNOTATION_DEFAULT}`, etc.? This way, the names are still rendered as code in Javadoc HTML, but they are generated with links to the constants, and programmers will see these constants and prefer them over hardcoded values. On the other hand, it is questionable whether the attribute names should be exposed in the API at all. We provide corresponding mappers and attribute models. I don't see a case where a user would need to use the attribute names directly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19006#discussion_r1598026518 From chagedorn at openjdk.org Mon May 13 07:58:04 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 May 2024 07:58:04 GMT Subject: RFR: 8329273: C2 SuperWord: Some basic MemorySegment IR tests [v4] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 07:47:35 GMT, Emanuel Peter wrote: >> I could not find any IR vectorization tests for `MemorySegment` yet. >> >> I make sure to exercise different backing types: >> - arrays >> - buffers >> - native memory >> >> I filed a follow-up RFE, to eventually make all cases where I have "FAILS" vectorize: >> >> [JDK-8331659](https://bugs.openjdk.org/browse/JDK-8331659): C2 SuperWord: investicate failed vectorization in compiler/loopopts/superword/TestMemorySegment.java > > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'JDK-8329273-memory-segment-ir-tests' of https://github.com/eme64/jdk into JDK-8329273-memory-segment-ir-tests > - review suggestions by Christian Updates look good, thanks! ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18535#pullrequestreview-2051958085 From epeter at openjdk.org Mon May 13 08:03:03 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 May 2024 08:03:03 GMT Subject: RFR: 8325155: C2 SuperWord: remove alignment boundaries Message-ID: I have tried for a very long time to get rid of all the `alignment(n)` code that is all over the SuperWord code. With lots of previous work, I am now finally ready to remove it. I was able to remove lots of VM code, about 300 lines. And the removed code is, I think, much more complicated than the new code. This is what I did in this PR: - Removal of `_node_info`: used to have many fields, which I refactored out to the `VLoopAnalyzer` modules. `alignment` is the last component, which I now remove. - Changed the implementation of `SuperWord::find_adjacent_refs`, now `SuperWord::find_adjacent_memop_pairs`, completely: - It used to be an algorithm that would scan over all `memops` repeatedly, try to find some `mem_ref` and see which other memops were comparable, and then pack pairs for all of those, by comparing all-vs-all memops. This algorithm is at least quadratic, if not much worse. - I now add all `memops` into a single array, sort them by groups (those that are comparable with each other and could be packed into vectors), and inside the groups by ascending offset. This allows me to split off the groups much more efficiently, and the sorting by offset also allows me to find adjacent pairs much more efficiently. In most cases this reduces the cost to `O(n log n)` for the sort plus a linear scan for finding adjacent memops. - I removed the "alignment boundaries" created in `SuperWord::memory_alignment` by `int off_rem = offset % vw;`. - This used to have the effect that all offsets were computed modulo the vector width. Hence, pairs could not be packed across this boundary (e.g.
we have nodes with offsets `31, 32`, which are adjacent in theory, but if we have a `vw = 32`, then the modulo-offsets are `31, 0`, and they are not detected as adjacent). - These "alignment boundaries" used to be required for correctness about a year ago, before I fixed and relaxed much of the alignment code. - The `alignment` used to have another important task: Ensuring compatibility of the input-size of a use node, with the output-size of the def-node. - This was done by giving all nodes an `alignment`, even the non-memop nodes. This `alignment` was then scaled up and down at type casts (e.g. int `0, 4, 8, 12` -> long `0, 8, 16, 24`). If the output-size of the def-node did not match the input-size of the use-node, then the `alignment` would not match up, and we would not pack. - This is why we used to have checks like `alignment(s1) + data_size(s1) == alignment(s2)` and `s2_align == align + data_size(s1)`, and why we did `set_alignment(s2, align + data_size(s1));` inside `SuperWord::set_alignment(Node* s1, Node* s2, int align)`. - I decided to NOT check if use/def type sizes match during packing, but only much later in `SuperWord::profitable` (bad name, it has always been more about checking consistency than profitability, but I will rename that in a Future RFE). The relevant code is in `SuperWord::is_velt_basic_type_compatible_use_def`. ------------- Commit messages: - rm TODO - manual merge - revert a line, need to fix it different - improve comments - fix alignment - fix reductions - MaxI reduction over chars - Merge branch 'master' into JDK-8325155-rm-alignment-boundaries - Merge branch 'master' into JDK-8325155-rm-alignment-boundaries - Merge branch 'master' into JDK-8325155-rm-alignment-boundaries - ... 
and 14 more: https://git.openjdk.org/jdk/compare/d517d2df...69396ac8 Changes: https://git.openjdk.org/jdk/pull/18822/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18822&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8325155 Stats: 1064 lines in 7 files changed: 597 ins; 369 del; 98 mod Patch: https://git.openjdk.org/jdk/pull/18822.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18822/head:pull/18822 PR: https://git.openjdk.org/jdk/pull/18822 From epeter at openjdk.org Mon May 13 08:03:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 May 2024 08:03:15 GMT Subject: RFR: 8325155: C2 SuperWord: remove alignment boundaries In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 17:58:53 GMT, Emanuel Peter wrote: > I have tried for a very long time to get rid of all the `alignment(n)` code that is all over the SuperWord code. With lots of previous work, I am now finally ready to remove it. > > I was able to remove lots of VM code, about 300 lines. And the removed code is I think much more complicated than the new code. > > This is what I did in this PR: > - Removal of `_node_info`: used to have many fields, which I refactored out to the `VLoopAnalyzer` modules. `alignment` is the last component, which I now remove. > - Changed the implementation of `SuperWord::find_adjacent_refs`, now `SuperWord::find_adjacent_memop_pairs`, completely: > - It used to be an algorithm that would scan over all `memops` repeatedly, try to find some `mem_ref` and see which other memops were comparable, and then pack pairs for all of those, by comparing all-vs-all memops. This algorithm is at least quadratic, if not much worse. > - I now add all `memops` into a single array, sort them by groups (those that are comparable with each other and could be packed into vectors), and inside the groups by ascending offset. 
This allows me to split off the groups much more efficiently, and the sorting by offset also allows me to find adjacent pairs much more efficiently. In most cases this reduces the cost to `O(n log n)` for the sort plus a linear scan for finding adjacent memops. > - I removed the "alignment boundaries" created in `SuperWord::memory_alignment` by `int off_rem = offset % vw;`. > - This used to have the effect that all offsets were computed modulo the vector width. Hence, pairs could not be packed across this boundary (e.g. we have nodes with offsets `31, 32`, which are adjacent in theory, but if we have a `vw = 32`, then the modulo-offsets are `31, 0`, and they are not detected as adjacent). > - These "alignment boundaries" used to be required for correctness about a year ago, before I fixed and relaxed much of the alignment code. > - The `alignment` used to have another important task: Ensuring compatibility of the input-size of a use node, with the output-size of the def-node. > - This was done by giving all nodes an `alignment`, even the non-memop nodes. This `alignment` was then scaled up and down at type casts (e.g. int `0, 4, 8, 12` -> long `0, 8, 16, 24`). If the output-size of the def-node did not match the input-size of the use-node, then the `alignment` would not match up, and we would not pack. > - This is why we used to have checks like `alignment(s1) + data_size(s1) == alignment(s2)` ... src/hotspot/share/opto/superword.cpp line 46: > 44: _vloop(vloop_analyzer.vloop()), > 45: _arena(mtCompiler), > 46: _node_info(arena(), _vloop.estimated_body_length(), 0, SWNodeInfo::initial), // info needed per node Note: this held the "alignment" info; all other fields were already removed in previous refactorings.
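The grouping-and-sorting approach from the PR description above can be sketched in a few lines. This is a simplified model under stated assumptions, not the HotSpot implementation. The hypothetical `MemOp` record stands in for a `MemNode` plus its `VPointer`. "Comparable" is approximated as "same base and same element size". Memops with equal offsets are not handled. After an `O(n log n)` sort, a single linear pass finds the adjacent pairs. Because no `offset % vw` boundary is involved, byte offsets 31 and 32 can pair.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class AdjacentPairsSketch {
    // Hypothetical, simplified memory operation: C2 works on MemNode/VPointer instead.
    record MemOp(String name, String base, int elemSize, int offset) {}

    static List<String> findAdjacentPairs(List<MemOp> memops) {
        // Sort by group (base, element size), then by ascending offset.
        List<MemOp> sorted = new ArrayList<>(memops);
        sorted.sort(Comparator.comparing(MemOp::base)
                .thenComparingInt(MemOp::elemSize)
                .thenComparingInt(MemOp::offset));
        // Linear scan: neighbours in the same group whose offsets differ by one element.
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < sorted.size(); i++) {
            MemOp a = sorted.get(i);
            MemOp b = sorted.get(i + 1);
            boolean sameGroup = a.base().equals(b.base()) && a.elemSize() == b.elemSize();
            if (sameGroup && b.offset() - a.offset() == a.elemSize()) {
                pairs.add(a.name() + "+" + b.name());
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<MemOp> memops = List.of(
            new MemOp("s2", "a", 4, 8),
            new MemOp("s0", "a", 4, 0),
            new MemOp("s1", "a", 4, 4),
            new MemOp("t1", "b", 1, 32),
            new MemOp("t0", "b", 1, 31));
        System.out.println(findAdjacentPairs(memops)); // [s0+s1, s1+s2, t0+t1]
    }
}
```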
src/hotspot/share/opto/superword.cpp line 48: > 46: _clone_map(phase()->C->clone_map()), // map of nodes created in cloning > 47: _pairset(&_arena, _vloop_analyzer), > 48: _packset(&_arena, _vloop_analyzer Note: renamed it to `_mem_ref_for_main_loop_alignment` src/hotspot/share/opto/superword.cpp line 596: > 594: } > 595: } > 596: } Note: this used to count how many "comparable" VPointers we have for each memop. Goal: find the memop with the most "comparable" VPointers, in the hope that it is the longest vector. src/hotspot/share/opto/superword.cpp line 675: > 673: > 674: //---------------------------get_vw_bytes_special------------------------ > 675: int SuperWord::get_vw_bytes_special(MemNode* s) { Note: computes the "expected" vector width for the memop `s`. This is based on the `vector_width_in_bytes`, but applied some special logic for `MulAddS2I`. It also checks the `max_vector_size_in_def_use_chain`. This made sure that the vector width used was not too large, i.e. that there would not be a mismatch of this vector with, for example, inputs that would require a smaller or larger vector width. All of this now seems obsolete since I introduced the `split_packs_at_use_def_boundaries` pass. Now, we can simply create the largest vector width that is ok for this memop, and if its uses or defs later require a smaller vector width, we simply split this vector/pack. src/hotspot/share/opto/superword.cpp line 694: > 692: if (!_pairset.is_left(s1) && !_pairset.is_right(s2)) { > 693: if (!s1->is_Mem() || are_adjacent_refs(s1, s2)) { > 694: return true; Note: we still check `are_adjacent_refs`, and non-memops don't need any alignment. src/hotspot/share/opto/superword.cpp line 705: > 703: //---------------------------get_iv_adjustment--------------------------- > 704: // Calculate loop's iv adjustment for this memory ops. > 705: int SuperWord::get_iv_adjustment(MemNode* mem_ref) { Note: this was another helper method for `SuperWord::find_adjacent_refs`.
Used as the input to `SuperWord::memory_alignment`. The value basically computes how many "elements" this `mem_ref` is away from the "alignment boundary" `offset % vw`. src/hotspot/share/opto/superword.cpp line 718: > 716: // several iterations are needed to align memory operations in main-loop even > 717: // if offset is 0. > 718: int iv_adjustment_in_bytes = (stride_sign * vw - (offset % vw)); Note: the `offset % vw` creates the "alignment boundaries", across which we could not pack any memops. src/hotspot/share/opto/superword.cpp line 921: > 919: continue; > 920: } > 921: if (can_pack_into_pair(t1, t2)) { Note: we now don't check if use/def are compatible with their types here, but in `is_velt_basic_type_compatible_use_def`. src/hotspot/share/opto/superword.cpp line 957: > 955: if (t2->Opcode() == Op_AddI && t2 == cl()->incr()) continue; // don't mess with the iv > 956: if (order_inputs_of_uses_to_match_def_pair(s1, s2, t1, t2) != PairOrderStatus::Ordered) { continue; } > 957: if (can_pack_into_pair(t1, t2)) { Note: we now don't check if use/def are compatible with their types here, but in is_velt_basic_type_compatible_use_def. src/hotspot/share/opto/superword.cpp line 1072: > 1070: if (longer_type_for_conversion(s) != T_ILLEGAL || > 1071: longer_type_for_conversion(t) != T_ILLEGAL) { > 1072: align = align / data_size(s) * data_size(t); Note: this check was there to ensure the type size of use/def nodes matches. This is now done by `is_velt_basic_type_compatible_use_def`. src/hotspot/share/opto/superword.cpp line 1611: > 1609: // the implementation in backend, superword splits the vector implementation > 1610: // for Java API into an execution node with long type plus another node > 1611: // converting long to int. Note: copied this comment from the use-site. This one is important, and I need it inside `is_velt_basic_type_compatible_use_def`. 
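The effect of the `offset % vw` "alignment boundaries" can be reproduced with plain arithmetic. A minimal sketch, assuming a simplified normalization `(offset - ref_offset) % vw` that ignores `iv_adjust`, `stride_sign` and the pre-loop; `normalized_offset`, `adjacent_old` and `adjacent_new` are hypothetical helpers, not HotSpot functions.

```cpp
#include <cassert>

// Simplified model of the removed normalization: reduce an offset modulo the
// vector width, relative to a reference offset. Hypothetical helper.
int normalized_offset(int offset, int ref_offset, int vw) {
  assert(vw > 0);
  int r = (offset - ref_offset) % vw;
  return r < 0 ? r + vw : r;  // keep the result in [0, vw)
}

// Old scheme: adjacency is decided on normalized offsets, so a pair that
// straddles a multiple of vw is never considered adjacent.
bool adjacent_old(int off1, int off2, int size, int ref_offset, int vw) {
  return normalized_offset(off1, ref_offset, vw) + size ==
         normalized_offset(off2, ref_offset, vw);
}

// New scheme: adjacency is decided on the raw offsets.
bool adjacent_new(int off1, int off2, int size) {
  return off1 + size == off2;
}
```

With `vw = 16` and reference offset `1000`, offsets `1012` and `1016` normalize to `12` and `0`, so the old check rejects a pair that is adjacent in memory, while the raw-offset check accepts it.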
src/hotspot/share/opto/superword.cpp line 2755: > 2753: #endif > 2754: return true; > 2755: } Note: compatibility with `def` used to be checked via alignment, but now we need to check via `is_velt_basic_type_compatible_use_def`. For reductions, we only check the "second" input. src/hotspot/share/opto/superword.cpp line 2785: > 2783: if (!is_velt_basic_type_compatible_use_def(use, u_idx)) { > 2784: return false; > 2785: } Note: this check takes over all the use/def checks that I deleted below. src/hotspot/share/opto/superword.cpp line 2988: > 2986: Node* di = d_pk->at(i); > 2987: if (alignment(ui) != alignment(di) * 2) { > 2988: return false; Note: this special case was required for `MulAddS2I`. src/hotspot/share/opto/superword.cpp line 3007: > 3005: } > 3006: if (alignment(ui) / type2aelembytes(velt_basic_type(ui)) != > 3007: alignment(di) / type2aelembytes(velt_basic_type(di))) { Note: we scaled the alignment by the element size. This allowed the transitions at type conversions, e.g. from 4-byte to 8-byte elements. src/hotspot/share/opto/superword.cpp line 3180: > 3178: } > 3179: > 3180: int SuperWord::max_vector_size_in_def_use_chain(Node* n) { Note: this was used by `get_vw_bytes_special`. It looked at the inputs and outputs of node `n` and searched for the largest basic type via `longer_type_for_conversion`. It then returned the max vector size (i.e. number of elements) for that basic type. We can fit fewer large elements than small elements into a vector. With small elements, we would like to have many elements in a vector, but we must make sure that the use and def vectors can hold at least as many elements. Since I recently introduced `split_packs_at_use_def_boundaries`, this special logic is no longer necessary.
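The removed comparison at lines 3006-3007 effectively compared lane indices: dividing a byte "alignment" by the element size gives the lane a node occupies. A simplified sketch of that arithmetic, using the int-to-long example from the PR description (int `0, 4, 8, 12` vs. long `0, 8, 16, 24`); `lane_index` and `lanes_match` are hypothetical helpers, not HotSpot code.

```cpp
#include <cassert>

// Lane a node occupies, given its byte "alignment" within the vector.
int lane_index(int alignment_in_bytes, int elem_size_in_bytes) {
  assert(elem_size_in_bytes > 0);
  return alignment_in_bytes / elem_size_in_bytes;
}

// Old-style use/def compatibility: both must occupy the same lane, even if
// their element sizes differ (e.g. a 4-byte def feeding an 8-byte use).
bool lanes_match(int use_align, int use_elem_size,
                 int def_align, int def_elem_size) {
  return lane_index(use_align, use_elem_size) ==
         lane_index(def_align, def_elem_size);
}
```

For example, a 4-byte def at alignment `12` and an 8-byte use at alignment `24` both map to lane `3`, so the old check accepted the pair.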
src/hotspot/share/opto/superword.cpp line 3313: > 3311: //------------------------------memory_alignment--------------------------- > 3312: // Alignment within a vector memory reference > 3313: int SuperWord::memory_alignment(MemNode* s, int iv_adjust) { Note: this used to "normalize" the offsets so that they fit inside a vector. Example: offsets `1000, 1004, 1008, 1012` would be "adjusted" by `1000`, so that they become `0, 4, 8, 12` and fit in a `16`-byte vector. With `16`-byte vectors and 8 such offsets (`1000, 1004, 1008, 1012, 1016, 1020, 1024, 1028`), the modulo `offset % vw` would split them into two sets of `0, 4, 8, 12`; hence both packs of 4 memops would have the same "normalized" offsets. My new approach avoids the "normalized" offsets altogether and simply works from the "raw" offsets that the VPointer gives us. This is sufficient to determine adjacency. src/hotspot/share/opto/superword.cpp line 3326: > 3324: // We chose an aw that is the maximal possible vector width for the type of > 3325: // align_to_ref. > 3326: const int aw = MAX2(ObjectAlignmentInBytes, vector_width_in_bytes(align_to_ref)); Note: TODO see if we can file a separate bug. src/hotspot/share/opto/superword.cpp line 3331: > 3329: int offset = p.offset_in_bytes(); > 3330: offset += iv_adjust*p.memory_size(); > 3331: int off_rem = offset % vw; Note: this created the "alignment boundaries" by not letting any memops be packed across the vw boundary. src/hotspot/share/opto/superword.hpp line 393: > 391: class SWNodeInfo { > 392: public: > 393: int _alignment; // memory alignment for a node Note: `_alignment` is the last component left in `SWNodeInfo`; we had already refactored away all other components and moved most of them to the `VLoopAnalyzer` submodules. src/hotspot/share/opto/superword.hpp line 404: > 402: > 403: // Memory reference for which we align the main-loop, by adjusting the pre-loop limit.
> 404: MemNode const* _mem_ref_for_main_loop_alignment; Note: replacement for `_align_to_ref` src/hotspot/share/opto/superword.hpp line 512: > 510: // Too verbose for TraceSuperWord > 511: return _vloop.vtrace().is_trace(TraceAutoVectorizationTag::SW_ALIGNMENT); > 512: } Note: All the old verbose tracing is now removed. I now only use `is_trace_superword_adjacent_memops`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1590893258 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1590893657 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1597920375 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1597927247 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1590903980 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1597934968 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1597935680 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1590904863 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1590905149 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1590902729 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1590905995 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1584747015 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1590906802 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1597938750 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1597938336 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1597946835 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1597952454 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1590908823 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1590907960 PR Review Comment: 
https://git.openjdk.org/jdk/pull/18822#discussion_r1597953541 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1590909210 PR Review Comment: https://git.openjdk.org/jdk/pull/18822#discussion_r1590909954 From galder at openjdk.org Mon May 13 08:15:14 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Mon, 13 May 2024 08:15:14 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v15] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 10:04:10 GMT, Dean Long wrote: >> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix assert to only have a single ! > > src/hotspot/share/c1/c1_GraphBuilder.cpp line 2031: > >> 2029: ciType* type = receiver->exact_type(); >> 2030: if (type != nullptr && type->is_loaded()) { >> 2031: assert(!type->as_instance_klass()->is_interface(), ""); > > Suggestion: > > assert(!type->is_instance_klass() || !type->as_instance_klass()->is_interface(), ""); Thanks @dean-long for the suggested fix. CI looks good now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1598054058 From mli at openjdk.org Mon May 13 08:19:33 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 13 May 2024 08:19:33 GMT Subject: RFR: 8332130: RISC-V: remove wrong instructions of Vector Crypto Extension Message-ID: Hi, Can you help to review this simple patch to remove some wrong instructions on riscv?
Thanks ------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/19211/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19211&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332130 Stats: 5 lines in 1 file changed: 0 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19211.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19211/head:pull/19211 PR: https://git.openjdk.org/jdk/pull/19211 From luhenry at openjdk.org Mon May 13 08:43:06 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 13 May 2024 08:43:06 GMT Subject: RFR: 8332130: RISC-V: remove wrong instructions of Vector Crypto Extension In-Reply-To: References: Message-ID: On Mon, 13 May 2024 08:14:43 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch to remove some wrong instructions on riscv? > Thanks What do you mean by wrong? Happy to remove them but we should give some more context. ------------- Marked as reviewed by luhenry (Committer). PR Review: https://git.openjdk.org/jdk/pull/19211#pullrequestreview-2052055354 From tholenstein at openjdk.org Mon May 13 08:46:04 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 13 May 2024 08:46:04 GMT Subject: RFR: 8330584: IGV: XML does not save all node properties [v2] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 12:10:23 GMT, Tobias Holenstein wrote: >> When C2 sends graphs over the network to IGV, each graph is sent separately. The same applies if C2 saves graphs to XML: each graph is saved with all its nodes as a separate `...` in the XML >> >> To save space, graphs that are saved from IGV only contain the incremental difference for each graph. This saves a lot of space (~5-10x). The logic happens in Printer.java -> `exportInputGraph(.., difference=true, ...)` Unfortunately, there is a bug in this logic: the properties of the nodes are not saved correctly.
>> >> [graphs.zip](https://github.com/openjdk/jdk/files/15220940/graphs.zip) contains 4 graphs: >> >> `graph_c2.xml` (230KB) - an XML saved from C2 >> `graph_igv_bug.xml` (73KB) - opened `graph_c2.xml` in IGV (without this fix) and saved as `graph_igv_bug.xml`. >> `graph_igv_fixed.xml` (123KB) - opened `graph_c2.xml` in IGV (with this fix) and saved as `graph_igv_fixed.xml`. >> >> As you can see `graph_igv_fixed.xml` is twice as large as `graph_igv_bug.xml` because it contains the missing properties. But now the memory saving from the original `graph_c2.xml` is only ~2x. >> Therefore a new format for saving is added: graphs can now be saved and opened from IGV as `.igv`. This uses a compressed (ZIP) format. >> >> `graph.igv` (10KB) is the same graph as `graph_c2.xml` (230KB). But it uses difference graph compression and ZIP compression and is in total 23x smaller in memory footprint. >> >> >> >> E.g. the root in the last graph of difference_true.xml has far fewer properties than in difference_false.xml. > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Update src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/OutlineTopComponent.java > > Co-authored-by: Roberto Castañeda Lozano > > Just a general thought: Should we generally only save in .igv format and drop (explicit) saving in XML format or is there any benefit to be able to store in both formats? > > I find the explicit XML format convenient sometimes for debugging something or doing a quick plain-text search. > > Just a general thought: Should we generally only save in .igv format and drop (explicit) saving in XML format or is there any benefit to be able to store in both formats? > > I find the explicit XML format convenient sometimes for debugging something or doing a quick plain-text search. I don't mind keeping both formats.
As a side note: unzipping `graph.igv` gives you `difference.xml` as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19104#issuecomment-2106991391 From gli at openjdk.org Mon May 13 09:03:16 2024 From: gli at openjdk.org (Guoxiong Li) Date: Mon, 13 May 2024 09:03:16 GMT Subject: RFR: 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag In-Reply-To: References: Message-ID: On Mon, 13 May 2024 00:59:18 GMT, xiaotaonan wrote: > @lgxbslgx The reviewers (maybe me) of the corresponding area will review your patch. So I don't think you need to CC me especially. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19205#issuecomment-2107026125 From chagedorn at openjdk.org Mon May 13 09:18:15 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 May 2024 09:18:15 GMT Subject: RFR: 8332032: C2: Remove ExpandSubTypeCheckAtParseTime flag In-Reply-To: References: <_VqdctFR4arGmdWQk9opoNMe6h1Rwa0gKDWHEcHyO9Y=.2ea3c35b-660d-4431-bc47-e0a874c386ce@github.com> Message-ID: On Mon, 13 May 2024 07:03:44 GMT, xiaotaonan wrote: > > please first ask in JBS if you can take over RFEs/bugs that are already assigned like this one, especially if it has just been filed. This PR misses the entire context why this flag should be removed and what the pros/cons and trade-offs are. I planned to do some more offline discussions first before proposing the actual PR to remove this flag since it is now related to an otherwise hard-to-fix bug in Valhalla. > > OK. Thanks for your understanding and letting me take this PR over - I will propose this change again later this week (we first also need to update some internal stress jobs that use this flag).
------------- PR Comment: https://git.openjdk.org/jdk/pull/19205#issuecomment-2107059329 From tholenstein at openjdk.org Mon May 13 09:18:10 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 13 May 2024 09:18:10 GMT Subject: RFR: 8330584: IGV: XML does not save all node properties [v2] In-Reply-To: References: Message-ID: On Fri, 10 May 2024 09:37:44 GMT, Christian Hagedorn wrote: >> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/OutlineTopComponent.java >> >> Co-authored-by: Roberto Castañeda Lozano > > Just a general thought: Should we generally only save in `.igv` format and drop (explicit) saving in XML format or is there any benefit to be able to store in both formats? Thanks for the reviews @chhagedorn and @robcasloz ------------- PR Comment: https://git.openjdk.org/jdk/pull/19104#issuecomment-2107054910 From tholenstein at openjdk.org Mon May 13 09:18:14 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 13 May 2024 09:18:14 GMT Subject: Integrated: 8330584: IGV: XML does not save all node properties In-Reply-To: References: Message-ID: On Mon, 6 May 2024 12:06:20 GMT, Tobias Holenstein wrote: > When C2 sends graphs over the network to IGV, each graph is sent separately. The same applies if C2 saves graphs to XML: each graph is saved with all its nodes as a separate `...` in the XML > > To save space, graphs that are saved from IGV only contain the incremental difference for each graph. This saves a lot of space (~5-10x). The logic happens in Printer.java -> `exportInputGraph(.., difference=true, ...)` Unfortunately, there is a bug in this logic: the properties of the nodes are not saved correctly.
> > [graphs.zip](https://github.com/openjdk/jdk/files/15220940/graphs.zip) contains 4 graphs: > > `graph_c2.xml` (230KB) - an XML saved from C2 > `graph_igv_bug.xml` (73KB) - opened `graph_c2.xml` in IGV (without this fix) and saved as `graph_igv_bug.xml`. > `graph_igv_fixed.xml` (123KB) - opened `graph_c2.xml` in IGV (with this fix) and saved as `graph_igv_fixed.xml`. > > As you can see `graph_igv_fixed.xml` is twice as large as `graph_igv_bug.xml` because it contains the missing properties. But now the memory saving from the original `graph_c2.xml` is only ~2x. > Therefore a new format for saving is added: graphs can now be saved and opened from IGV as `.igv`. This uses a compressed (ZIP) format. > > `graph.igv` (10KB) is the same graph as `graph_c2.xml` (230KB). But it uses difference graph compression and ZIP compression and is in total 23x smaller in memory footprint. > > > > E.g. the root in the last graph of difference_true.xml has far fewer properties than in difference_false.xml. This pull request has now been integrated. Changeset: 391bbbc7 Author: Tobias Holenstein URL: https://git.openjdk.org/jdk/commit/391bbbc7d0fb95b0cd55e2f56c43bee019aeab7f Stats: 147 lines in 3 files changed: 79 ins; 16 del; 52 mod 8330584: IGV: XML does not save all node properties Reviewed-by: rcastanedalo, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/19104 From mli at openjdk.org Mon May 13 09:51:11 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 13 May 2024 09:51:11 GMT Subject: RFR: 8332130: RISC-V: remove wrong instructions of Vector Crypto Extension In-Reply-To: References: Message-ID: On Mon, 13 May 2024 08:40:08 GMT, Ludovic Henry wrote: > What do you mean by wrong? Happy to remove them but we should give some more context. Thanks, I updated the PR description to explain why they're wrong.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19211#issuecomment-2107128837 From bkilambi at openjdk.org Mon May 13 10:27:14 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 13 May 2024 10:27:14 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8] In-Reply-To: References: <8-_t7nWbR9gZ2_QkfFNuf5M0Q4PMkKJKgwS3ZbHcCxI=.32dc4f11-dec5-468d-afc8-3b4dae285dcb@github.com> Message-ID: <2y-Ag6MxVDJfYl6kM0FYjQA-kzSCekUgAMWAZmkECyQ=.2a2a0a8e-fc67-42a4-bd67-b4ae3b60bcea@github.com> On Mon, 13 May 2024 05:58:17 GMT, Emanuel Peter wrote: >>> I just realized that there is no regression test. And I think it would be nice to have one. >>> >>> Also, we should add some sort of message to the `dump` if the `ReductionNode` has the `requires_strict_order` on or off. I think that could be done in `dump_spec`. >>> >>> You could do it similar to: >>> >>> ``` >>> #ifndef PRODUCT >>> void VectorMaskCmpNode::dump_spec(outputStream *st) const { >>> st->print(" %d #", _predicate); _type->dump_on(st); >>> } >>> #endif // PRODUCT >>> ``` >>> >>> This would actually allow you to create a IR test! >>> >>> You would check that the AddReductionVNode is annotated correctly. You need some VectorAPI tests, and some SuperWord auto-vectorization tests. >>> >>> How does that sound? That would ensure that nobody can easily destroy your RFE, at least not in the IR. >> >> Hi @eme64 , thanks for the suggestion. I can add the `dump_spec` as suggested (which would print if the `_requires_strict_order` flag is enabled/disabled) but I am not sure if I fully understand what's expected in the JTREG tests. Should I be verifying the `-XX:+PrintIdeal` output to make sure the correct message is being printed for the `ReductionV*` nodes? > >> I am not sure if I fully understand what's expected in the JTREG tests. Should I be verifying the -XX:+PrintIdeal output to make sure the correct message is being printed for the ReductionV* nodes? 
> > Yes, the IR framework basically does regex matching against the PrintIdeal graph. For example: `counts = {IRNode.STORE_VECTOR, ">0"}` in the `@IR` rule executes the regex for the store vector, and checks if we find more than zero occurrences. > > Maybe you can just use a regex string directly for your special IR rule. Alternatively, you could have them in the `IRNode` class, but not sure that's worth it. @eme64 Thanks for the clarification. I understand the usage of `counts` in the IR tests. Just that I got a bit confused by some of your earlier statements. We do actually have a test to make sure AddReductionVF/VD and MulReductionVF/VD are not generated on aarch64 NEON machines - `test/hotspot/jtreg/compiler/c2/irTests/TestDisableAutoVectOpcodes.java`. I can modify this test to include UseSVE > 0 case as well and will also add a separate JTREG test for the VectorAPI tests. Hope that's ok.. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18034#issuecomment-2107199006 From yzheng at openjdk.org Mon May 13 10:32:51 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 13 May 2024 10:32:51 GMT Subject: RFR: 8331429: [JVMCI] Cleanup JVMCIRuntime allocation routines [v2] In-Reply-To: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> References: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> Message-ID: > This PR removes allocation routines that may throw exceptions from JVMCIRuntime. It also exports various symbols related to the hashed secondary supers table. Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: address comment.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/19176/files - new: https://git.openjdk.org/jdk/pull/19176/files/2c688ece..82f0e0d0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19176&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19176&range=00-01 Stats: 19 lines in 3 files changed: 2 ins; 6 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/19176.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19176/head:pull/19176 PR: https://git.openjdk.org/jdk/pull/19176 From yzheng at openjdk.org Mon May 13 10:32:51 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 13 May 2024 10:32:51 GMT Subject: RFR: 8331429: [JVMCI] Cleanup JVMCIRuntime allocation routines [v2] In-Reply-To: References: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> <9wsX9310p38cnuPHGU4xKirWfyfYR6cICO6iPhnDk5Y=.55d9503f-2cc8-4c26-b24f-2ced7f8f72f5@github.com> Message-ID: On Sat, 11 May 2024 09:06:20 GMT, Doug Simon wrote: >> Only for the HAS_PENDING_EXCEPTION case. What about the !h->is_initialized() case? > > Good observation - seems like this is an outstanding bug. Can you please address that Yudi. > In practice, I wonder how much this matters as Graal always [clears the object result](https://github.com/oracle/graal/blob/0b61d20b08b1af76bd35cfb673c7be8d33855f51/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/hotspot/stubs/ForeignCallSnippets.java#L127) after reading it. 
Good point ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19176#discussion_r1598245624 From epeter at openjdk.org Mon May 13 11:04:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 May 2024 11:04:18 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8] In-Reply-To: <2y-Ag6MxVDJfYl6kM0FYjQA-kzSCekUgAMWAZmkECyQ=.2a2a0a8e-fc67-42a4-bd67-b4ae3b60bcea@github.com> References: <8-_t7nWbR9gZ2_QkfFNuf5M0Q4PMkKJKgwS3ZbHcCxI=.32dc4f11-dec5-468d-afc8-3b4dae285dcb@github.com> <2y-Ag6MxVDJfYl6kM0FYjQA-kzSCekUgAMWAZmkECyQ=.2a2a0a8e-fc67-42a4-bd67-b4ae3b60bcea@github.com> Message-ID: On Mon, 13 May 2024 10:22:12 GMT, Bhavana Kilambi wrote: >>> I am not sure if I fully understand what's expected in the JTREG tests. Should I be verifying the -XX:+PrintIdeal output to make sure the correct message is being printed for the ReductionV* nodes? >> >> Yes, the IR framework basically does regex matching against the PrintIdeal graph. For example: `counts = {IRNode.STORE_VECTOR, ">0"}` in the `@IR` rule executes the regex for the store vector, and checks if we find more than zero occurrences. >> >> Maybe you can just use a regex string directly for your special IR rule. Alternatively, you could have them in the `IRNode` class, but not sure that's worth it. > > @eme64 Thanks for the clarification. I understand the usage of `counts` in the IR tests. Just that I got a bit confused by some of your earlier statements. We do actually have a test to make sure AddReductionVF/VD and MulReductionVF/VD are not generated on aarch64 NEON machines - `test/hotspot/jtreg/compiler/c2/irTests/TestDisableAutoVectOpcodes.java`. I can modify this test to include UseSVE > 0 case as well and will also add a separate JTREG test for the VectorAPI tests. Hope that's ok.. @Bhavana-Kilambi I know we have the tests in `test/hotspot/jtreg/compiler/c2/irTests/TestDisableAutoVectOpcodes.java`, and some other reduction tests.
But these do not do the specific thing I would like to see. I would like this: - Add `no_strict_order` vs `requires_strict_order` or similar to `dump_spec`. - IR match not just that there is the correct `ReductionNode`, but also that it has the `no_strict_order` or `requires_strict_order` in its dump. You can do that by using a custom regex string, rather than `IRNode.STORE_VECTOR` or similar. - Then, create different tests, some where we expect ordered, some unordered vectors. Use Vector API and SuperWord examples. Does that make sense? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18034#issuecomment-2107276021 From yzheng at openjdk.org Mon May 13 11:34:18 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 13 May 2024 11:34:18 GMT Subject: RFR: 8331429: [JVMCI] Cleanup JVMCIRuntime allocation routines [v3] In-Reply-To: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> References: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> Message-ID: > This PR removes allocation routines that may throw exceptions from JVMCIRuntime. It also exports various symbols related to the hashed secondary supers table.
Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: remove trailing white space ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19176/files - new: https://git.openjdk.org/jdk/pull/19176/files/82f0e0d0..0a638521 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19176&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19176&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19176.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19176/head:pull/19176 PR: https://git.openjdk.org/jdk/pull/19176 From bkilambi at openjdk.org Mon May 13 12:10:10 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 13 May 2024 12:10:10 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8] In-Reply-To: <2y-Ag6MxVDJfYl6kM0FYjQA-kzSCekUgAMWAZmkECyQ=.2a2a0a8e-fc67-42a4-bd67-b4ae3b60bcea@github.com> References: <8-_t7nWbR9gZ2_QkfFNuf5M0Q4PMkKJKgwS3ZbHcCxI=.32dc4f11-dec5-468d-afc8-3b4dae285dcb@github.com> <2y-Ag6MxVDJfYl6kM0FYjQA-kzSCekUgAMWAZmkECyQ=.2a2a0a8e-fc67-42a4-bd67-b4ae3b60bcea@github.com> Message-ID: <1G5vZYJlb_DYSjClQiGKulCfT-lk5wi3GXkoy1mBSh0=.f7004c63-b303-42bb-8104-40929931f4d6@github.com> On Mon, 13 May 2024 10:22:12 GMT, Bhavana Kilambi wrote: >>> I am not sure if I fully understand what's expected in the JTREG tests. Should I be verifying the -XX:+PrintIdeal output to make sure the correct message is being printed for the ReductionV* nodes? >> >> Yes, the IR framework basically does regex matching against the PrintIdeal graph. For example: `counts = {IRNode.STORE_VECTOR, ">0"}` in the `@IR` rule executes the regex for the store vector, and checks if we find more than zero occurrences. >> >> Maybe you can just use a regex string directly for your special IR rule. Alternatively, you could have them in the `IRNode` class, but not sure that's worth it.
> > @eme64 Thanks for the clarification. I understand the usage of `counts` in the IR tests. Just that I got a bit confused by some of your earlier statements. We do actually have a test to make sure AddReductionVF/VD and MulReductionVF/VD are not generated on aarch64 NEON machines - `test/hotspot/jtreg/compiler/c2/irTests/TestDisableAutoVectOpcodes.java`. I can modify this test to include UseSVE > 0 case as well and will also add a separate JTREG test for the VectorAPI tests. Hope that's ok.. > @Bhavana-Kilambi I know we have the tests in `test/hotspot/jtreg/compiler/c2/irTests/TestDisableAutoVectOpcodes.java`, and some other reduction tests. But these do not do the specific thing I would like to see. > > I would like this: > > * Add `no_strict_order` vs `requires_strict_order` or similar to `dump_spec`. > > * IR match not just that there is the correct `ReductionNode`, but also that it has the `no_strict_order` or `requires_strict_order` in its dump. You can do that by using a custom regex string, rather than `IRNode.STORE_VECTOR` or similar. > > * Then, create different tests, some where we expect ordered, some unordered vectors. Use Vector API and SuperWord examples. > > > Does that make sense? Yes, I am doing exactly that. Just that for the superword (auto-vec) case, I am just modifying the AddReduction related tests in `TestDisableAutoVectOpcodes.java` to incorporate the case with UseSVE > 0 as well and match the regex as per the dump_spec output.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18034#issuecomment-2107401404 From liach at openjdk.org Mon May 13 12:15:10 2024 From: liach at openjdk.org (Chen Liang) Date: Mon, 13 May 2024 12:15:10 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v8] In-Reply-To: <8bkIrXCl7OsuLoMQi43faVELq0d1R-P60pSCGkxpwpU=.fe207403-8288-4f2d-ab7d-96fec5ba212e@github.com> References: <3eERzqYdCd4f9qn4KpzBA9ealaUTzC67wIhzB18ETTE=.f9d17a6f-1ca5-477f-8344-40c20abe7d7e@github.com> <8bkIrXCl7OsuLoMQi43faVELq0d1R-P60pSCGkxpwpU=.fe207403-8288-4f2d-ab7d-96fec5ba212e@github.com> Message-ID: On Mon, 13 May 2024 07:51:19 GMT, Adam Sotona wrote: >> src/java.base/share/classes/java/lang/classfile/Attributes.java line 153: >> >>> 151: >>> 152: /** >>> 153: * {@return Attribute mapper for the {@code AnnotationDefault} attribute} >> >> Just wondering, can we change `{@code AnnotationDefault}` to `{@value #NAME_ANNOTATION_DEFAULT}`, etc? This way, the names are still rendered as code in Javadoc HTML, but they are generated with links to the constants, and programmers will see these constants and prefer them over hardcoded values. > > On the other side it is questionable if the attribute names should be exposed in the API. We provide corresponding mappers and attribute models. I don't see a case where user would need to use the attribute names directly. Makes sense, we can always add these literals back if we do need them. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19006#discussion_r1598368707 From eastigeevich at openjdk.org Mon May 13 13:11:15 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 13 May 2024 13:11:15 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives Message-ID: Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues. 
Found bugs: - When refreshing `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)` which only considers the compiler directive which is on the top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected. - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. There are other concerns: bugs and performance issues. Possible bugs: - `has_matching_directives` might not be cleared. A nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark. - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again. - JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. Performance issues: - Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will be traversing nmethods most of which don't need recompilation. The backout is not clean because of removal of `CompiledMethod`. Tested with release and fastdebug builds: tier1 and tier2 passed. 
------------- Commit messages: - 8332111: [BACKOUT] A way to align already compiled methods with compiler directives Changes: https://git.openjdk.org/jdk/pull/19215/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19215&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332111 Stats: 380 lines in 15 files changed: 3 ins; 347 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/19215.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19215/head:pull/19215 PR: https://git.openjdk.org/jdk/pull/19215 From shade at openjdk.org Mon May 13 13:21:05 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 13 May 2024 13:21:05 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: <_Hf9ur_fzBA6MysoCZHn7KAjJwC0ubP8v4SKBvethOw=.63d58c21-c8ef-4b5a-b878-7fd330e0d654@github.com> On Mon, 13 May 2024 13:03:26 GMT, Evgeny Astigeevich wrote: > Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues. > > Found bugs: > - When refreshing `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)` which only considers the compiler directive which is on the top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected. > - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. > > There are other concerns: bugs and performance issues. > > Possible bugs: > - `has_matching_directives` might not be cleared. A nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark. 
> - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again. > - JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. > > Performance issues: > - Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will be traversing nmethods most of which don't need recompilation. > > The backout is not clean because of removal of `CompiledMethod`. > > Tested with release and fastdebug builds: tier1 and tier2 passed. The reversal looks fine. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19215#pullrequestreview-2052683089 From roland at openjdk.org Mon May 13 13:23:46 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 May 2024 13:23:46 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop [v2] In-Reply-To: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> Message-ID: <_kbcMydcMPblcm_FDDuL5vWGT7q6iRoarmYsTlEA0hQ=.290c6744-211d-406d-8ed1-90e510051167@github.com> > In the test case: > > > long i; > for (; i > 0; i--) { > res += 42 / ((int) i); > > > The long counted loop phi has type `[1..100]`. As a consequence, the > `ConvL2I` also has type `[1..100]`. The `DivI` node that follows can't > fault: it is not guarded by a zero check and has no control set. 
> > The `ConvL2I` is split through phi and so is the `DivI` node: > `PhaseIdealLoop::cannot_split_division()` returns true because the > value coming from the backedge into the `DivI` (when it is about to be > split thru phi) is the result of the `ConvL2I` which has type > `[1..100]` so is not zero as far as the compiler can tell. > > On the last iteration of the loop, i is 1. Because the DivI was split > thru Phi, it computes the value for the following iteration, so for i = 0. This causes a crash when the compiled code runs. > > The same problem can't happen with an int counted loop because logic > in `PhaseIdealLoop::split_thru_phi()` prevents a `ConvI2L` from being > split thru phi. I propose to fix this the same way: in the test case, > it's not true that once the `ConvL2I` is split thru phi it keeps type > `[1..100]`. The fix is fairly conservative because it's based on the > existing logic for `ConvI2L`: we would want to not split a `ConvL2I` > only at a counted loop phi, but I suppose the same is true for the `ConvI2L` > and I thought it would be best to revisit both together.
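To make the failure mode concrete, here is a hypothetical plain-Java model (class and method names are invented for illustration) of what splitting the `DivI` through the loop phi amounts to: the division feeding the next iteration is evaluated before the exit test, so after the last iteration (i = 1) it divides by `(int) 0`. This is a source-level analogy, not C2's actual IR transformation. In Java the zero divisor surfaces as an `ArithmeticException`, whereas the miscompiled code has no zero check and crashes.

```java
// Hypothetical demo class; not part of the JDK test suite.
public class SplitDivThruPhiDemo {

    // Shape of the loop from the regression test: i counts down from 'start' to 1,
    // so the divisor (int) i is never zero inside the loop body.
    static int original(long start) {
        int res = 0;
        for (long i = start; i > 0; i--) {
            res += 42 / ((int) i);
        }
        return res;
    }

    // Hand-written model of the effect of splitting the DivI through the loop phi:
    // the division for the next iteration is computed at the bottom of the current
    // one, before the exit test. On the last iteration (i == 1) this evaluates
    // 42 / 0 even though the loop is about to exit.
    static int splitThruPhi(long start) {
        int res = 0;
        if (start <= 0) {
            return res;
        }
        long i = start;
        int div = 42 / ((int) i);      // value for the first iteration
        while (true) {
            res += div;
            i--;
            div = 42 / ((int) i);      // eagerly computed for an iteration that may not run
            if (i <= 0) {
                break;
            }
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println("original: " + original(100));
        try {
            System.out.println("split: " + splitThruPhi(100));
        } catch (ArithmeticException e) {
            // Java reports the zero divisor; the miscompiled C2 code had no such check.
            System.out.println("split: ArithmeticException (" + e.getMessage() + ")");
        }
    }
}
```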
Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: - test case tweaks - fuzzer test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19086/files - new: https://git.openjdk.org/jdk/pull/19086/files/d48443c3..3c417dc2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19086&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19086&range=00-01 Stats: 63 lines in 2 files changed: 61 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19086.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19086/head:pull/19086 PR: https://git.openjdk.org/jdk/pull/19086 From roland at openjdk.org Mon May 13 13:23:46 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 May 2024 13:23:46 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop [v2] In-Reply-To: References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> Message-ID: On Mon, 6 May 2024 07:35:56 GMT, Christian Hagedorn wrote: > You could also add the regression tests from the duplicated issue [JDK-8298851](https://bugs.openjdk.org/browse/JDK-8298851). I added one of them because it doesn't seem to need `StressGCM`. Does it really make sense to add all of them? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19086#issuecomment-2107563511 From roland at openjdk.org Mon May 13 13:23:46 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 May 2024 13:23:46 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop [v2] In-Reply-To: References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> <2lreoMy7UKtgM_m8RCU68rp3FFkoU8zj3ckuTKzXqf0=.dc02a0d4-2671-4c70-a470-a64f28e38f2d@github.com> Message-ID: <2t3peiZ70K4xcs0LhocSx5jWPVlRns_dEp52j2uwJWk=.432a5285-8197-44c5-b308-9c9a2b602c79@github.com> On Wed, 8 May 2024 07:10:25 GMT, Christian Hagedorn wrote: >> test/hotspot/jtreg/compiler/splitif/TestLongCountedLoopConvL2I.java line 31: >> >>> 29: * -XX:+StressGCM -XX:StressSeed=92643864 TestLongCountedLoopConvL2I >>> 30: * @run main/othervm -XX:-BackgroundCompilation -XX:-TieredCompilation -XX:-UseOnStackReplacement >>> 31: * -XX:+StressGCM TestLongCountedLoopConvL2I >> >> Would it make sense to have a run that allows OSR? > > You should also add `-XX:+UnlockDiagnosticVMOptions` for the stress flag. Done in new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19086#discussion_r1598467494 From roland at openjdk.org Mon May 13 13:27:08 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 May 2024 13:27:08 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop [v2] In-Reply-To: <2lreoMy7UKtgM_m8RCU68rp3FFkoU8zj3ckuTKzXqf0=.dc02a0d4-2671-4c70-a470-a64f28e38f2d@github.com> References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> <2lreoMy7UKtgM_m8RCU68rp3FFkoU8zj3ckuTKzXqf0=.dc02a0d4-2671-4c70-a470-a64f28e38f2d@github.com> Message-ID: On Tue, 7 May 2024 17:05:45 GMT, Emanuel Peter wrote: > I guess the issue is that ConvL2I and ConvI2L are also type nodes, which can restrict their type, just like CastII nodes. 
And that restricting of the type is only true under a certain if-branch. That's not entirely true here. The `ConvL2I` captures the type of its input, not a narrower type. The problem is that the type is that of a `Phi` for a counted loop and, once pushed through the phi, the type captured by the `ConvL2I` becomes incorrect. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19086#issuecomment-2107569510 From roland at openjdk.org Mon May 13 13:27:09 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 May 2024 13:27:09 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop [v2] In-Reply-To: References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> Message-ID: On Tue, 7 May 2024 17:25:49 GMT, Christian Hagedorn wrote: > It also seems that it's only a problem with loop iv phis because we improve the iv type in such a way that some of the possible values of the backedge are excluded. So, maybe a first step could be to allow splitting the `Conv*` nodes through non-loop-iv phi nodes. However, there might also be other non-loop-iv phi problems I'm currently not aware of. Nevertheless, it might be worth investigating further in a separate RFE. I agree that it would be worth investigating further. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19086#discussion_r1598474092 From luhenry at openjdk.org Mon May 13 13:29:03 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 13 May 2024 13:29:03 GMT Subject: RFR: 8332130: RISC-V: remove wrong instructions of Vector Crypto Extension In-Reply-To: References: Message-ID: On Mon, 13 May 2024 08:14:43 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch to remove some wrong instructions on riscv? > These instructions are wrong in that e.g.
> take `vror.vx` as example, > * by definition of spec, it should be `vror.vx vd, vs2, *rs1*, vm` > * the implementation here, it is indeed `vror_vx(VectorRegister Vd, VectorRegister Vs2, *VectorRegister* Vs1, VectorMask vm = unmasked)` > > Thanks Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19211#pullrequestreview-2052703762 From roland at openjdk.org Mon May 13 13:40:17 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 May 2024 13:40:17 GMT Subject: RFR: 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode [v2] In-Reply-To: <7b3qt72dd5rV6nirPQILkqTMleDRMRYuXlKpqVVVpyo=.c2ed3889-cb43-4576-9d63-de133152b7fb@github.com> References: <_8csQpQVHlNpwenIT4H7OFkMSOaU6Fz-ZmJ0Yi6ArLU=.0b84b78d-4637-49ab-b43f-4c457498b0ce@github.com> <7b3qt72dd5rV6nirPQILkqTMleDRMRYuXlKpqVVVpyo=.c2ed3889-cb43-4576-9d63-de133152b7fb@github.com> Message-ID: On Tue, 7 May 2024 17:29:02 GMT, Christian Hagedorn wrote: > But conceptually, we want to get these nodes to be removed and the Initialized Assertion Predicates folded once we know that we no longer split loops (i.e. in post loop IGVN). I don't think that's quite correct. Any round of igvn could cause the bounds of a counted loop to change in a way that conflicts with the types captured in the `CastII`/`ConvI2L` nodes. I think that's true even after loop optimizations are over. As a consequence, we want the Assertion Predicates to fold as late as possible. That's poorly tested currently because we emit the predicates in compiled code for debug builds so, in practice, we never really remove them. As part of this change, I wouldn't change that behavior. That seems risky. >> src/hotspot/share/opto/opaquenode.hpp line 138: >> >>> 136: // to true. Therefore, we get rid of them in product builds as they are useless. In debug builds we keep them as >>> 137: // additional verification code (i.e.
removing this node and use the BoolNode input instead). >>> 138: class OpaqueInitializedAssertionPredicateNode : public Node { >> >> Shouldn't the new OpaqueInitializedAssertionPredicateNode be a subclass of Opaque4 or shouldn't both be a subclass of a common super type? Don't they share at least some logic or behavior? > > I first thought about reusing this class in some way. But the second input is actually not needed. We could move forward and just remove the second input for `Opaque4` nodes (it's always a true constant). But I still wanted to have an easy way to have a distinguishable node from the other uses of the `Opaque4` nodes in non-null checks. > > Furthermore, I think sub classing the `Opaque4` class can be problematic when doing `is_Opaque4()` since we sometimes expect an `Opaque4` only and sometimes an `OpaqueInitializedAssertionPredicate` only and sometimes both are fine. I think it's cleaner to have two separate classes instead of sub classing each other. > > What do you think? Fair enough. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18951#discussion_r1598493508 PR Review Comment: https://git.openjdk.org/jdk/pull/18951#discussion_r1598494163 From kxu at openjdk.org Mon May 13 13:46:36 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 13 May 2024 13:46:36 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v4] In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. > > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). 
Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision: - add more expressive comments and test cases - Merge branch 'master' into long-typed-parallel-iv - update comments to clarify on type casting - add pseudocode for subgraphs before/after the transformation - remove WIP support for long counted loops - Merge branch 'master' into long-typed-parallel-iv - update tests - update tests - update tests - clean up code for pr - ... and 12 more: https://git.openjdk.org/jdk/compare/1ecc282b...85820dee ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18489/files - new: https://git.openjdk.org/jdk/pull/18489/files/dcd55681..85820dee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=02-03 Stats: 122774 lines in 3145 files changed: 56800 ins; 49870 del; 16104 mod Patch: https://git.openjdk.org/jdk/pull/18489.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489 PR: https://git.openjdk.org/jdk/pull/18489 From kxu at openjdk.org Mon May 13 13:46:37 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 13 May 2024 13:46:37 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v3] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Thu, 18 Apr 2024 09:11:13 GMT, Emanuel Peter wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> update comments to clarify on type casting > > test/hotspot/jtreg/compiler/c2/irTests/TestCountedLoopIV.java line 24: > >> 22: 
*/ >> 23: >> 24: package compiler.c2.irTests; > > Putting IR tests into the `irTests` directory is what we did at the beginning, when we assumed IR tests would not be widely adopted. But now it makes more sense to put this test where it belongs "thematically". I suggest you put it under `compiler/loopopts`, or even in a new subdirectory: `compiler/loopopts/parallel_iv`. > > Also the name of this test could be more expressive: `TestLongParallelIvInIntCountedLoop.java` Renamed to `compiler.loopopts.parallel_iv.TestParallelIvInIntCountedLoop` Notice I chose *Test~Long~ParallelIvInIntCountedLoop* since it also tests int IVs. > test/hotspot/jtreg/compiler/c2/irTests/TestCountedLoopIV.java line 63: > >> 61: int a = 0; >> 62: for (int i = 0; i < stop; i++) { >> 63: a += 0; // we unfortunately have to repeat ourselves because the operand has to be a constant > > I don't understand your comment. Why is this test interesting? The IR framework can only test against static code, and the transformation relies on strides being constants to perform constant propagation. Therefore, we have no choice but repeating the same test case multiple times with different numbers. I added comments to clarify this. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1598496880 PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1598500281 From kxu at openjdk.org Mon May 13 13:46:37 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 13 May 2024 13:46:37 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v3] In-Reply-To: <0YMwCJtOCiJU6gDibC6awo-iowi3wFuOKPM32sHkGRA=.34e4fec1-ffb9-4ac8-ac2e-35a1c9494020@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> <8zWNeJWcumovt4jcMMCbbhfQJVKDypVM2nR6xRUGx3U=.760cf413-7503-4ab2-a2c2-955f430ee4b4@github.com> <0YMwCJtOCiJU6gDibC6awo-iowi3wFuOKPM32sHkGRA=.34e4fec1-ffb9-4ac8-ac2e-35a1c9494020@github.com> Message-ID: <9SZeJL0GoHL2XzCiyK_zNPTUT9az48DwBou9s5kFI2k=.e8d2d629-67db-4459-bf7e-d12e9435f043@github.com> On Thu, 18 Apr 2024 09:38:20 GMT, Emanuel Peter wrote: >> And why no IR rules for these? > > You definitely need more tests with IR rules. Those functions were only called in `testCorrectness()` and excluded from IR verifications. They are not included. >> Generally, it would be nice if you had more cases where we are checking overflows. > > And some with negative strides would be great too. More tests added. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1598500232 PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1598500700 From roland at openjdk.org Mon May 13 13:48:25 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 May 2024 13:48:25 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v16] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 05:04:38 GMT, Galder Zamarreño wrote: >> Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures.
>>
>> The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64:
>>
>> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
>> Benchmark                 (size)  Mode  Cnt    Score    Error  Units
>> ArrayClone.byteArraycopy       0  avgt   15    3.476 ±  0.018  ns/op
>> ArrayClone.byteArraycopy      10  avgt   15    3.740 ±  0.017  ns/op
>> ArrayClone.byteArraycopy     100  avgt   15    7.124 ±  0.010  ns/op
>> ArrayClone.byteArraycopy    1000  avgt   15   39.301 ±  0.106  ns/op
>> ArrayClone.byteClone           0  avgt   15    3.478 ±  0.008  ns/op
>> ArrayClone.byteClone          10  avgt   15    3.562 ±  0.007  ns/op
>> ArrayClone.byteClone         100  avgt   15    5.888 ±  0.206  ns/op
>> ArrayClone.byteClone        1000  avgt   15   25.762 ±  0.203  ns/op
>> ArrayClone.intArraycopy        0  avgt   15    3.199 ±  0.016  ns/op
>> ArrayClone.intArraycopy       10  avgt   15    4.521 ±  0.008  ns/op
>> ArrayClone.intArraycopy      100  avgt   15   17.429 ±  0.039  ns/op
>> ArrayClone.intArraycopy     1000  avgt   15  178.432 ±  0.777  ns/op
>> ArrayClone.intClone            0  avgt   15    3.406 ±  0.016  ns/op
>> ArrayClone.intClone           10  avgt   15    4.272 ±  0.006  ns/op
>> ArrayClone.intClone          100  avgt   15   13.110 ±  0.122  ns/op
>> ArrayClone.intClone         1000  avgt   15  113.196 ± 13.400  ns/op
>>
>> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this.
>>
>> I ran hotspot compiler tests successfully, limiting them to C1 compilation, on darwin/aarch64, linux/x86_64 and linux/686. E.g.
>>
>> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
>> ...
>> TEST                                         TOTAL  PASS  FAIL ERROR
>> jtreg:test/hotspot/jtreg:hotspot_compiler     1234  1234     0     0
>>
>> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts?
>>
>>...
> > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/c1/c1_GraphBuilder.cpp > > Co-authored-by: Dean Long <17332032+dean-long at users.noreply.github.com> Otherwise, looks good to me. src/hotspot/share/c1/c1_GraphBuilder.cpp line 2031: > 2029: ciType* type = receiver->exact_type(); > 2030: if (type != nullptr && type->is_loaded()) { > 2031: assert(!type->is_instance_klass() || !type->as_instance_klass()->is_interface(), ""); Please add a message to the assert. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17667#pullrequestreview-2052754875 PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1598505645 From kxu at openjdk.org Mon May 13 13:54:05 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 13 May 2024 13:54:05 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v3] In-Reply-To: <0YMwCJtOCiJU6gDibC6awo-iowi3wFuOKPM32sHkGRA=.34e4fec1-ffb9-4ac8-ac2e-35a1c9494020@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> <8zWNeJWcumovt4jcMMCbbhfQJVKDypVM2nR6xRUGx3U=.760cf413-7503-4ab2-a2c2-955f430ee4b4@github.com> <0YMwCJtOCiJU6gDibC6awo-iowi3wFuOKPM32sHkGRA=.34e4fec1-ffb9-4ac8-ac2e-35a1c9494020@github.com> Message-ID: On Thu, 18 Apr 2024 09:32:19 GMT, Emanuel Peter wrote: >> Can you also be consistent with the names all the way through your comments? I suggest you just only use `stride_con`, and not `stride`.
You can use `i` and `a`, if you want. But then it would be helpful if you had two lines with identical expressions, but where you make the transition from `i` to `phi`. > > Ah. It seems that we require `stride2 / stride` to be a lossless division in the code. A comment about that limitation would be helpful. And I think you should also check if there are tests that cover cases where the division would be lossy. > ...be consistent with the names... I had this concern, especially `i` vs `phi`. I didn't think it was reasonable to call the iterator `phi` only because the optimization code calls it that, since the value is extracted from the phi node. I agree on keeping things consistent; the example is trivial enough to be understood anyway. I updated the naming. > It seems that we require stride2 / stride to be a lossless division in the code. Not only is integer division (i.e., rounding towards zero) used; the optimization also requires this division to be exact, with no remainder. Checks are in place to make sure the optimization only happens if this condition is met: > `if ((ratio_con * stride_con) == stride_con2) { // Check for exact` I updated the comments to be more expressive regarding this.
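As a rough illustration of the exactness requirement, here is a hypothetical plain-Java model (names invented for illustration) of the rewrite: a long-typed parallel IV `a` advanced by `stride_con2` in an int counted loop with stride `stride_con` can be replaced by `(i - init) * (stride_con2 / stride_con)` only when that ratio is exact. The sketch checks the closed form on every iteration rather than performing the transformation, and assumes a positive loop stride and an initial value of 0 for `a`.

```java
// Hypothetical demo class; names and helpers are invented for illustration.
public class LongParallelIvDemo {

    // Returns stride_con2 / stride_con if the division is exact, otherwise null,
    // mirroring the "(ratio_con * stride_con) == stride_con2" check quoted above.
    static Long exactRatio(int strideCon, long strideCon2) {
        long ratio = strideCon2 / strideCon;   // Java long division truncates toward zero
        return (ratio * strideCon == strideCon2) ? ratio : null;
    }

    // Runs an int counted loop (positive stride assumed) that also advances a
    // long-typed parallel IV 'a', and checks on every iteration that 'a' equals
    // the closed form the optimization would substitute: (i - init) * ratio.
    // Throws if the ratio is not exact, i.e. if the rewrite would not apply.
    static boolean closedFormHolds(int init, int limit, int strideCon, long strideCon2) {
        Long ratio = exactRatio(strideCon, strideCon2);
        if (ratio == null) {
            throw new IllegalArgumentException("stride_con2 is not an exact multiple of stride_con");
        }
        long a = 0;
        for (int i = init; i < limit; i += strideCon) {
            // Widen before subtracting so (i - init) cannot overflow int.
            long closedForm = ((long) i - init) * ratio;
            if (a != closedForm) {
                return false;
            }
            a += strideCon2;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(closedFormHolds(0, 1000, 3, 12L));    // ratio 4
        System.out.println(closedFormHolds(-5, 100, 5, -15L));   // ratio -3
        System.out.println(exactRatio(3, 10L));                   // inexact ratio
    }
}
```

An inexact pair such as `stride_con = 3`, `stride_con2 = 10` is rejected up front, which corresponds to the optimization simply not firing in that case.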
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1598512900 From kxu at openjdk.org Mon May 13 13:54:07 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 13 May 2024 13:54:07 GMT Subject: RFR: 8327381: Refactor type-improving transformations in BoolNode::Ideal to BoolNode::Value [v8] In-Reply-To: <3zuVDnNd_9nUXHjG1TCWQjVVWuLcyCLAOEgJKeGnDL0=.996e0ab4-58d0-47c9-875b-26bcaae19887@github.com> References: <3zuVDnNd_9nUXHjG1TCWQjVVWuLcyCLAOEgJKeGnDL0=.996e0ab4-58d0-47c9-875b-26bcaae19887@github.com> Message-ID: On Fri, 5 Apr 2024 10:01:43 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/subnode.cpp line 1816: >> >>> 1814: // Change ((x & m) u<= m) or ((m & x) u<= m) to always true >>> 1815: // Same with ((x & m) u< m+1) and ((m & x) u< m+1) >>> 1816: if (cop == Op_CmpU && cmp1->Opcode() == Op_AndI) { >> >> You made this a bit more complicated than the original. Or was there a specific reason for the `is_Sub`? I'd do this: >> Suggestion: >> >> // Change ((x & m) u<= m) or ((m & x) u<= m) to always true >> // Same with ((x & m) u< m+1) and ((m & x) u< m+1) >> Node* cmp = in(1); >> if (cmp != nullptr && cmp->Opcode() == Op_CmpU) { >> Node* cmp1 = cmp->in(1); >> Node* cmp2 = cmp->in(2); >> if (cmp1->Opcode() == Op_AndI) { > > You could also move the whole code to its own method, and name it something like `BoolNode::Value_cmpu_and_mask`. Maybe you find an even more descriptive name. 
Cleaned up and moved to `::Value_cmpu_and_mask` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18198#discussion_r1598516418 From dchuyko at openjdk.org Mon May 13 13:55:10 2024 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Mon, 13 May 2024 13:55:10 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 13:03:26 GMT, Evgeny Astigeevich wrote: > Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues. REDO work is tracked by [JDK-8331749](https://bugs.openjdk.org/browse/JDK-8331749). > > Found bugs: > - When refreshing `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)` which only considers the compiler directive which is on the top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected. > - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. > > There are other concerns: bugs and performance issues. > > Possible bugs: > - `has_matching_directives` might not be cleared. A nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark. > - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again. > - JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. > > Performance issues: > - Usually directives are updated for a small number of Java methods. 
If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will be traversing nmethods most of which don't need recompilation. > > The backout is not clean because of removal of `CompiledMethod`. > > Tested with release and fastdebug builds: tier1 and tier2 passed. Are there any high severity problems caused by the original PR? Especially not in the new functionality. Minor issues could be probably addressed without backing out the entire functionality. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2107638223 From eastigeevich at openjdk.org Mon May 13 14:24:18 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 13 May 2024 14:24:18 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 13:52:17 GMT, Dmitry Chuyko wrote: > Are there any high severity problems caused by the original PR? Especially not in the new functionality. Minor issues could be probably addressed without backing out the entire functionality. Yes, there are: > 1. Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, CodeCache::recompile_marked_directives_matches will be traversing nmethods most of which don't need recompilation. > 2. has_matching_directives might not be cleared. > 3. A Java method is not recompiled as requested. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2107720199 From dchuyko at openjdk.org Mon May 13 14:37:03 2024 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Mon, 13 May 2024 14:37:03 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: <7kfgb5FXqda4SzqPO2XUXdx6CM_Z-G970nSpqvJVSYw=.b6b01073-66af-4c7d-8d7c-528a4f87707d@github.com> On Mon, 13 May 2024 14:21:35 GMT, Evgeny Astigeevich wrote: > > Are there any high severity problems caused by the original PR? Especially not in the new functionality. Minor issues could be probably addressed without backing out the entire functionality. > > > > Yes, there are: > > > > > 1. Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, CodeCache::recompile_marked_directives_matches will be traversing nmethods most of which don't need recompilation. > > > 2. has_matching_directives might not be cleared. > > > 3. A Java method is not recompiled as requested. > > So there are cases when new functionality doesn't work as expected (I don't see any other users impacted). Why not file bugs for those cases and estimate their impact? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2107777980 From stefank at openjdk.org Mon May 13 14:42:08 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 13 May 2024 14:42:08 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v10] In-Reply-To: References: Message-ID: <-RRlrDdRqiN1sxsQF7RYJIl8W6Z62LcAq8quEalrzjc=.f6ae63e5-92d9-41be-962b-e2741c676b32@github.com> On Fri, 10 May 2024 15:26:29 GMT, Andrew Haley wrote: >> At the present time, `assert_different_registers()` uses an O(N**2) algorithm in assert_different_registers(). We can utilize RegSet to do it in O(N) time. This would be a useful optimization for all builds with assertions enabled. 
>> >> In addition, it would be useful to be able to static_assert different registers. >> >> Also, I've taken the opportunity to expand the maximum size of a RegSet to 64 on 64-bit platforms. >> >> I also fixed a bug: sometimes `noreg` is passed to `assert_different_registers()`, but it may only be passed once or a spurious assertion is triggered. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Review feedback Approved. I've written some suggestions that I would prefer, but that are not strictly necessary before integration. src/hotspot/share/asm/register.hpp line 101: > 99: > 100: static constexpr int max_size() { > 101: return (int)(sizeof _bitset * CHAR_BIT); This makes me have to think about operator precedence and what CHAR_BIT is (not typically used in HotSpot). I'd prefer to see something like this: Suggestion: return (int)(sizeof(_bitset) * BitsPerByte); src/hotspot/share/asm/register.hpp line 263: > 261: template > 262: inline constexpr bool different_registers(AbstractRegSet allocated_regs, R first_register, Rx... more_registers) { > 263: if (allocated_regs.contains(first_register)) { FWIW, while first reading this I was looking for the base case of the recursion (the previous versions had some extra specializations). To me it looks like the base case is written in both this function and the function above. I would prefer to have the implementation inside one function only and change this function to use: if (!different_registers(allocated_regs, first_register)) { I think this could make it a bit clearer, but if you prefer the current style, I think that's fine as well. ------------- Marked as reviewed by stefank (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16617#pullrequestreview-2052702736 PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1598475883 PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1598591701 From eastigeevich at openjdk.org Mon May 13 14:45:02 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 13 May 2024 14:45:02 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: <7kfgb5FXqda4SzqPO2XUXdx6CM_Z-G970nSpqvJVSYw=.b6b01073-66af-4c7d-8d7c-528a4f87707d@github.com> References: <7kfgb5FXqda4SzqPO2XUXdx6CM_Z-G970nSpqvJVSYw=.b6b01073-66af-4c7d-8d7c-528a4f87707d@github.com> Message-ID: On Mon, 13 May 2024 14:34:50 GMT, Dmitry Chuyko wrote: > So there are cases when new functionality doesn't work as expected (I don't see any other users impacted). Why not file bugs for those cases and estimate their impact? Do you know any users using the new functionality? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2107799744 From eastigeevich at openjdk.org Mon May 13 14:45:03 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 13 May 2024 14:45:03 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 13:03:26 GMT, Evgeny Astigeevich wrote: > Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues. REDO work is tracked by [JDK-8331749](https://bugs.openjdk.org/browse/JDK-8331749). > > Found bugs: > - When refreshing `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)` which only considers the compiler directive which is on the top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected. 
> - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. > > There are other concerns: bugs and performance issues. > > Possible bugs: > - `has_matching_directives` might not be cleared. A nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark. > - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again. > - JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. > > Performance issues: > - Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will be traversing nmethods most of which don't need recompilation. > > The backout is not clean because of removal of `CompiledMethod`. > > Tested with release and fastdebug builds: tier1 and tier2 passed. IMO if nobody uses it and the amount of code is small, it is better to back out it and to reimplement it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2107809381 From dfenacci at openjdk.org Mon May 13 15:50:21 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 13 May 2024 15:50:21 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v10] In-Reply-To: References: Message-ID: > # Issue > When loading multiple vectors using offsets or masks (e.g. 
`LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0)` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. > > # Causes > On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. > The same is true for `StoreVector`s. > When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store of the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`). Here we call > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 > > where we do an extra check for types if we deal with vectors but we don't make sure that masks and offsets don't interfere. > Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > but we don't make sure that there are no masks or offsets. > A few lines below, we check if there are 2 stores for the same value in a row. We need to check for masks and offsets here too but in this case we can include these cases if the masks and offsets of the vector stores are equivalent. > > # Solution > To avoid folding `Load`- and `StoreVector`s with masks and offsets we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns its own (to avoid ever being the same as a store node).
In this way, the checks in `MemNode::can_see_stored_value` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 > > and `StoreNode::Identity` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > will fail if masks or offsets are used. > For 2 stores of the same value we instead check for mask and offset equality. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been add... Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: - JDK-8325520: add extra tests - JDK-8325520: more tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18347/files - new: https://git.openjdk.org/jdk/pull/18347/files/9b742109..777bf562 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=08-09 Stats: 484 lines in 1 file changed: 483 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18347/head:pull/18347 PR: https://git.openjdk.org/jdk/pull/18347 From kvn at openjdk.org Mon May 13 16:32:15 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 13 May 2024 16:32:15 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 14:42:26 GMT, Evgeny Astigeevich wrote: >> Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues. REDO work is tracked by [JDK-8331749](https://bugs.openjdk.org/browse/JDK-8331749). 
>> >> Found bugs: >> - When refreshing `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)` which only considers the compiler directive which is on the top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected. >> - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. >> >> There are other concerns: bugs and performance issues. >> >> Possible bugs: >> - `has_matching_directives` might not be cleared. A nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark. >> - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again. >> - JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. >> >> Performance issues: >> - Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will be traversing nmethods most of which don't need recompilation. >> >> The backout is not clean because of removal of `CompiledMethod`. >> >> Tested with release and fastdebug builds: tier1 and tier2 passed. > > IMO if nobody uses it and the amount of code is small, it is better to back out it and to reimplement it. @eastig do you have tests which shows issues you listed in description? I don't see any reference to them in this sub-task and in [REDO] bug [JDK-8331749](https://bugs.openjdk.org/browse/JDK-8331749). How you found these issues? 
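The first "found bug" quoted above says that only the top of the directives stack is consulted, even though a single `Compiler.directives_add` command can push several directives. A small C++ model makes this concrete. Everything in it is hypothetical and simplified: directives match by method-name prefix, `top_n_match` is an invented helper, and nothing here mirrors the real `DirectivesStack` API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical directive: matches methods whose name starts with a prefix.
struct DirectiveSketch {
  std::string pattern_prefix;
  bool matches(const std::string& method) const {
    return method.compare(0, pattern_prefix.size(), pattern_prefix) == 0;
  }
};

// Hypothetical directives stack; back() is the top of the stack.
class DirectivesStackSketch {
  std::vector<DirectiveSketch> _stack;

 public:
  void push(DirectiveSketch d) { _stack.push_back(std::move(d)); }

  // Checks only the n most recently pushed directives (for example, the ones
  // added by a single command). n == 1 is a "top of the stack only" check.
  bool top_n_match(const std::string& method, std::size_t n) const {
    std::size_t checked = 0;
    for (auto it = _stack.rbegin(); it != _stack.rend() && checked < n;
         ++it, ++checked) {
      if (it->matches(method)) {
        return true;
      }
    }
    return false;
  }
};
```

In this model, if one command pushes two directives, a check with `n == 1` misses methods that only the first directive matches. Passing the number of directives the command actually added avoids that.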
------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2108154151 From epeter at openjdk.org Mon May 13 17:10:04 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 May 2024 17:10:04 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop [v2] In-Reply-To: References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> <2lreoMy7UKtgM_m8RCU68rp3FFkoU8zj3ckuTKzXqf0=.dc02a0d4-2671-4c70-a470-a64f28e38f2d@github.com> Message-ID: On Mon, 13 May 2024 13:23:07 GMT, Roland Westrelin wrote: > > I guess the issue is that ConvL2I and ConvI2L are also type nodes, which can restrict their type, just like CastII nodes. And that restricting of the type is only true under a certain if-branch. > > That's not entirely true here. The `ConvL2I` captures the type of its input so not a narrower type. The problem is that the type is that of a `Phi` for a counted loop and once pushed through phi, the type captured by the `ConvI2L` becomes incorrect. So what exactly is it that guarantees the correctness of the `phi` range under the counted loop that is not true when you push it back? I mean I would assume the `phi` can only have values that its inputs actually produce, so its inputs cannot have wildly different ranges, right? At some point, this range must be established by some control flow, at which point we can do the "type restriction". I would now have to dive into the code and debug whether the "type restriction" for the counted loop phi happens purely because of the input values, or because of explicitly restricting the type of the `ConvI2L`. But I do see that there are some `new ConvI2LNode(input, type)` cases where we do restrict the type of a `ConvI2L`.
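The range question above can be made concrete with a toy C++ interval model. This is not C2's type system: `LongRange`, `IntRange`, `conv_l2i` and `range_union` are hypothetical. The loop numbers assume a counted loop `for (long i = 0; i < 100; i++)` whose phi has been narrowed to [0, 99] from the loop bounds, not from its inputs. The model shows only that converting each phi input separately need not reproduce the range recorded on the phi itself. It does not reproduce the crash.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct LongRange { std::int64_t lo, hi; };
struct IntRange { std::int32_t lo, hi; };

// Long-to-int conversion in this model: the input range is kept only if it
// fits entirely in int. Otherwise truncation may wrap, so any int is possible.
IntRange conv_l2i(LongRange r) {
  if (r.lo >= INT32_MIN && r.hi <= INT32_MAX) {
    return {static_cast<std::int32_t>(r.lo), static_cast<std::int32_t>(r.hi)};
  }
  return {INT32_MIN, INT32_MAX};
}

// Smallest range covering both inputs: what merging two converted inputs
// yields in this model.
IntRange range_union(IntRange a, IntRange b) {
  return {std::min(a.lo, b.lo), std::max(a.hi, b.hi)};
}
```

Under the assumed loop, converting the narrowed phi gives [0, 99]. Converting the inputs separately and merging the results gives [0, 100], because the initial value is [0, 0] and the increment `i + 1` is [1, 100]. The two results differ.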
------------- PR Comment: https://git.openjdk.org/jdk/pull/19086#issuecomment-2108260349 From galder at openjdk.org Mon May 13 17:40:54 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Mon, 13 May 2024 17:40:54 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v17] In-Reply-To: References: Message-ID: > Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures. > > The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64:
>
>
> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
> Benchmark                 (size)  Mode  Cnt    Score    Error  Units
> ArrayClone.byteArraycopy       0  avgt   15    3.476 ±  0.018  ns/op
> ArrayClone.byteArraycopy      10  avgt   15    3.740 ±  0.017  ns/op
> ArrayClone.byteArraycopy     100  avgt   15    7.124 ±  0.010  ns/op
> ArrayClone.byteArraycopy    1000  avgt   15   39.301 ±  0.106  ns/op
> ArrayClone.byteClone           0  avgt   15    3.478 ±  0.008  ns/op
> ArrayClone.byteClone          10  avgt   15    3.562 ±  0.007  ns/op
> ArrayClone.byteClone         100  avgt   15    5.888 ±  0.206  ns/op
> ArrayClone.byteClone        1000  avgt   15   25.762 ±  0.203  ns/op
> ArrayClone.intArraycopy        0  avgt   15    3.199 ±  0.016  ns/op
> ArrayClone.intArraycopy       10  avgt   15    4.521 ±  0.008  ns/op
> ArrayClone.intArraycopy      100  avgt   15   17.429 ±  0.039  ns/op
> ArrayClone.intArraycopy     1000  avgt   15  178.432 ±  0.777  ns/op
> ArrayClone.intClone            0  avgt   15    3.406 ±  0.016  ns/op
> ArrayClone.intClone           10  avgt   15    4.272 ±  0.006  ns/op
> ArrayClone.intClone          100  avgt   15   13.110 ±  0.122  ns/op
> ArrayClone.intClone         1000  avgt   15  113.196 ± 13.400  ns/op
>
>
> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this.
>
> I ran hotspot compiler tests successfully, limiting them to C1 compilation on darwin/aarch64, linux/x86_64 and linux/686. E.g.
>
>
> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
> ...
> TEST                                       TOTAL   PASS   FAIL  ERROR
> jtreg:test/hotspot/jtreg:hotspot_compiler   1234   1234      0      0
>
>
> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts? > > Thanks @rwestrel for his help shaping this up :) Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: Add assert message ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17667/files - new: https://git.openjdk.org/jdk/pull/17667/files/c3b7fa47..09408587 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=15-16 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17667.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17667/head:pull/17667 PR: https://git.openjdk.org/jdk/pull/17667 From galder at openjdk.org Mon May 13 17:40:54 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Mon, 13 May 2024 17:40:54 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v16] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 13:44:42 GMT, Roland Westrelin wrote: >> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/c1/c1_GraphBuilder.cpp >> >> Co-authored-by: Dean Long <17332032+dean-long at users.noreply.github.com> > > src/hotspot/share/c1/c1_GraphBuilder.cpp line 2031: > >> 2029: ciType* type = receiver->exact_type(); >> 2030:
if (type != nullptr && type->is_loaded()) { >> 2031: assert(!type->is_instance_klass() || !type->as_instance_klass()->is_interface(), ""); > > Please add a message to the assert. Added, is that ok? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1598823687 From dlong at openjdk.org Mon May 13 19:46:03 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 13 May 2024 19:46:03 GMT Subject: RFR: 8331429: [JVMCI] Cleanup JVMCIRuntime allocation routines [v3] In-Reply-To: References: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> Message-ID: On Mon, 13 May 2024 11:34:18 GMT, Yudi Zheng wrote: >> This PR removes allocation routines that may throw exception from JVMCIRuntime. It also exports various symbols related to the hashed secondary supers table. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > remove trailing white space Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19176#pullrequestreview-2053628324 From eastigeevich at openjdk.org Mon May 13 20:37:40 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 13 May 2024 20:37:40 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 16:29:35 GMT, Vladimir Kozlov wrote: > do you have tests which shows issues you listed in description? 
Here is a jtreg test:

- `refresh_control.02.txt`

[
  {
    match: "serviceability.dcmd.compiler.DirectivesRefreshTest::callable",
    c2: {
      PrintOptoAssembly: true
    }
  }
]

- `DirectivesRefreshTest02.java`

/**
 * @test DirectivesRefreshTest02
 * @summary Test of forced recompile after compiler directives changes by diagnostic command
 * @requires vm.compiler1.enabled & vm.compiler2.enabled
 * @library /test/lib /
 * @modules java.base/jdk.internal.misc
 *
 * @build jdk.test.whitebox.WhiteBox
 * @run driver jdk.test.lib.helpers.ClassFileInstaller jdk.test.whitebox.WhiteBox
 *
 * @run main/othervm -Xbootclasspath/a:. -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI
 *      -XX:+BackgroundCompilation -Xlog:codecache=trace -XX:-Inline -XX:+TieredCompilation -XX:CICompilerCount=2
 *      -XX:+UnlockDiagnosticVMOptions
 *      serviceability.dcmd.compiler.DirectivesRefreshTest02
 */

package serviceability.dcmd.compiler;

import jdk.test.whitebox.WhiteBox;
import jdk.test.lib.process.OutputAnalyzer;
import jdk.test.lib.dcmd.CommandExecutor;
import jdk.test.lib.dcmd.JMXExecutor;

import java.nio.file.Path;
import java.nio.file.Paths;
import java.lang.reflect.Method;
import java.util.Random;

import static jdk.test.lib.Asserts.assertEQ;

import static compiler.whitebox.CompilerWhiteBoxTest.COMP_LEVEL_NONE;
import static compiler.whitebox.CompilerWhiteBoxTest.COMP_LEVEL_SIMPLE;
import static compiler.whitebox.CompilerWhiteBoxTest.COMP_LEVEL_FULL_OPTIMIZATION;

public class DirectivesRefreshTest02 {
    static Path cmdPath = Paths.get(System.getProperty("test.src", "."), "refresh_control.02.txt");
    static WhiteBox wb = WhiteBox.getWhiteBox();
    static Random random = new Random();
    static Method method;
    static CommandExecutor executor;

    static int callable() {
        int result = 0;
        for (int i = 0; i < 100; i++) {
            result += random.nextInt(100);
        }
        return result;
    }

    static void setup() throws Exception {
        method = DirectivesRefreshTest.class.getDeclaredMethod("callable");
        executor = new JMXExecutor();

        wb.enqueueMethodForCompilation(method, COMP_LEVEL_SIMPLE);
        while (wb.isMethodQueuedForCompilation(method)) {
            Thread.onSpinWait();
        }

        wb.lockCompilation();
        boolean r = wb.enqueueMethodForCompilation(method, COMP_LEVEL_FULL_OPTIMIZATION);
        System.out.println("Method enqueued: " + r);
    }

    static void testDirectivesAddRefresh() {
        var output = executor.execute("Compiler.directives_add -r " + cmdPath.toString());
        output.stderrShouldBeEmpty().shouldContain("1 compiler directives added");
        System.out.println("Method enqueued: " + wb.isMethodQueuedForCompilation(method));
        wb.unlockCompilation();
        wb.enqueueMethodForCompilation(method, COMP_LEVEL_FULL_OPTIMIZATION);
        while (wb.isMethodQueuedForCompilation(method)) {
            Thread.onSpinWait();
        }
        System.out.println("Method compilation level: " + wb.getMethodCompilationLevel(method));
        assertEQ(true, false, "Stop here");
    }

    public static void main(String[] args) throws Exception {
        setup();
        testDirectivesAddRefresh();
    }
}

------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2108744800 From dnsimon at openjdk.org Mon May 13 20:37:43 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 13 May 2024 20:37:43 GMT Subject: RFR: 8331429: [JVMCI] Cleanup JVMCIRuntime allocation routines [v3] In-Reply-To: References: <5ar9uYI6ISLKMB7Y4U8LB1nbRjPPbtkW4iyeBsc7o48=.172ab7b7-1c3e-4d82-96e6-02a620f8cf86@github.com> Message-ID: On Mon, 13 May 2024 11:34:18 GMT, Yudi Zheng wrote: >> This PR removes allocation routines that may throw exception from JVMCIRuntime. It also exports various symbols related to the hashed secondary supers table. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > remove trailing white space Marked as reviewed by dnsimon (Reviewer).
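The jtreg test above uses `lockCompilation()` to simulate one scenario: a method is already waiting in the compile queue when new directives arrive. That scenario can be modelled in a few lines of C++. This is a hypothetical sketch, not CompileBroker code. It assumes two things, both as described in the thread: a second request for an already-queued method is dropped, and a queued task keeps the directives that were current when it was created (represented here by an integer version).

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical compile task: the method and the directives version that was
// current when the task was created.
struct CompileTaskSketch {
  std::string method;
  int directives_version;
};

class CompileQueueSketch {
  std::deque<CompileTaskSketch> _queue;

 public:
  // Returns false and leaves the queue unchanged if the method is already
  // queued, so the queued task keeps its old directives version.
  bool enqueue(const std::string& method, int directives_version) {
    if (find(method) != nullptr) {
      return false;
    }
    _queue.push_back({method, directives_version});
    return true;
  }

  const CompileTaskSketch* find(const std::string& method) const {
    for (const CompileTaskSketch& task : _queue) {
      if (task.method == method) {
        return &task;
      }
    }
    return nullptr;
  }
};
```

In this model, the refresh request made after the directives change (version 2) is silently lost. The backout description lists exactly this behaviour among the found bugs.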
------------- PR Review: https://git.openjdk.org/jdk/pull/19176#pullrequestreview-2053738498 From dfenacci at openjdk.org Mon May 13 20:38:56 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 13 May 2024 20:38:56 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v11] In-Reply-To: References: Message-ID: > # Issue > When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0)` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. > > # Causes > On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. > The same is true for `StoreVector`s. > When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store of the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`). Here we call > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 > > where we do an extra check for types if we deal with vectors but we don't make sure that masks and offsets don't interfere. > Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > but we don't make sure that there are no masks or offsets. > A few lines below, we check if there are 2 stores for the same value in a row.
We need to check for masks and offsets here too but in this case we can include these cases if the masks and offsets of the vector stores are equivalent. > > # Solution > To avoid folding `Load`- and `StoreVector`s with masks and offsets we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns its own (to avoid ever being the same as a store node). In this way, the checks in `MemNode::can_see_stored_value` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 > > and `StoreNode::Identity` > > https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 > > will fail if masks or offsets are used. > For 2 stores of the same value we instead check for mask and offset equality. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been add...
Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8325520: update match condition ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18347/files - new: https://git.openjdk.org/jdk/pull/18347/files/777bf562..e676bcb1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=09-10 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18347/head:pull/18347 PR: https://git.openjdk.org/jdk/pull/18347 From eastigeevich at openjdk.org Mon May 13 20:40:49 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 13 May 2024 20:40:49 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 13:03:26 GMT, Evgeny Astigeevich wrote: > Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues. REDO work is tracked by [JDK-8331749](https://bugs.openjdk.org/browse/JDK-8331749). > > Found bugs: > - When refreshing `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)` which only considers the compiler directive which is on the top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected. > - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. > > There are other concerns: bugs and performance issues. > > Possible bugs: > - `has_matching_directives` might not be cleared. A nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. 
If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark. > - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again. > - JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. > > Performance issues: > - Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will be traversing nmethods most of which don't need recompilation. > > The backout is not clean because of removal of `CompiledMethod`. > > Tested with release and fastdebug builds: tier1 and tier2 passed. There is no `PrintOptoAssembly` in output. I use `lockCompilation()`/`unlockCompilation()` to simulate: > A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. I think using them we can also simulate, though it would not be easy to write a test: > JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2108759073 From eastigeevich at openjdk.org Mon May 13 20:50:01 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 13 May 2024 20:50:01 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: <43tyZlzDKG1-M3YMBjjSKx2R3OosZuyfQySaBuV_KTc=.45597f64-6ff7-4d83-8416-aa29154d92df@github.com> On Mon, 13 May 2024 16:29:35 GMT, Vladimir Kozlov wrote: > How you found these issues? I've been backporting JDK-8309271 to downstream 17 and 21. 
Compilations normally happen in the background, but a test from JDK-8309271 runs with background compilation off, so I asked myself what might happen with background compilation. I have a patch fixing the test above. I don't think it is a complete fix. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2108770472 From dfenacci at openjdk.org Mon May 13 21:03:13 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 13 May 2024 21:03:13 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v6] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 15:29:11 GMT, Damon Fenacci wrote: >> src/hotspot/share/opto/memnode.cpp line 3554: >> >>> 3552: } >>> 3553: } >>> 3554: } >> >> I think the code is now correct. >> But I find the nested if-elseif-elseif-else ... structure a bit hard to read. And there is quite some code duplication (e.g. `result = mem` and all the `eqv_uncast` checks). >> >> You could either do something like this: >> >> if (!is_StoreVector() || >> as_StoreVector()->has_same_vect_type_and_offsets_and_mask(mem->as_StoreVector())) { >> result = mem; >> } >> >> >> Sketch: >> >> has_same_vect_type_and_offsets_and_mask: >> >> different vect_type -> return false >> ... >> >> >> Or maybe it would be better to define virtual functions to get the `mask` and `offsets` from a `StoreVector`? If it has none, just return `nullptr`. Sometimes people worry about virtual methods, but we already use them extensively for the node Value/Ideal anyway. >> >> Then, you can do: >> >> if (!is_StoreVector()) { >> result = mem; >> } else { >> const Node* offsets1 = as_StoreVector()->get_offsets(); >> const Node* offsets2 = mem->as_StoreVector()->get_offsets(); >> const Node* mask1 = as_StoreVector()->get_mask(); >> const Node* mask2 = mem->as_StoreVector()->get_mask(); >> if (offsets1->eqv_uncast(offsets2) && mask1->eqv_uncast(mask2)) { >> result = mem; >> } >> } >> >> I think that would be the cleanest and most readable way.
>> >> What do you think? > > I agree that it is quite convoluted, probably also because I've put `if (!is_StoreVector())` (which is redundant) at the beginning to get the most common case out of the way, but still... > At first I thought that multiple inheritance would be a good solution (masks and offsets could be inherited by the corresponding nodes) but the "HotSpot Coding Style" clearly says to avoid it... > So, I think in the end your second suggestion is the cleanest. Changing it... I've updated it. The condition unfortunately doesn't look as clean as the one above, as we need to check for `nullptr` (either both are `nullptr`, or neither is and they are `eqv_uncast`). I've tried to make it as concise as possible (we could have made `mask` and `offsets` return a _unique_ node instead, so as to avoid the `nullptr`, but I had the impression it would just make everything less clear). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1599077017 From dfenacci at openjdk.org Mon May 13 21:09:12 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 13 May 2024 21:09:12 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v4] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 12:56:55 GMT, Emanuel Peter wrote: > * No mixed type test for load-store: Use MemorySegment `from/intoMemorySegment`. Try something like store an int-vector, and load a float-vector. It looks as if loads/stores that use `from`/`intoMemorySegment` with different types don't create `LoadVector` nodes. It seems that `fromMemorySegment` tries to inline the `VectorSupport::load` intrinsic, but fails as the type of the vector and the inferred type of the underlying memory segment differ: https://github.com/openjdk/jdk/blob/9b742109b196d79cbf712ffd3f64edd1d6497114/src/hotspot/share/opto/vectorIntrinsics.cpp#L1055-L1064 > * Mismatched vector length: store a vector of length 4, and load one of length 8.
I've added tests that store and load with different species (`SPECIES_64`). > * Do some store-store and store-load cases where the first and second are different loads/stores, i.e. one with and one without mask/offsets. E.g. `StoreVectorMasked` and `StoreVectorScatter` in a store-store test. Doing the total cross-product is probably too much, but a few examples would be a good start. You're right, there were just very few of them. Added many more. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2108799572 From eastigeevich at openjdk.org Mon May 13 21:11:02 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 13 May 2024 21:11:02 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 13:03:26 GMT, Evgeny Astigeevich wrote: > Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues. REDO work is tracked by [JDK-8331749](https://bugs.openjdk.org/browse/JDK-8331749). > > Found bugs: > - When refreshing `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)` which only considers the compiler directive which is on the top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected. > - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. > > There are other concerns: bugs and performance issues. > > Possible bugs: > - `has_matching_directives` might not be cleared. A nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark.
> - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again. > - JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. > > Performance issues: > - Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will be traversing nmethods most of which don't need recompilation. > > The backout is not clean because of removal of `CompiledMethod`. > > Tested with release and fastdebug builds: tier1 and tier2 passed. What if instead of backing out we will use an experimental JVM flag: `XX:+CompilerDirectivesRefreshSupport`? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2108802569 From cslucas at openjdk.org Mon May 13 22:09:23 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 13 May 2024 22:09:23 GMT Subject: RFR: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers [v3] In-Reply-To: References: Message-ID: <3KQPqbFAyVDkPx28d8DN8Y1_zrJ6LwX6eOEOqxe8mvs=.4ec47e90-e516-4960-96c7-8f0cdbc8b29b@github.com> > The `assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type` failure was caused by the fact that we didn't have a "zero value" for the type T_METADATA. The RAM patch uses that data when it creates a Phi node merging Klass loads and UseCompressedClassPointers is disabled. > > Tested with JTREG tier1-4 on Linux x86_64 & ARM64. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Addressing feedback: more tests. Reverting previous change. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/19148/files - new: https://git.openjdk.org/jdk/pull/19148/files/91fc61de..bb632c27 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19148&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19148&range=01-02 Stats: 79 lines in 4 files changed: 54 ins; 3 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/19148.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19148/head:pull/19148 PR: https://git.openjdk.org/jdk/pull/19148 From cslucas at openjdk.org Mon May 13 22:11:05 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 13 May 2024 22:11:05 GMT Subject: RFR: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers In-Reply-To: References: Message-ID: <_-fPO2t3GrRZjX1m0Z8kaH9k6rwSAKm0vZ0tWPFgoVc=.5c2489d5-6d48-4913-b3ac-bd1544dfdf07@github.com> On Thu, 9 May 2024 01:46:45 GMT, Vladimir Kozlov wrote: >> @JohnTortugo, thank you for adding new test. But it would be nice also add additional run with `-XX:+IgnoreUnrecognizedVMOptions -XX:-UseCompressedClassPointers` to failed test `test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java` >> >> Also why you require to run test only with compressed oops on?: >> >> * @requires vm.debug == true & vm.bits == 64 & vm.compiler2.enabled & vm.opt.final.UseCompressedOops & vm.opt.final.EliminateAllocations > >> @JohnTortugo, thank you for adding new test. But it would be nice also add additional run with `-XX:+IgnoreUnrecognizedVMOptions -XX:-UseCompressedClassPointers` to failed test `test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java` > > Actually `-XX:+IgnoreUnrecognizedVMOptions` is not needed because you require `vm.bits == 64` in the test. @vnkozlov - I updated the patch by adding new tests with CompressedOops/CompressedClassPointers enabled and disabled. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19148#issuecomment-2108886624 From kvn at openjdk.org Mon May 13 22:46:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 13 May 2024 22:46:02 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: <43tyZlzDKG1-M3YMBjjSKx2R3OosZuyfQySaBuV_KTc=.45597f64-6ff7-4d83-8416-aa29154d92df@github.com> References: <43tyZlzDKG1-M3YMBjjSKx2R3OosZuyfQySaBuV_KTc=.45597f64-6ff7-4d83-8416-aa29154d92df@github.com> Message-ID: On Mon, 13 May 2024 20:46:06 GMT, Evgeny Astigeevich wrote: > There is a race among a thread updating directives, compiler threads and CodeCache cleaning threads. We don't properly lock the directives stack, the compile queue and CodeCache to manage the race. This is indeed concerning. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2108925371 From kvn at openjdk.org Mon May 13 22:46:03 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 13 May 2024 22:46:03 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 21:08:08 GMT, Evgeny Astigeevich wrote: > What if instead of backing out we will use an experimental JVM flag: `XX:+CompilerDirectivesRefreshSupport`? I don't think this is the correct way to fix the bug. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2108926307 From kvn at openjdk.org Mon May 13 22:52:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 13 May 2024 22:52:05 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 13:03:26 GMT, Evgeny Astigeevich wrote: > Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues.
REDO work is tracked by [JDK-8331749](https://bugs.openjdk.org/browse/JDK-8331749). > > Found bugs: > - When refreshing `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)` which only considers the compiler directive which is on the top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected. > - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. > > There are other concerns: bugs and performance issues. > > Possible bugs: > - `has_matching_directives` might not be cleared. A nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark. > - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again. > - JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. > > Performance issues: > - Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will be traversing nmethods most of which don't need recompilation. > > The backout is not clean because of removal of `CompiledMethod`. > > Tested with release and fastdebug builds: tier1 and tier2 passed. I agree with this backout. Thank you @eastig for explaining your point. We have about 3 weeks before RDP1, and it is better to have fewer issues before then. Let's redo the implementation in the next release, taking into account the issues you found, and allow more time for testing. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19215#pullrequestreview-2053940066 From sviswanathan at openjdk.org Mon May 13 23:15:07 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 13 May 2024 23:15:07 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Rearrange; add lambdas for clarity src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1054: > 1052: } else if (isUL) { > 1053: __ movzbl(rTmp, 
Address(needle, 2)); > 1054: __ movdl(byte_1, rTmp); Should be: __ movdl(byte_2, rTmp); src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1056: > 1054: __ movdl(byte_1, rTmp); > 1055: // 1st byte of needle in words > 1056: __ vpbroadcastw(byte_1, byte_1, Assembler::AVX_256bit); Should be: __ vpbroadcastw(byte_2, byte_2, Assembler::AVX_256bit); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599194092 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599194375 From kvn at openjdk.org Mon May 13 23:23:01 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 13 May 2024 23:23:01 GMT Subject: RFR: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers In-Reply-To: <_-fPO2t3GrRZjX1m0Z8kaH9k6rwSAKm0vZ0tWPFgoVc=.5c2489d5-6d48-4913-b3ac-bd1544dfdf07@github.com> References: <_-fPO2t3GrRZjX1m0Z8kaH9k6rwSAKm0vZ0tWPFgoVc=.5c2489d5-6d48-4913-b3ac-bd1544dfdf07@github.com> Message-ID: On Mon, 13 May 2024 22:08:44 GMT, Cesar Soares Lucas wrote: >>> @JohnTortugo, thank you for adding new test. But it would be nice also add additional run with `-XX:+IgnoreUnrecognizedVMOptions -XX:-UseCompressedClassPointers` to failed test `test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java` >> >> Actually `-XX:+IgnoreUnrecognizedVMOptions` is not needed because you require `vm.bits == 64` in the test. > > @vnkozlov - I updated the patch by adding new tests with CompressedOops/CompressedClassPointers enabled and disabled. @JohnTortugo This looks reasonable. Can you explain more why having klass field load is bad for your code? Is it because you need klass as constant for deoptimization? Is it possible to handle such case (loading klass) as separate RFE later? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19148#issuecomment-2108978102 From duke at openjdk.org Mon May 13 23:54:09 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Mon, 13 May 2024 23:54:09 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Rearrange; add lambdas for clarity src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4492: > 4490: > 4491: // Compare char[] or byte[] arrays aligned to 4 
bytes or substrings. > 4492: void C2_MacroAssembler::arrays_equals(bool is_array_equ, Register ary1, I liked the old style better (fewer, longer lines); same for the rest of the changes in this file. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4594: > 4592: #endif //_LP64 > 4593: bind(COMPARE_WIDE_VECTORS); > 4594: vmovdqu(vec1, Address(ary1, limit, Create a local `scale` variable instead of the ternary operators; it is used several times. src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4250: > 4248: generate_chacha_stubs(); > 4249: > 4250: if ((UseAVX == 2) && EnableX86ECoreOpts && VM_Version::supports_avx2()) { Just `if (EnableX86ECoreOpts)`? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 391: > 389: } > 390: > 391: __ cmpq(needle_len, isU ? 2 : 1); Can we remove this comparison? I.e. either broadcast the first and last character unconditionally (the same character), or move the broadcasts 'down' into the individual cases. There is already specialized code to handle a needle of size 1, so this adds extra pathlength. (Will we actually call this intrinsic for needle_size==1? Can we assume length>=2?) src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1365: > 1363: // Compare first byte of needle to haystack > 1364: vpcmpeq(cmp_0, byte_0, Address(haystack, 0), Assembler::AVX_256bit); > 1365: if (size != (isU ? 2 : 1)) { `if (size != scale)` Though in this case, `elem_size` might hold more meaning. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1372: > 1370: > 1371: if (bytesToCompare > 2) { > 1372: if (size > (isU ? 4 : 2)) { `if (size > 2*scale)`? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1373: > 1371: if (bytesToCompare > 2) { > 1372: if (size > (isU ? 4 : 2)) { > 1373: if (doEarlyBailout) { Is there a big perf difference when `doEarlyBailout` is enabled? And/or just for this function? (I.e. removing `doEarlyBailout` in this function would mean a shorter pathlength. It feels like a few extra vpands should be cheap enough.)
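The review comments above about broadcasting the first and last needle characters and AND-ing the `cmp_0`/`cmp_k` compare results refer to a filter-then-verify substring search. A scalar Java model of that general technique might look like the following. It is a hypothetical sketch, not the stub's code: `BLOCK` stands in for the 32-byte AVX2 vector width, and the `long` bit mask plays the role of the combined compare mask (`eq_mask`).

```java
/**
 * Hypothetical scalar model of a "first and last character" substring filter.
 * BLOCK stands in for the 32-byte AVX2 vector; each bit of 'mask' corresponds
 * to one candidate position whose first AND last needle bytes both matched.
 */
final class FirstLastFilter {
    static final int BLOCK = 32;

    private FirstLastFilter() {}

    static int indexOf(byte[] haystack, byte[] needle) {
        int n = haystack.length;
        int k = needle.length;
        if (k == 0) return 0;
        if (k > n) return -1;
        byte first = needle[0];      // value the stub would broadcast into byte_0
        byte last = needle[k - 1];   // value the stub would broadcast into byte_k
        int lastStart = n - k;       // last position where the needle can still fit
        for (int base = 0; base <= lastStart; base += BLOCK) {
            int lanes = Math.min(BLOCK, lastStart - base + 1);
            long mask = 0;
            for (int lane = 0; lane < lanes; lane++) {
                boolean eqFirst = haystack[base + lane] == first;
                boolean eqLast = haystack[base + lane + k - 1] == last;
                if (eqFirst && eqLast) {
                    mask |= 1L << lane;  // scalar analogue of cmp_0 AND cmp_k
                }
            }
            // Verify candidates in increasing position order so the first match wins.
            while (mask != 0) {
                int pos = base + Long.numberOfTrailingZeros(mask);
                if (middleMatches(haystack, pos, needle)) return pos;
                mask &= mask - 1;  // clear the lowest set bit
            }
        }
        return -1;
    }

    // First and last bytes are already known to match; check the bytes in between.
    private static boolean middleMatches(byte[] h, int pos, byte[] needle) {
        for (int i = 1; i < needle.length - 1; i++) {
            if (h[pos + i] != needle[i]) return false;
        }
        return true;
    }
}
```

Because only positions whose first and last bytes both match are verified byte by byte, most haystack positions are rejected after two comparisons; the vector version performs those comparisons for a whole block at once.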
src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1469: > 1467: > 1468: if (isU && (size & 1)) { > 1469: __ emit_int8(0xcc); This should also be an `assert()` to catch this at compile-time. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1633: > 1631: if (isU) { > 1632: if ((size & 1) != 0) { > 1633: __ emit_int8(0xcc); Compile-time assert to ensure this code is never called instead? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1889: > 1887: // r13 = (needle length - 1) > 1888: // r14 = &needle > 1889: // r15 = unused There is quite a bit of redundancy in register usage. It's not incorrect, but it looks odd. It's not clear whether this duplication can easily be removed (or if/why it is needed). // rbx = &haystack // rdi = &haystack // rdx = &needle // r14 = &needle // rcx = haystack length // rsi = haystack length // r12 = needle length // r13 = (needle length - 1) // r10 = hs_len - needle len // rbp = -1 // rax = unused // r11 = unused // r8 = unused // r9 = unused // r15 = unused (Could this comment be out-of-sync with the code?
It looks like only rbx, r14 and temps from the unused registers are used a few lines down.) src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1950: > 1948: // r13 = (needle length - 1) > 1949: // r14 = &needle > 1950: // r15 = unused Same as for the small case ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592834449 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592838385 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592831339 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599131482 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599146451 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599144855 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599143784 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599151000 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599204083 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599209564 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599213635 From sviswanathan at openjdk.org Tue May 14 00:51:08 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 14 May 2024 00:51:08 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Rearrange; add lambdas for clarity src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1083: > 1081: // haystack - the address of the first byte of the haystack > 1082: // hsLen - the sizeof the haystack > 1083: // isU - true if argument encoding is either UU or UL Do we need to list needleLen here as well? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1096: > 1094: MacroAssembler *_masm) { > 1095: > 1096: assert_different_registers(eq_mask, haystack, needleLen, rTmp, hsLen, r10); r10 kind of stands out here. You could use nMinusK in this assert. The assert following this one checks for nMinusK==r10, so that should suffice. BTW, I didn't see anything in the code below that needs nMinusK to be r10.
BTW, didn't see anything in the code below that needs nMinuxK to be r10. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1120: > 1118: #define cmp_0 XMM_TMP3 > 1119: #undef cmp_k > 1120: #define cmp_k XMM_TMP4 XMM_TMP4 is not reused so cmp_k could be declared as const. In general limiting undef/define pair only to reused registers would make the review easier. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1125: > 1123: #undef lastMask > 1124: > 1125: int sizeIncr = isU ? 2 : 1; sizeIncr and scale seems to be same, we could just use one of them in this function. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1178: > 1176: __ andq(eq_mask, lastMask); > 1177: if (needToSaveRCX) { > 1178: __ movdq(rcx, saveRCX); movdq is an expensive instruction (about 3 cycle). If we have another gpr temporary available here for shiftVal, then we dont need to do save/restore rcx. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1183: > 1181: > 1182: if (bytesToCompare > 2) { > 1183: if (size > (isU ? 
This and other usages could be simplified to: `size > 2 * scale` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599201163 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599203881 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599211645 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599202848 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599242323 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599228299 From cslucas at openjdk.org Tue May 14 02:51:01 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 14 May 2024 02:51:01 GMT Subject: RFR: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers In-Reply-To: References: <_-fPO2t3GrRZjX1m0Z8kaH9k6rwSAKm0vZ0tWPFgoVc=.5c2489d5-6d48-4913-b3ac-bd1544dfdf07@github.com> Message-ID: On Mon, 13 May 2024 23:20:12 GMT, Vladimir Kozlov wrote: > Can you explain more why having klass field load is bad for your code? The issue involves LoadNKlass, DecodeNKlass and NULL NKlass. It happens when splitting a LoadNKlass through a nullable Phi. In that process, another "nullable" Phi of type TypeNarrowKlass may be created, merging the Klasses of the original Phi inputs. A NULL NarrowKlass does not seem to be well defined: for instance, there is no definition of "_zero_type" for T_METADATA, which is the basic type of TypeNarrowKlass. The first commit in this PR added this definition. However, I think a better approach than the one in the first commit may be to create a Phi of type TypePtr that merges DecodeNKlass nodes, instead of creating a Phi of type NarrowKlass. By doing so I won't need to create a Phi with a NULL **NKlass**, so the original patch isn't necessary. However, in my opinion, doing that is better left for a separate RFE + PR.
> Is it possible to handle such case (loading klass) as separate RFE later? Yes, I think we can do it as a separate RFE. However, in my experiments klass field loading doesn't appear very often in the benchmarks and therefore may not be much worth the added complication. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19148#issuecomment-2109175690 From kvn at openjdk.org Tue May 14 03:54:01 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 14 May 2024 03:54:01 GMT Subject: RFR: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers In-Reply-To: References: <_-fPO2t3GrRZjX1m0Z8kaH9k6rwSAKm0vZ0tWPFgoVc=.5c2489d5-6d48-4913-b3ac-bd1544dfdf07@github.com> Message-ID: On Tue, 14 May 2024 02:48:44 GMT, Cesar Soares Lucas wrote: > However, in my experiments klass field loading doesn't appear very often in the benchmarks and therefore may not be much worth the added complication. That may be true for code with new allocations, but in the general case, when an object is passed as an argument or loaded from a field, klass loading is common. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19148#issuecomment-2109226957 From kvn at openjdk.org Tue May 14 03:54:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 14 May 2024 03:54:02 GMT Subject: RFR: JDK-8330795 : C2: assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type with -XX:-UseCompressedClassPointers [v3] In-Reply-To: <3KQPqbFAyVDkPx28d8DN8Y1_zrJ6LwX6eOEOqxe8mvs=.4ec47e90-e516-4960-96c7-8f0cdbc8b29b@github.com> References: <3KQPqbFAyVDkPx28d8DN8Y1_zrJ6LwX6eOEOqxe8mvs=.4ec47e90-e516-4960-96c7-8f0cdbc8b29b@github.com> Message-ID: On Mon, 13 May 2024 22:09:23 GMT, Cesar Soares Lucas wrote: >> The `assert((uint)type <= T_CONFLICT && _zero_type[type] != nullptr) failed: bad type` failure was caused by the fact that we didn't have a "zero value" for the type T_METADATA.
The RAM patch uses that data when it creates a Phi node merging Klass loads and UseCompressedClassPointers is disabled. >> >> Tested with JTREG tier1-4 on Linux x86_64 & ARM64. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Addressing feedback: more tests. Reverting previous change. Thank you for explaining the issue you have with klass loading. I will run our testing with your current version. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19148#issuecomment-2109228049 From fyang at openjdk.org Tue May 14 04:31:01 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 14 May 2024 04:31:01 GMT Subject: RFR: 8332130: RISC-V: remove wrong instructions of Vector Crypto Extension In-Reply-To: References: Message-ID: On Mon, 13 May 2024 08:14:43 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch to remove some wrong instructions on riscv? > These instructions are wrong; take `vror.vx` as an example: > * by definition of the spec, it should be `vror.vx vd, vs2, *rs1*, vm` > * the implementation here is `vror_vx(VectorRegister Vd, VectorRegister Vs2, *VectorRegister* Vs1, VectorMask vm = unmasked)` > > Thanks I think you mean the `funct3` (`OPIVV` vs `OPIVX`) encoding is wrong? ------------- PR Review: https://git.openjdk.org/jdk/pull/19211#pullrequestreview-2054252934 From mli at openjdk.org Tue May 14 06:15:01 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 14 May 2024 06:15:01 GMT Subject: RFR: 8332130: RISC-V: remove wrong instructions of Vector Crypto Extension In-Reply-To: References: Message-ID: On Tue, 14 May 2024 04:28:40 GMT, Fei Yang wrote: > I think you mean the `funct3` (`OPIVV` vs `OPIVX`) encoding is wrong?
Yes ------------- PR Comment: https://git.openjdk.org/jdk/pull/19211#issuecomment-2109364170 From roland at openjdk.org Tue May 14 07:35:01 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 14 May 2024 07:35:01 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop [v2] In-Reply-To: <_kbcMydcMPblcm_FDDuL5vWGT7q6iRoarmYsTlEA0hQ=.290c6744-211d-406d-8ed1-90e510051167@github.com> References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> <_kbcMydcMPblcm_FDDuL5vWGT7q6iRoarmYsTlEA0hQ=.290c6744-211d-406d-8ed1-90e510051167@github.com> Message-ID: On Mon, 13 May 2024 13:23:46 GMT, Roland Westrelin wrote: >> In the test case: >> >> >> long i; >> for (; i > 0; i--) { >> res += 42 / ((int) i); >> >> >> The long counted loop phi has type `[1..100]`. As a consequence, the >> `ConvL2I` also has type `[1..100]`. The `DivI` node that follows can't >> fault: it is not guarded by a zero check and has no control set. >> >> The `ConvL2I` is split through phi and so is the `DivI` node: >> `PhaseIdealLoop::cannot_split_division()` returns true because the >> value coming from the backedge into the `DivI` (when it is about to be >> split thru phi) is the result of the `ConvL2I` which has type >> `[1..100]` so is not zero as far as the compiler can tell. >> >> On the last iteration of the loop, i is 1. Because the DivI was split >> thru Phi, it computes the value for the following iteration, so for i >> = 0. This causes a crash when the compiled code runs. >> >> The same problem can't happen with an int counted loop because logic >> in `PhaseIdealLoop::split_thru_phi()` prevents a `ConvI2L` from being >> split thru phi. I propose to fix this the same way: in the test case, >> it's not true that once the `ConvL2I` is split thru phi it keeps type >> `[1..100]`.
The fix is fairly conservative because it's based on the >> existing logic for `ConvI2L`: we would want to not split a `ConvL2I` >> only at a counted loop, but I suppose the same is true for the `ConvI2L` >> and I thought it would be best to revisit both together. > > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - test case tweaks > - fuzzer test Before split if: long i = 100; for (; i > 0;) { // i here is 1..100 int j = (int)i; // ConvL2I type is 1..100, same as loop phi int k = 42 / j; i--; } after split if: long i = 100; int j = 100; int k = 0; for (; i > 0;) { // i here is 1..100 i--; // i here is 0..99 j = (int)i; // ConvL2I type is still 1..100 which is not correct k = 42 / j; } ------------- PR Comment: https://git.openjdk.org/jdk/pull/19086#issuecomment-2109483191 From fyang at openjdk.org Tue May 14 07:41:03 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 14 May 2024 07:41:03 GMT Subject: RFR: 8332130: RISC-V: remove wrong instructions of Vector Crypto Extension In-Reply-To: References: Message-ID: On Tue, 14 May 2024 06:11:57 GMT, Hamlin Li wrote: > > I think you mean the `funct3` (`OPIVV` vs `OPIVX`) encoding is wrong? > > Yes From the RVV spec [1], the `funct3` encoding for `OPIVX` is 0b100, which is also reflected in the instruction encoding. So why would you think it's wrong? Anything I missed?
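To make the `funct3` point concrete: in the OP-V arithmetic format, the 5-bit field in bits 19:15 holds either a vector register (`vs1`, OPIVV) or a scalar x-register (`rs1`, OPIVX), and only `funct3` tells the two apart. The Java sketch below encodes that layout as described in the RVV spec's arithmetic-instruction-formats section. The class and method names are hypothetical, and `funct6` is supplied by the caller rather than taken from any particular instruction.

```java
/**
 * Hypothetical sketch of the RVV OP-V arithmetic instruction layout:
 * funct6[31:26] | vm[25] | vs2[24:20] | vs1/rs1[19:15] | funct3[14:12] | vd[11:7] | opcode[6:0]
 */
final class OpVEncoding {
    static final int OP_V = 0b1010111;  // major opcode shared by vector arithmetic instructions
    static final int OPIVV = 0b000;     // vector-vector: bits 19:15 name a vector register (vs1)
    static final int OPIVX = 0b100;     // vector-scalar: bits 19:15 name an x-register (rs1)

    private OpVEncoding() {}

    /** vm = 1 means unmasked; register arguments are plain register numbers 0..31. */
    static int encode(int funct6, int vm, int vs2, int vs1OrRs1, int funct3, int vd) {
        return (funct6 & 0x3f) << 26
             | (vm & 0x1) << 25
             | (vs2 & 0x1f) << 20
             | (vs1OrRs1 & 0x1f) << 15
             | (funct3 & 0x7) << 12
             | (vd & 0x1f) << 7
             | OP_V;
    }

    static int funct3(int insn) {
        return (insn >>> 12) & 0x7;
    }
}
```

Since the `.vx` form reuses the same bit positions, an assembler helper such as `vror_vx` arguably needs a general-purpose register for that operand: passing a `VectorRegister` would encode a vector register number where the hardware expects an x-register number, even with a correct `funct3`.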
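[1] https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-arithmetic-instruction-formats ------------- PR Comment: https://git.openjdk.org/jdk/pull/19211#issuecomment-2109491672 From epeter at openjdk.org Tue May 14 08:02:02 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 May 2024 08:02:02 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop [v2] In-Reply-To: References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> <_kbcMydcMPblcm_FDDuL5vWGT7q6iRoarmYsTlEA0hQ=.290c6744-211d-406d-8ed1-90e510051167@github.com> Message-ID: On Tue, 14 May 2024 07:32:26 GMT, Roland Westrelin wrote: >> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: >> >> - test case tweaks >> - fuzzer test > > Before split if: > > long i = 100; > for (; i > 0;) { > // i here is 1..100 > int j = (int)i; // ConvL2I type is 1..100, same as loop phi > int k = 42 / j; > i--; > } > > > after split if: > > > long i = 100; > int j = 100; > int k = 0; > for (; i > 0;) { > // i here is 1..100 > i--; > // i here is 0..99 > j = (int)i; // ConvL2I type is still 1..100 which is not correct > k = 42 / j; > } @rwestrel which "split_if" optimization was applied in your example? Split the ConvI2L through the phi? If so, the problem seems to be that the ConvI2L floats by the exit-check, right? after split if: long i = 100; int j = 100; int k = 0; for (; i > 0;) { // i here is 1..100 i--; // i here is 0..99 exit check // i here is 1..99 j = (int)i; // ConvL2I type is still 1..100 which is not correct k = 42 / j; } I guess the issue is that the `ConvL2I` was somehow pinned inside the loop, after the `CountedLoop`, by the `phi`. But when the `ConvL2I` is split into the backedge, it does not stay in the backedge but floats further, passes by the exit-check and goes into the last iteration -> BOOM. How exactly did we narrow the type to `1...100`? I guess that that is some smart logic in the trip count `Phi` node, right? If instead we had a `CastLL` for the exit check that narrows the type, then the `CastLL` would remain after the split-if, and the split `ConvL2I` could not float from the backedge into the loop body of the last iteration. So I guess that is really a limitation: a trip count `Phi` specifically does the narrowing, and so you cannot just split past it. The question is whether that is really nice, or if we could do it differently, e.g. via a `CastLL/CastII` on the exit-check? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19086#issuecomment-2109526424 From roland at openjdk.org Tue May 14 08:11:02 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 14 May 2024 08:11:02 GMT Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop [v2] In-Reply-To: References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com> <_kbcMydcMPblcm_FDDuL5vWGT7q6iRoarmYsTlEA0hQ=.290c6744-211d-406d-8ed1-90e510051167@github.com> Message-ID: On Tue, 14 May 2024 07:32:26 GMT, Roland Westrelin wrote: >> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: >> >> - test case tweaks >> - fuzzer test > > Before split if: > > long i = 100; > for (; i > 0;) { > // i here is 1..100 > int j = (int)i; // ConvL2I type is 1..100, same as loop phi > int k = 42 / j; > i--; > } > > > after split if: > > > long i = 100; > int j = 100; > int k = 0; > for (; i > 0;) { > // i here is 1..100 > i--; > // i here is 0..99 > j = (int)i; // ConvL2I type is still 1..100 which is not correct > k = 42 / j; > } > @rwestrel which "split_if" optimization was applied in your example? Split the ConvI2L through the phi? If so, the problem seems to be that the ConvI2L floats by the exit-check, right? Yes. > So I guess that is really a limitation: a trip count `Phi` specifically does the narrowing, and so you cannot just split past it. The question is if that is really nice, or if we could do it differently, e.g. via a `CastLL/CastII` on the exit-check? The issue involves conv nodes when split thru phi at a counted loop. That's a narrow corner case. I think fixing it by addressing the corner case where it occurs, as proposed, is simpler than trying a more general fix, which can have hard-to-anticipate consequences. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19086#issuecomment-2109544930 From redestad at openjdk.org Tue May 14 08:26:05 2024 From: redestad at openjdk.org (Claes Redestad) Date: Tue, 14 May 2024 08:26:05 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v8] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 18:24:25 GMT, Adam Sotona wrote: >> Hi, >> During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. >> One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. >> >> I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. >> >> Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. >> >> Thank you, >> Adam > > Adam Sotona has updated the pull request incrementally with one additional commit since the last revision: > > fixed tests Thank you for this! ------------- Marked as reviewed by redestad (Reviewer).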
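There are several ways to turn eager static fields into lazily initialized static methods, as proposed for `Attributes`. One common Java idiom is the initialization-on-demand holder, sketched below. This is a hypothetical illustration: the class names, the `String` stand-ins for attribute mappers, and the `initCount` counter are all invented for the example, and the actual `java.lang.classfile.Attributes` code may be structured differently.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical example: each "mapper" lives in its own nested holder class,
 * so it is only built when its accessor method is first called.
 */
final class LazyMappers {
    // Counts how many mappers have been created (for demonstration only).
    static final AtomicInteger initCount = new AtomicInteger();

    private LazyMappers() {}

    // The JVM initializes a holder class the first time its static field is read;
    // class initialization is thread-safe, so no explicit locking is needed.
    private static final class CodeHolder {
        static final String MAPPER = create("Code");
    }

    private static final class SignatureHolder {
        static final String MAPPER = create("Signature");
    }

    static String code() {
        return CodeHolder.MAPPER;
    }

    static String signature() {
        return SignatureHolder.MAPPER;
    }

    private static String create(String name) {
        initCount.incrementAndGet();
        return name + "Mapper";
    }
}
```

Initializing `LazyMappers` itself builds no mappers; a caller that only ever uses `code()` never pays for building the `Signature` mapper, which is the kind of startup cost the change aims to avoid.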
I guess that that is some smart logic in the trip count `Phi` node, right? If instead we had a `CastLL` for the exit check that narrows the type, then the `CastLL` would remain after the split-if, and the split `ConvL2I` could not float from the backedge into the loop body of the last iteration.

So I guess that is really a limitation: a trip count `Phi` specifically does the narrowing, and so you cannot just split past it. The question is if that is really nice, or if we could do it differently, e.g. via a `CastLL/CastII` on the exit-check?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19086#issuecomment-2109526424

From roland at openjdk.org Tue May 14 08:11:02 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Tue, 14 May 2024 08:11:02 GMT
Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop [v2]
In-Reply-To:
References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com>
 <_kbcMydcMPblcm_FDDuL5vWGT7q6iRoarmYsTlEA0hQ=.290c6744-211d-406d-8ed1-90e510051167@github.com>
Message-ID:

On Tue, 14 May 2024 07:32:26 GMT, Roland Westrelin wrote:

>> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - test case tweaks
>>  - fuzzer test
>
> Before split if:
>
>     long i = 100;
>     for (; i > 0;) {
>         // i here is 1..100
>         int j = (int)i; // ConvL2I type is 1..100, same as loop phi
>         int k = 42 / j;
>         i--;
>     }
>
> after split if:
>
>     long i = 100;
>     int j = 100;
>     int k = 0;
>     for (; i > 0;) {
>         // i here is 1..100
>         i--;
>         // i here is 0..99
>         j = (int)i; // ConvL2I type is still 1..100 which is not correct
>         k = 42 / j;
>     }

> @rwestrel which "split_if" optimization was applied in your example? Split the ConvI2L through the phi? If so, the problem seems to be that the ConvI2L floats by the exit-check, right?

Yes.
> So I guess that is really a limitation: a trip count `Phi` specifically does the narrowing, and so you cannot just split past it. The question is if that is really nice, or if we could do it differently, e.g. via a `CastLL/CastII` on the exit-check?

The issue involves conv nodes when split thru phi at a counted loop. That's a narrow corner case. I think fixing it by addressing the corner case where it occurs, as proposed, is simpler than trying a more general fix, which can have hard-to-anticipate consequences.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19086#issuecomment-2109544930

From redestad at openjdk.org Tue May 14 08:26:05 2024
From: redestad at openjdk.org (Claes Redestad)
Date: Tue, 14 May 2024 08:26:05 GMT
Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v8]
In-Reply-To:
References:
Message-ID:

On Mon, 6 May 2024 18:24:25 GMT, Adam Sotona wrote:

>> Hi,
>> During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers.
>> One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers.
>>
>> I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class.
>>
>> Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23.
>>
>> Thank you,
>> Adam
>
> Adam Sotona has updated the pull request incrementally with one additional commit since the last revision:
>
>   fixed tests

Thank you for this!

-------------

Marked as reviewed by redestad (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19006#pullrequestreview-2054638558

From epeter at openjdk.org Tue May 14 08:26:05 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 14 May 2024 08:26:05 GMT
Subject: RFR: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop [v2]
In-Reply-To:
References: <5LtwU12mrfWgFOMECoULwPzfuVKdjoY9u1sUOU7fo8g=.72eb9076-f8cf-4b76-abd2-828093a55ab5@github.com>
 <_kbcMydcMPblcm_FDDuL5vWGT7q6iRoarmYsTlEA0hQ=.290c6744-211d-406d-8ed1-90e510051167@github.com>
Message-ID:

On Tue, 14 May 2024 08:08:08 GMT, Roland Westrelin wrote:

>> Before split if:
>>
>>     long i = 100;
>>     for (; i > 0;) {
>>         // i here is 1..100
>>         int j = (int)i; // ConvL2I type is 1..100, same as loop phi
>>         int k = 42 / j;
>>         i--;
>>     }
>>
>> after split if:
>>
>>     long i = 100;
>>     int j = 100;
>>     int k = 0;
>>     for (; i > 0;) {
>>         // i here is 1..100
>>         i--;
>>         // i here is 0..99
>>         j = (int)i; // ConvL2I type is still 1..100 which is not correct
>>         k = 42 / j;
>>     }
>
>> @rwestrel which "split_if" optimization was applied in your example? Split the ConvI2L through the phi? If so, the problem seems to be that the ConvI2L floats by the exit-check, right?
>
> Yes.
>
>> So I guess that is really a limitation: a trip count `Phi` specifically does the narrowing, and so you cannot just split past it. The question is if that is really nice, or if we could do it differently, e.g. via a `CastLL/CastII` on the exit-check?
>
> The issue involves conv nodes when split thru phi at a counted loop. That's a narrow corner case. I think fixing it by addressing the corner case where it occurs, as proposed, is simpler than trying a more general fix, which can have hard-to-anticipate consequences.

@rwestrel Yes, I'm totally fine with the fix. It simply applies the `int` case to `long`. In a future RFE, we could at least restrict the "bailout" to trip-count Phi's, and not all Phi's.
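The value ranges in Roland's example can be checked in plain Java. The sketch below is only an illustration under stated assumptions: the class `SplitIfRange` and its methods are hypothetical names, and it models runtime values, not C2's node types. It runs both loop shapes and records the smallest value `(int)i` actually takes: 1 before the split, but 0 after it, because the conversion is evaluated once more after the final `i--`. That is why the `1..100` type carried by the split `ConvL2I` no longer holds.

```java
// Hypothetical illustration (not C2 code): runs the "before split if" and
// "after split if" loop shapes and records the smallest value produced by
// the long-to-int conversion in each shape.
public class SplitIfRange {

    // Before split if: (int) i is evaluated while i is still in 1..100.
    static int minBeforeSplit() {
        int min = Integer.MAX_VALUE;
        for (long i = 100; i > 0; ) {
            int j = (int) i;
            min = Math.min(min, j);
            i--;
        }
        return min;
    }

    // After split if: (int) i is evaluated after i--, so it sees 0..99.
    static int minAfterSplit() {
        int min = Integer.MAX_VALUE;
        for (long i = 100; i > 0; ) {
            i--;
            int j = (int) i;
            min = Math.min(min, j);
        }
        return min;
    }

    public static void main(String[] args) {
        System.out.println(minBeforeSplit()); // prints 1
        System.out.println(minAfterSplit());  // prints 0, so 42 / j would divide by zero
    }
}
```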
In even further RFEs, we could consider doing the type narrowing not in the trip-count phi, but via casts at the checks. That would be a more unified solution.

Generally, I feel like we are struggling way too much with all the different ways one can pin and narrow types: it is all mixed into trip-count phi's, Cast's, Conv's etc. Who really can understand all the complicated interactions? It seems we keep piling on special-case logic, but it is an endless whack-a-mole game. Every fix is "simple" but the sum of all those fixes is far from "simple" ;)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19086#issuecomment-2109575708

From amitkumar at openjdk.org Tue May 14 08:31:13 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Tue, 14 May 2024 08:31:13 GMT
Subject: RFR: 8331934: [s390x] Add support for primitive array C1 clone intrinsic
Message-ID:

Adds JDK-8302850 Port for s390x.

Testing:

    make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"

    ==============================
    Test summary
    ==============================
       TEST                                              TOTAL  PASS  FAIL ERROR
       jtreg:test/hotspot/jtreg:hotspot_compiler          1166  1166     0     0
    ==============================
    TEST SUCCESS

* Tier1 Test with Fast debug build.

Benchmarking:

Without Patch:

    Benchmark                 (size)  Mode  Cnt     Score    Error  Units
    ArrayClone.byteArraycopy       0  avgt   15    10.838 ±  0.461  ns/op
    ArrayClone.byteArraycopy      10  avgt   15    28.919 ±  1.695  ns/op
    ArrayClone.byteArraycopy     100  avgt   15    48.815 ±  0.901  ns/op
    ArrayClone.byteArraycopy    1000  avgt   15   256.357 ±  7.901  ns/op
    ArrayClone.byteClone           0  avgt   15    90.398 ±  3.119  ns/op
    ArrayClone.byteClone          10  avgt   15   103.774 ±  4.468  ns/op
    ArrayClone.byteClone         100  avgt   15   126.628 ±  6.952  ns/op
    ArrayClone.byteClone        1000  avgt   15   326.409 ± 31.635  ns/op
    ArrayClone.intArraycopy        0  avgt   15    10.450 ±  0.509  ns/op
    ArrayClone.intArraycopy       10  avgt   15    36.903 ±  0.753  ns/op
    ArrayClone.intArraycopy      100  avgt   15    85.964 ±  1.806  ns/op
    ArrayClone.intArraycopy     1000  avgt   15   841.512 ± 40.335  ns/op
    ArrayClone.intClone            0  avgt   15    89.332 ±  3.695  ns/op
    ArrayClone.intClone           10  avgt   15   110.639 ±  2.476  ns/op
    ArrayClone.intClone          100  avgt   15   195.781 ±  8.622  ns/op
    ArrayClone.intClone         1000  avgt   15  1058.479 ± 92.468  ns/op
    Finished running test 'micro:java.lang.ArrayClone'

with patch:

    Benchmark                 (size)  Mode  Cnt     Score    Error  Units
    ArrayClone.byteArraycopy       0  avgt   15    10.526 ±  0.289  ns/op
    ArrayClone.byteArraycopy      10  avgt   15    27.110 ±  0.656  ns/op
    ArrayClone.byteArraycopy     100  avgt   15    49.872 ±  1.562  ns/op
    ArrayClone.byteArraycopy    1000  avgt   15   269.518 ±  4.567  ns/op
    ArrayClone.byteClone           0  avgt   15    10.766 ±  0.899  ns/op
    ArrayClone.byteClone          10  avgt   15    18.341 ±  0.394  ns/op
    ArrayClone.byteClone         100  avgt   15    40.986 ±  0.674  ns/op
    ArrayClone.byteClone        1000  avgt   15   227.512 ±  7.643  ns/op
    ArrayClone.intArraycopy        0  avgt   15    10.320 ±  0.294  ns/op
    ArrayClone.intArraycopy       10  avgt   15    36.557 ±  0.860  ns/op
    ArrayClone.intArraycopy      100  avgt   15    89.837 ±  2.364  ns/op
    ArrayClone.intArraycopy     1000  avgt   15   836.678 ± 27.920  ns/op
    ArrayClone.intClone            0  avgt   15    10.043 ±  0.216  ns/op
    ArrayClone.intClone           10  avgt   15    29.149 ±  0.723  ns/op
    ArrayClone.intClone          100  avgt   15    88.046 ±  2.211  ns/op
    ArrayClone.intClone         1000  avgt   15   840.163 ± 58.748  ns/op
    Finished running test 'micro:java.lang.ArrayClone'

-------------

Depends on: https://git.openjdk.org/jdk/pull/17667

Commit messages:
 - s390x Port

Changes: https://git.openjdk.org/jdk/pull/19220/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19220&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8331934
  Stats: 47 lines in 6 files changed: 23 ins; 2 del; 22 mod
  Patch: https://git.openjdk.org/jdk/pull/19220.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19220/head:pull/19220

PR: https://git.openjdk.org/jdk/pull/19220

From amitkumar at openjdk.org Tue May 14 08:35:05 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Tue, 14 May 2024 08:35:05 GMT
Subject: RFR: 8331934: [s390x] Add support for primitive array C1 clone intrinsic
In-Reply-To:
References:
Message-ID:

On Mon, 13 May 2024 17:08:03 GMT, Amit Kumar wrote:

> Adds JDK-8302850 Port for s390x.
>
> Testing:
>
> make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
>
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR
>    jtreg:test/hotspot/jtreg:hotspot_compiler          1166  1166     0     0
> ==============================
> TEST SUCCESS
>
> * Tier1 Test with Fast debug build.
>
> Benchmarking:
>
> Without Patch:
>
> Benchmark                 (size)  Mode  Cnt     Score    Error  Units
> ArrayClone.byteArraycopy       0  avgt   15    10.838 ±  0.461  ns/op
> ArrayClone.byteArraycopy      10  avgt   15    28.919 ±  1.695  ns/op
> ArrayClone.byteArraycopy     100  avgt   15    48.815 ±  0.901  ns/op
> ArrayClone.byteArraycopy    1000  avgt   15   256.357 ±  7.901  ns/op
> ArrayClone.byteClone           0  avgt   15    90.398 ±  3.119  ns/op
> ArrayClone.byteClone          10  avgt   15   103.774 ±  4.468  ns/op
> ArrayClone.byteClone         100  avgt   15   126.628 ±  6.952  ns/op
> ArrayClone.byteClone        1000  avgt   15   326.409 ± 31.635  ns/op
> ArrayClone.intArraycopy        0  avgt   15    10.450 ±  0.509  ns/op
> ArrayClone.intArraycopy       10  avgt   15    36.903 ±  0.753  ns/op
> ArrayClone.intArraycopy      100  avgt   15    85.964 ±  1.806  ns/op
> ArrayClone.intArraycopy     1000  avgt   15   841.512 ± 40.335  ns/op
> ArrayClone.intClone            0  avgt   15    89.332 ±  3.695  ns/op
> ArrayClone.intClone           10  avgt   15   110.639 ±  2.476  ns/op
> ArrayClone.intClone          100  avgt   15   195.781 ±  8.622  ns/op
> ArrayClone.intClone         1000  avgt   15  1058.479 ± 92.468  ns/op
> Finished running test 'micro:java.lang.ArrayClone'
>
> with patch:
>
> Benchmark                 (size)  Mode  Cnt     Score    Error  Units
> ArrayClone.byteArraycopy       0  avgt   15    10.526 ±  0.289  ns/op
> ArrayClone.byteArraycopy      10  avgt   15    27.110 ±  0.656  ns/op
> Arra...

@RealLucy @TheRealMDoerr Would you please review this one? :-) Testing seems clear on s390x. I have posted benchmark results as well. Please let me know if any further testing is required.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19220#issuecomment-2109594090

From luhenry at openjdk.org Tue May 14 08:46:04 2024
From: luhenry at openjdk.org (Ludovic Henry)
Date: Tue, 14 May 2024 08:46:04 GMT
Subject: RFR: 8332130: RISC-V: remove wrong instructions of Vector Crypto Extension
In-Reply-To:
References:
Message-ID:

On Tue, 14 May 2024 07:37:39 GMT, Fei Yang wrote:

>>> I think you mean the `funct3` (`OPIVV` vs `OPIVX`) encoding is wrong?
>>
>> Yes
>
>> > I think you mean the `funct3` (`OPIVV` vs `OPIVX`) encoding is wrong?
>>
>> Yes
>
> From the RVV spec [1], the `funct3` encoding for `OPIVX` is 0b100, which is also reflected in the instruction encoding.
> So why would you think it's wrong? Anything I missed?
>
> [1] https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-arithmetic-instruction-formats

@RealFYang the `.vx` variant expects a **scalar** register while our `vandn_vx` takes a **vector** register.
If we had a use for `vandn_vx` (or any of the other removed instructions), we would need to add another section with

    #define INSN(NAME, op, funct3, funct6) \
      void NAME(VectorRegister Vd, VectorRegister Vs2, Register Rs1, VectorMask vm = unmasked) { \
        patch_VArith(op, Vd, funct3, Rs1->raw_encoding(), Vs2, vm, funct6); \
      }

But given we have no use for these instructions, I'm ok with removing them.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19211#issuecomment-2109617260

From fyang at openjdk.org Tue May 14 10:04:02 2024
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 14 May 2024 10:04:02 GMT
Subject: RFR: 8332130: RISC-V: remove wrong instructions of Vector Crypto Extension
In-Reply-To:
References:
Message-ID:

On Mon, 13 May 2024 08:14:43 GMT, Hamlin Li wrote:

> Hi,
> Can you help to review this simple patch to remove some wrong instructions on riscv?
> These instructions are wrong in that e.g. take `vror.vx` as example,
> * by definition of spec, it should be `vror.vx vd, vs2, *rs1*, vm`
> * the implementation here, it is indeed `vror_vx(VectorRegister Vd, VectorRegister Vs2, *VectorRegister* Vs1, VectorMask vm = unmasked)`
>
> Thanks

Marked as reviewed by fyang (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/19211#pullrequestreview-2054887463

From fyang at openjdk.org Tue May 14 10:04:03 2024
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 14 May 2024 10:04:03 GMT
Subject: RFR: 8332130: RISC-V: remove wrong instructions of Vector Crypto Extension
In-Reply-To:
References:
Message-ID:

On Tue, 14 May 2024 07:37:39 GMT, Fei Yang wrote:

>>> I think you mean the `funct3` (`OPIVV` vs `OPIVX`) encoding is wrong?
>>
>> Yes
>
>> > I think you mean the `funct3` (`OPIVV` vs `OPIVX`) encoding is wrong?
>>
>> Yes
>
> From the RVV spec [1], the `funct3` encoding for `OPIVX` is 0b100, which is also reflected in the instruction encoding.
> So why would you think it's wrong? Anything I missed?
>
> [1] https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-arithmetic-instruction-formats

> @RealFYang the `.vx` variant expects a **scalar** register while our `vandn_vx` takes a **vector** register. If we had a use for `vandn_vx` (or any of the other removed instructions), we would need to add another section with
>
> ```
> #define INSN(NAME, op, funct3, funct6) \
>   void NAME(VectorRegister Vd, VectorRegister Vs2, Register Rs1, VectorMask vm = unmasked) { \
>     patch_VArith(op, Vd, funct3, Rs1->raw_encoding(), Vs2, vm, funct6); \
>   }
> ```
>
> But given we have no use for these instructions, I'm ok with removing them.

Ah, I see. Looks good. Thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19211#issuecomment-2109789006

From dchuyko at openjdk.org Tue May 14 10:48:04 2024
From: dchuyko at openjdk.org (Dmitry Chuyko)
Date: Tue, 14 May 2024 10:48:04 GMT
Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives
In-Reply-To:
References:
Message-ID:

On Mon, 13 May 2024 21:08:08 GMT, Evgeny Astigeevich wrote:

>> Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues. REDO work is tracked by [JDK-8331749](https://bugs.openjdk.org/browse/JDK-8331749).
>>
>> Found bugs:
>> - When refreshing, `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)`, which only considers the compiler directive on the top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected.
>> - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored.
>>
>> There are other concerns: bugs and performance issues.
>>
>> Possible bugs:
>> - `has_matching_directives` might not be cleared.
>>   An nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark.
>> - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again.
>> - JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored.
>>
>> Performance issues:
>> - Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will be traversing nmethods, most of which don't need recompilation.
>>
>> The backout is not clean because of removal of `CompiledMethod`.
>>
>> Tested with release and fastdebug builds: tier1 and tier2 passed.
>
> What if instead of backing out we used an experimental JVM flag: `-XX:+CompilerDirectivesRefreshSupport`?

> I agree with this backout. Thank you @eastig for explaining your point. We have about 3 weeks before RDP1 and it is better we have fewer issues before that. Let's redo the implementation in the next release, taking into account the issues you found, and have more time for testing.

OK. I hope it takes less time to get back into the source tree than it did initially.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2109874596

From mli at openjdk.org Tue May 14 11:30:08 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 14 May 2024 11:30:08 GMT
Subject: RFR: 8332130: RISC-V: remove wrong instructions of Vector Crypto Extension
In-Reply-To:
References:
Message-ID:

On Mon, 13 May 2024 08:14:43 GMT, Hamlin Li wrote:

> Hi,
> Can you help to review this simple patch to remove some wrong instructions on riscv?
> These instructions are wrong in that e.g.
take `vror.vx` as example,
> * by definition of spec, it should be `vror.vx vd, vs2, *rs1*, vm`
> * the implementation here, it is indeed `vror_vx(VectorRegister Vd, VectorRegister Vs2, *VectorRegister* Vs1, VectorMask vm = unmasked)`
>
> Thanks

Sorry for the confusion. Thanks @luhenry @RealFYang for reviewing.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19211#issuecomment-2109955129

From mli at openjdk.org Tue May 14 11:30:09 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 14 May 2024 11:30:09 GMT
Subject: Integrated: 8332130: RISC-V: remove wrong instructions of Vector Crypto Extension
In-Reply-To:
References:
Message-ID: <4MfiGaorr01EQssf26w0dXY6brY2JZ5RDOAbQ3Kzwds=.67b89bbd-19cb-452b-96ea-138e1a1995ab@github.com>

On Mon, 13 May 2024 08:14:43 GMT, Hamlin Li wrote:

> Hi,
> Can you help to review this simple patch to remove some wrong instructions on riscv?
> These instructions are wrong in that e.g. take `vror.vx` as example,
> * by definition of spec, it should be `vror.vx vd, vs2, *rs1*, vm`
> * the implementation here, it is indeed `vror_vx(VectorRegister Vd, VectorRegister Vs2, *VectorRegister* Vs1, VectorMask vm = unmasked)`
>
> Thanks

This pull request has now been integrated.
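The point settled in this thread is about the operand, not `funct3`: an `OPIVX` (`.vx`) instruction uses funct3 = 0b100 and puts the number of a scalar register `rs1` into bits 19:15, while an `OPIVV` (`.vv`) instruction puts a vector register `vs1` there. The sketch below packs the OP-V arithmetic fields following the layout in the RVV spec's "vector arithmetic instruction formats" section. `encodeVArith` is a hypothetical helper written for illustration, not HotSpot's `patch_VArith`, and `vadd.vx` (funct6 = 0b000000) is used only because it is a simple, well-known instance.

```java
// Hypothetical helper (not HotSpot's patch_VArith): packs a RISC-V vector
// arithmetic (OP-V) instruction using the RVV field layout
// funct6[31:26] | vm[25] | vs2[24:20] | vs1/rs1[19:15] | funct3[14:12] | vd[11:7] | opcode[6:0]
public class RvvEncoding {
    static final int OP_V  = 0b1010111;
    static final int OPIVV = 0b000; // bits 19:15 name a vector register vs1
    static final int OPIVX = 0b100; // bits 19:15 name a scalar register rs1

    static int encodeVArith(int funct6, boolean unmasked, int vs2, int src1, int funct3, int vd) {
        return (funct6 & 0x3f) << 26
             | (unmasked ? 1 : 0) << 25   // vm = 1 means "not masked"
             | (vs2 & 0x1f) << 20
             | (src1 & 0x1f) << 15        // register number only; its kind is implied by funct3
             | (funct3 & 0x7) << 12
             | (vd & 0x1f) << 7
             | OP_V;
    }

    public static void main(String[] args) {
        // vadd.vx v1, v2, x3
        int insn = encodeVArith(0b000000, true, 2, 3, OPIVX, 1);
        System.out.printf("0x%08x%n", insn); // prints 0x0221c0d7
    }
}
```

Note that `vadd.vv v1, v2, v3` and `vadd.vx v1, v2, x3` differ only in the `funct3` bits; the register number in bits 19:15 is the same `3` in both. The field layout alone therefore cannot tell the two apart, which is why a `.vx` emitter has to take a scalar `Register` for that operand, as in the `INSN` macro quoted in the thread.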
Changeset: 7ce4a13c
Author:    Hamlin Li
URL:       https://git.openjdk.org/jdk/commit/7ce4a13c0a891e606480e138f4025ffa328a18b3
Stats:     5 lines in 1 file changed: 0 ins; 5 del; 0 mod

8332130: RISC-V: remove wrong instructions of Vector Crypto Extension

Reviewed-by: luhenry, fyang

-------------

PR: https://git.openjdk.org/jdk/pull/19211

From amitkumar at openjdk.org Tue May 14 13:02:11 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Tue, 14 May 2024 13:02:11 GMT
Subject: RFR: 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store
In-Reply-To:
References:
Message-ID:

On Mon, 13 May 2024 15:58:31 GMT, Richard Reingruber wrote:

>> This PR adds a few tweaks to [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) which allow enabling it also on big endian platforms (e.g. AIX, S390). JDK-8318446 introduced a C2 optimization to replace consecutive stores to a primitive array with just one store.
>>
>> By example (from `TestMergeStores.java`):
>>
>>     static Object[] test2a(byte[] a, int offset, long v) {
>>         if (IS_BIG_ENDIAN) {
>>             a[offset + 0] = (byte)(v >> 56);
>>             a[offset + 1] = (byte)(v >> 48);
>>             a[offset + 2] = (byte)(v >> 40);
>>             a[offset + 3] = (byte)(v >> 32);
>>             a[offset + 4] = (byte)(v >> 24);
>>             a[offset + 5] = (byte)(v >> 16);
>>             a[offset + 6] = (byte)(v >> 8);
>>             a[offset + 7] = (byte)(v >> 0);
>>         } else {
>>             a[offset + 0] = (byte)(v >> 0);
>>             a[offset + 1] = (byte)(v >> 8);
>>             a[offset + 2] = (byte)(v >> 16);
>>             a[offset + 3] = (byte)(v >> 24);
>>             a[offset + 4] = (byte)(v >> 32);
>>             a[offset + 5] = (byte)(v >> 40);
>>             a[offset + 6] = (byte)(v >> 48);
>>             a[offset + 7] = (byte)(v >> 56);
>>         }
>>         return new Object[]{ a };
>>     }
>>
>> Depending on the endianness, 8 bytes are stored into an array. The order of the stores is the same as the byte order of an 8-byte store, therefore the eight 1-byte stores can be replaced with just one 8-byte store (if there aren't too many range checks).
>>
>> Additionally I've fixed a few comments and a test bug.
>>
>> The optimization seems to be a little bit more effective on big endian platforms.
>>
>> Again by example:
>>
>>     static Object[] test800a(byte[] a, int offset, long v) {
>>         if (IS_BIG_ENDIAN) {
>>             a[offset + 0] = (byte)(v >> 40); // Removed from candidate list
>>             a[offset + 1] = (byte)(v >> 32); // Removed from candidate list
>>             a[offset + 2] = (byte)(v >> 24); // Merged
>>             a[offset + 3] = (byte)(v >> 16); // Merged
>>             a[offset + 4] = (byte)(v >> 8);  // Merged
>>             a[offset + 5] = (byte)(v >> 0);  // Merged
>>         } else {
>>             a[offset + 0] = (byte)(v >> 0);  // Removed from candidate list
>>             a[offset + 1] = (byte)(v >> 8);  // Removed from candidate list
>>             a[offset + 2] = (byte)(v >> 16); // Not merged
>>             a[offset + 3] = (byte)(v >> 24); // Not merged
>>             a[offset + 4] = (byte)(v >> 32); // Not merge...
>
> @offamitkumar you can put this through your testing if you like. It should solve the issues with test/hotspot/jtreg/compiler/c2/TestMergeStores.java also for s390.

@reinrich test is passing on s390x with your change. tier1 tests are in progress.

Update: tier1 tests are also clean on s390x.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19218#issuecomment-2108186692

From rrich at openjdk.org Tue May 14 13:02:11 2024
From: rrich at openjdk.org (Richard Reingruber)
Date: Tue, 14 May 2024 13:02:11 GMT
Subject: RFR: 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store
Message-ID:

This PR adds a few tweaks to [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) which allow enabling it also on big endian platforms (e.g. AIX, S390). JDK-8318446 introduced a C2 optimization to replace consecutive stores to a primitive array with just one store.
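The equivalence this optimization relies on can be checked at the Java level. The sketch below is an illustration under stated assumptions: the class and method names are hypothetical, and `java.nio.ByteBuffer` stands in for "one wide store in a given byte order"; it says nothing about the code C2 actually emits. It compares the byte-wise stores of `test2a` (quoted above) with a single 8-byte store, and also models the trimmed `test800a` case: on big endian the four surviving stores equal a 4-byte store of `(int) v`, while on little endian they only match a 4-byte store of `(int) (v >>> 16)`, i.e. merging them would require a right shift.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical Java-level check, not C2 code: compares byte-wise stores with a
// single wide store written through ByteBuffer in an explicit byte order.
public class MergeStoresCheck {

    // The eight stores of test2a: shifts 56, 48, ..., 0 (big endian) or 0, 8, ..., 56.
    static byte[] bytewise(long v, boolean bigEndian) {
        byte[] a = new byte[8];
        for (int k = 0; k < 8; k++) {
            int shift = bigEndian ? 56 - 8 * k : 8 * k;
            a[k] = (byte) (v >> shift);
        }
        return a;
    }

    // One 8-byte store of v in the given byte order.
    static byte[] singleLongStore(long v, ByteOrder order) {
        byte[] a = new byte[8];
        ByteBuffer.wrap(a).order(order).putLong(0, v);
        return a;
    }

    // The six stores of test800a; only offsets 2..5 survive trimming to a power of 2.
    static byte[] bytewise800a(long v, boolean bigEndian) {
        byte[] a = new byte[6];
        for (int k = 0; k < 6; k++) {
            int shift = bigEndian ? 40 - 8 * k : 8 * k;
            a[k] = (byte) (v >> shift);
        }
        return a;
    }

    // A 4-byte store of 'value' at offset 2 in the given byte order.
    static byte[] intStoreAt2(int value, ByteOrder order) {
        byte[] a = new byte[6];
        ByteBuffer.wrap(a).order(order).putInt(2, value);
        return a;
    }

    // Compares only the surviving candidates at offsets 2..5.
    static boolean sameAt2to5(byte[] x, byte[] y) {
        return Arrays.equals(x, 2, 6, y, 2, 6);
    }

    public static void main(String[] args) {
        long v = 0x0102030405060708L;
        // Full case: each branch of test2a matches one 8-byte store.
        System.out.println(Arrays.equals(bytewise(v, true), singleLongStore(v, ByteOrder.BIG_ENDIAN)));     // true
        System.out.println(Arrays.equals(bytewise(v, false), singleLongStore(v, ByteOrder.LITTLE_ENDIAN))); // true
        // Trimmed case: big endian needs no shift; little endian would need v >>> 16.
        System.out.println(sameAt2to5(bytewise800a(v, true), intStoreAt2((int) v, ByteOrder.BIG_ENDIAN)));              // true
        System.out.println(sameAt2to5(bytewise800a(v, false), intStoreAt2((int) v, ByteOrder.LITTLE_ENDIAN)));          // false
        System.out.println(sameAt2to5(bytewise800a(v, false), intStoreAt2((int) (v >>> 16), ByteOrder.LITTLE_ENDIAN))); // true
    }
}
```

`Arrays.equals` with index ranges requires Java 9 or later.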
By example (from `TestMergeStores.java`):

    static Object[] test2a(byte[] a, int offset, long v) {
        if (IS_BIG_ENDIAN) {
            a[offset + 0] = (byte)(v >> 56);
            a[offset + 1] = (byte)(v >> 48);
            a[offset + 2] = (byte)(v >> 40);
            a[offset + 3] = (byte)(v >> 32);
            a[offset + 4] = (byte)(v >> 24);
            a[offset + 5] = (byte)(v >> 16);
            a[offset + 6] = (byte)(v >> 8);
            a[offset + 7] = (byte)(v >> 0);
        } else {
            a[offset + 0] = (byte)(v >> 0);
            a[offset + 1] = (byte)(v >> 8);
            a[offset + 2] = (byte)(v >> 16);
            a[offset + 3] = (byte)(v >> 24);
            a[offset + 4] = (byte)(v >> 32);
            a[offset + 5] = (byte)(v >> 40);
            a[offset + 6] = (byte)(v >> 48);
            a[offset + 7] = (byte)(v >> 56);
        }
        return new Object[]{ a };
    }

Depending on the endianness, 8 bytes are stored into an array. The order of the stores is the same as the byte order of an 8-byte store, therefore the eight 1-byte stores can be replaced with just one 8-byte store (if there aren't too many range checks).

Additionally I've fixed a few comments and a test bug.

The optimization seems to be a little bit more effective on big endian platforms.

Again by example:

    static Object[] test800a(byte[] a, int offset, long v) {
        if (IS_BIG_ENDIAN) {
            a[offset + 0] = (byte)(v >> 40); // Removed from candidate list
            a[offset + 1] = (byte)(v >> 32); // Removed from candidate list
            a[offset + 2] = (byte)(v >> 24); // Merged
            a[offset + 3] = (byte)(v >> 16); // Merged
            a[offset + 4] = (byte)(v >> 8);  // Merged
            a[offset + 5] = (byte)(v >> 0);  // Merged
        } else {
            a[offset + 0] = (byte)(v >> 0);  // Removed from candidate list
            a[offset + 1] = (byte)(v >> 8);  // Removed from candidate list
            a[offset + 2] = (byte)(v >> 16); // Not merged
            a[offset + 3] = (byte)(v >> 24); // Not merged
            a[offset + 4] = (byte)(v >> 32); // Not merged
            a[offset + 5] = (byte)(v >> 40); // Not merged
        }
        return new Object[]{ a };
    }

The sequence of candidate stores begins at the lowest store (in Memory def-use order) and is trimmed to a power of 2, removing higher stores if necessary.
On little endian platforms this removes the stores of the least significant bytes. Therefore the remaining stores cannot be merged, since that would require a right shift. On big endian platforms the stores of the more significant bytes are removed and the remaining stores can be merged.

I introduced new platform attributes `little-endian` and `big-endian` to the IR testing framework to be able to adapt IR matching rules to this difference.

Testing: `TestMergeStores.java` on AIX and S390.

JTReg tests: tier1-4 of hotspot and jdk. All of Langtools and jaxp. JCK, SPECjvm2008, SPECjbb2015, Renaissance Suite, and SAP specific tests.
Testing was done with fastdebug builds on the main platforms and also on Linux/PPC64le and AIX.

-------------

Commit messages:
 - Improve comment
 - Add bug id
 - Typo
 - 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store

Changes: https://git.openjdk.org/jdk/pull/19218/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19218&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8331311
  Stats: 572 lines in 3 files changed: 378 ins; 3 del; 191 mod
  Patch: https://git.openjdk.org/jdk/pull/19218.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19218/head:pull/19218

PR: https://git.openjdk.org/jdk/pull/19218

From rrich at openjdk.org Tue May 14 13:02:11 2024
From: rrich at openjdk.org (Richard Reingruber)
Date: Tue, 14 May 2024 13:02:11 GMT
Subject: RFR: 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store
In-Reply-To:
References:
Message-ID:

On Mon, 13 May 2024 15:53:52 GMT, Richard Reingruber wrote:

> This PR adds a few tweaks to [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) which allow enabling it also on big endian platforms (e.g. AIX, S390). JDK-8318446 introduced a C2 optimization to replace consecutive stores to a primitive array with just one store.
>
> By example (from `TestMergeStores.java`):
>
>     static Object[] test2a(byte[] a, int offset, long v) {
>         if (IS_BIG_ENDIAN) {
>             a[offset + 0] = (byte)(v >> 56);
>             a[offset + 1] = (byte)(v >> 48);
>             a[offset + 2] = (byte)(v >> 40);
>             a[offset + 3] = (byte)(v >> 32);
>             a[offset + 4] = (byte)(v >> 24);
>             a[offset + 5] = (byte)(v >> 16);
>             a[offset + 6] = (byte)(v >> 8);
>             a[offset + 7] = (byte)(v >> 0);
>         } else {
>             a[offset + 0] = (byte)(v >> 0);
>             a[offset + 1] = (byte)(v >> 8);
>             a[offset + 2] = (byte)(v >> 16);
>             a[offset + 3] = (byte)(v >> 24);
>             a[offset + 4] = (byte)(v >> 32);
>             a[offset + 5] = (byte)(v >> 40);
>             a[offset + 6] = (byte)(v >> 48);
>             a[offset + 7] = (byte)(v >> 56);
>         }
>         return new Object[]{ a };
>     }
>
> Depending on the endianness, 8 bytes are stored into an array. The order of the stores is the same as the byte order of an 8-byte store, therefore the eight 1-byte stores can be replaced with just one 8-byte store (if there aren't too many range checks).
>
> Additionally I've fixed a few comments and a test bug.
>
> The optimization seems to be a little bit more effective on big endian platforms.
>
> Again by example:
>
>     static Object[] test800a(byte[] a, int offset, long v) {
>         if (IS_BIG_ENDIAN) {
>             a[offset + 0] = (byte)(v >> 40); // Removed from candidate list
>             a[offset + 1] = (byte)(v >> 32); // Removed from candidate list
>             a[offset + 2] = (byte)(v >> 24); // Merged
>             a[offset + 3] = (byte)(v >> 16); // Merged
>             a[offset + 4] = (byte)(v >> 8);  // Merged
>             a[offset + 5] = (byte)(v >> 0);  // Merged
>         } else {
>             a[offset + 0] = (byte)(v >> 0);  // Removed from candidate list
>             a[offset + 1] = (byte)(v >> 8);  // Removed from candidate list
>             a[offset + 2] = (byte)(v >> 16); // Not merged
>             a[offset + 3] = (byte)(v >> 24); // Not merged
>             a[offset + 4] = (byte)(v >> 32); // Not merged
>             a[offset + 5] = (byte)(v >> 40); // Not merged
>         }
>         return new Object[]{ a };...

@offamitkumar you can put this through your testing if you like.
It should solve the issues with test/hotspot/jtreg/compiler/c2/TestMergeStores.java also for s390.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19218#issuecomment-2108093968

From pminborg at openjdk.org Tue May 14 14:14:17 2024
From: pminborg at openjdk.org (Per Minborg)
Date: Tue, 14 May 2024 14:14:17 GMT
Subject: RFR: 8330465: Stable Values and Collections (Internal)
Message-ID: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com>

# Stable Values & Collections (Internal)

## Summary
This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization.

## Goals
* Provide an easy and intuitive API to describe value holders that can change at most once.
* Decouple declaration from initialization without significant footprint or performance penalties.
* Reduce the amount of static initializer and/or field initialization code.
* Uphold integrity and consistency, even in a multi-threaded environment.

For more details, see the draft JEP: https://openjdk.org/jeps/8312611

## Performance
Performance compared to instance variables using an `AtomicReference` and one protected by double-checked locking under concurrent access by 8 threads:

    Benchmark                       Mode  Cnt  Score   Error  Units
    StableBenchmark.instanceAtomic  avgt   10  1.576 ± 0.052  ns/op
    StableBenchmark.instanceDCL     avgt   10  1.608 ± 0.059  ns/op
    StableBenchmark.instanceStable  avgt   10  0.979 ± 0.023  ns/op <- StableValue (~40% faster than DCL)

Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (8 threads):

    Benchmark                     Mode  Cnt  Score   Error  Units
    StableBenchmark.staticAtomic  avgt   10  1.335 ± 0.056  ns/op
    StableBenchmark.staticCHI     avgt   10  0.623 ± 0.086  ns/op
    StableBenchmark.staticDCL     avgt   10  1.418 ± 0.171  ns/op
    StableBenchmark.staticList    avgt   10  0.617 ± 0.024  ns/op
    StableBenchmark.staticStable  avgt   10  0.604 ± 0.022  ns/op <- StableValue (> 2x faster than `AtomicReference` and DCL)

Performance for stable lists in both instance and static contexts, whereby the sum of random contents is calculated for stable lists (which are thread-safe) compared to `ArrayList` instances (which are not thread-safe) (under single-thread access):

    Benchmark                                 Mode  Cnt  Score   Error  Units
    StableListSumBenchmark.instanceArrayList  avgt   10  0.356 ± 0.005  ns/op
    StableListSumBenchmark.instanceList       avgt   10  0.373 ± 0.017  ns/op <- Stable list
    StableListSumBenchmark.staticArrayList    avgt   10  0.352 ± 0.002  ns/op
    StableListSumBenchmark.staticList         avgt   10  0.356 ± 0.003  ns/op <- Stable list

Performance for stable maps in a static context compared to a `ConcurrentHashMap` (under single-thread access):

    Benchmark                         Mode  Cnt  Score   Error  Units
    StablePropertiesBenchmark.chmRaw  avgt   10  3.416 ± 0.031  ns/op
    StablePropertiesBenchmark.mapRaw  avgt   10  2.105 ± 0.012  ns/op <- Stable map (~40% faster)

All figures above are from local tests on a Mac M1 laptop and should only be construed as indicative figures.

## Implementation details
There are some noteworthy implementation details in this PR:

* A field is _trusted_ if it is _declared_ as a `final StableValue`. Previously, the determination of trustworthiness was connected to the _class in which it was declared_ (e.g. is it a `record` or a hidden class). In order to grant such trust, there are extra restrictions imposed on reflection and `sun.misc.Unsafe` usage for such declared `StableValue` fields. This is similar to how `record` classes are handled.
* To allow plain memory semantics for read operations across threads (rather than `volatile` semantics, which are slower and which double-checked locking normally requires), we perform a _freeze_ operation before an object becomes visible to other threads. This prevents store-store reordering, so complete objects are guaranteed to be seen even under plain memory semantics.
* In collections with `StableValue` elements/values, a transient `StableValue` view backed by internal arrays is created upon read operations. This improves initialization time, reduces storage requirements, and improves access performance, as these transient objects are eliminated by the C2 compiler.

-------------

Commit messages:
 - Merge branch 'master' into stable-value
 - Rework the creation of StableEnumMaps
 - Update sun.misc.Unsafe
 - Fix error in hash code
 - Add methods to create generic arrays
 - Change class types
 - Add a marker interface TrustedFieldType
 - Improve array test
 - Clean up tests
 - Add tests
 - ... and 162 more: https://git.openjdk.org/jdk/compare/4ba74475...5d5dcced

Changes: https://git.openjdk.org/jdk/pull/18794/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18794&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8330465
  Stats: 5733 lines in 39 files changed: 5708 ins; 13 del; 12 mod
  Patch: https://git.openjdk.org/jdk/pull/18794.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18794/head:pull/18794

PR: https://git.openjdk.org/jdk/pull/18794

From liach at openjdk.org  Tue May 14 14:14:21 2024
From: liach at openjdk.org (Chen Liang)
Date: Tue, 14 May 2024 14:14:21 GMT
Subject: RFR: 8330465: Stable Values and Collections (Internal)
In-Reply-To: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com>
References: <-KSimQo5kkmCzzMShqGe5QZ9yCSzpWL98gN13v4wP0k=.11dd8d06-18a6-4577-8342-66632cea0b6e@github.com>
Message-ID: 

On Tue, 16 Apr 2024 11:47:23 GMT, Per Minborg wrote:

> # Stable Values & Collections (Internal)
> 
> ## Summary
> This PR proposes to introduce an internal _Stable Values & Collections_ API, which provides immutable value holders where elements are initialized _at most once_. Stable Values & Collections offer the performance and safety benefits of final fields while offering greater flexibility as to the timing of initialization.
> 
> ## Goals
> * Provide an easy and intuitive API to describe value holders that can change at most once.
> * Decouple declaration from initialization without significant footprint or performance penalties.
> * Reduce the amount of static initializer and/or field initialization code.
> * Uphold integrity and consistency, even in a multi-threaded environment.
> 
> For more details, see the draft JEP: https://openjdk.org/jeps/8312611
> 
> ## Performance
> Performance compared to instance variables using an `AtomicReference` and one protected by double-checked locking under concurrent access by 8 threads:
> 
> 
> Benchmark                       Mode  Cnt  Score   Error  Units
> StableBenchmark.instanceAtomic  avgt   10  1.576 ± 0.052  ns/op
> StableBenchmark.instanceDCL     avgt   10  1.608 ± 0.059  ns/op
> StableBenchmark.instanceStable  avgt   10  0.979 ± 0.023  ns/op <- StableValue (~40% faster than DCL)
> 
> 
> Performance compared to static variables protected by `AtomicReference`, class-holder idiom holder, and double-checked locking (8 threads):
> 
> 
> Benchmark                     Mode  Cnt  Score   Error  Units
> StableBenchmark.staticAtomic  avgt   10  1.335 ± 0.056  ns/op
> StableBenchmark.staticCHI     avgt   10  0.623 ± 0.086  ns/op
> StableBenchmark.staticDCL     avgt   10  1.418 ± 0.171  ns/op
> StableBenchmark.staticList    avgt   10  0.617 ± 0.024  ns/op
> StableBenchmark.staticStable  avgt   10  0.604 ± 0.022  ns/op <- StableValue ( > 2x faster than `AtomicInteger` and DCL)
> 
> 
> Performance for stable lists in both instance and static contexts whereby the sum of random contents is calculated for stable lists (which are thread-safe) compared to `ArrayList` instances (which are not thread-safe) (under single thread access):
> 
> 
> Benchmark                                 Mode  Cnt  Score   Error  Units
> StableListSumBenchmark.instanceArrayList  avgt   10  0.356 ± 0.005  ns/op
> StableListSumBenchmark.instanceList       avgt   10  0.373 ± 0.017  ns/op <- Stable list
> StableListSumBenchmark.staticArrayList    avgt   10  0.352 ± 0.002  ns/op
> StableListSumBenchmark.staticList         avgt   10  0.356 ± 0.00...

Glad to see this! Some API design remarks.

Also, I want to mention a few important differences between `@Stable` and Stable Values:

Patterns:
1. Benign race (does not exist in the StableValue API): multiple threads can create an instance and upload it; any non-null instance is functionally equivalent, so the race is OK (seen in most of the JDK).
2. compareAndSet (setIfUnset): multiple threads can create an instance, but only one will succeed in uploading it; usually used when the instance computation is cheap but we want a single witness.
3. Atomic computation (computeIfUnset): only one thread can create the instance, which will then be witnessed by other threads; this pattern ensures correctness and prevents wasteful computation by other threads, at the cost of locking and lambda creation.

Allocation in objects: a `@Stable` field is local to an object, but a `StableValue` is a separate object; thus the sharing strategy may differ, as stable fields are copied over, whereas StableValue uses a shared cache (which is even better for avoiding redundant computation).

Questions:
1. Will we ever try to expose the stable benign race model to users?
2. Will we ever try to inline stable values into the object layout, like a stable field?

Just curious, can you test other samples, like `StableValue>` where the contained `List` is an immutable list from the `List.of` factories? I think that would be a meaningful case too.

Also, on a side note: I just realized that no equivalent of `@Stable int[]` etc. (i.e. stable primitive arrays) is exposed, yet immutable arrays would be useful. Is the Frozen Arrays JEP still active, or will Stable Values consider exposing stable primitive arrays?

src/java.base/share/classes/java/lang/reflect/Field.java line 179:

> 177: AccessibleObject.checkPermission();
> 178: // Always check if the field is a final StableValue
> 179: if (StableValue.class.isAssignableFrom(type) && Modifier.isFinal(modifiers)) {

This doesn't protect the Stable Collections.

src/java.base/share/classes/java/util/ImmutableCollections.java line 173:

> 171: .map(Objects::requireNonNull)
> 172: .toArray();
> 173: return keys instanceof EnumSet

We can move this instanceof check before the stream call.
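The three publication patterns listed in the review (benign race, compareAndSet, atomic computation) can be illustrated with ordinary `java.util.concurrent` primitives. This is a minimal sketch under stated assumptions: the class names `BenignRace`, `SetIfUnset`, and `ComputeIfUnset` are hypothetical and are not the internal `StableValue` API from the PR, and pattern 3 is implemented here with double-checked locking on a `volatile` field, which is only one possible way to obtain at-most-once computation.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch of the three publication patterns from the review.
// Class names are illustrative only; they are NOT the StableValue API from the PR.
public class PublicationPatterns {

    // 1. Benign race: any thread may compute and store a value; all results must be
    //    functionally equivalent. Publishing through a plain (non-volatile) field is only
    //    safe if T is immutable and keeps its state in final fields (JLS 17.5).
    static final class BenignRace<T> {
        private final Supplier<T> supplier;
        private T value;
        BenignRace(Supplier<T> supplier) { this.supplier = supplier; }

        T get() {
            T v = value;              // read the racy field exactly once
            if (v == null) {
                v = supplier.get();   // several threads may compute concurrently
                value = v;            // last writer wins
            }
            return v;
        }
    }

    // 2. compareAndSet ("setIfUnset"): several threads may compute, but only the first
    //    successful CAS is published, so every caller sees a single witness.
    //    Assumes the supplier never returns null.
    static final class SetIfUnset<T> {
        private final AtomicReference<T> ref = new AtomicReference<>();
        private final Supplier<T> supplier;
        SetIfUnset(Supplier<T> supplier) { this.supplier = supplier; }

        T get() {
            T v = ref.get();
            if (v != null) {
                return v;
            }
            ref.compareAndSet(null, supplier.get()); // losing results are discarded
            return ref.get();
        }
    }

    // 3. Atomic computation ("computeIfUnset"): the supplier runs at most once;
    //    other threads block on the lock until the value is available.
    static final class ComputeIfUnset<T> {
        private final Supplier<T> supplier;
        private volatile T value;
        ComputeIfUnset(Supplier<T> supplier) { this.supplier = supplier; }

        T get() {
            T v = value;
            if (v == null) {
                synchronized (this) {
                    v = value;        // re-check under the lock
                    if (v == null) {
                        v = supplier.get();
                        value = v;    // volatile write publishes the result
                    }
                }
            }
            return v;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger calls = new AtomicInteger();
        ComputeIfUnset<Object> once = new ComputeIfUnset<>(() -> {
            calls.incrementAndGet();
            return new Object();
        });
        SetIfUnset<Object> cas = new SetIfUnset<>(Object::new);

        Object[] seen = new Object[8];
        Thread[] threads = new Thread[seen.length];
        for (int i = 0; i < threads.length; i++) {
            final int idx = i;
            threads[i] = new Thread(() -> {
                once.get();
                seen[idx] = cas.get();
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join(); // join() makes the writes to seen[] visible to this thread
        }
        boolean singleWitness = true;
        for (Object o : seen) {
            singleWitness &= (o == seen[0]);
        }
        System.out.println("supplier calls (pattern 3): " + calls.get());
        System.out.println("single witness (pattern 2): " + singleWitness);
    }
}
```

Running `main` should report exactly one supplier call for pattern 3 and `true` for the single-witness check of pattern 2; `BenignRace` gives neither guarantee. The sketch also reflects the trade-off described in the review: pattern 3 takes a lock on its slow path, while pattern 2 avoids locking but may run the supplier several times and discard all but one result.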
src/java.base/share/classes/java/util/ImmutableCollections.java line 1457:

> 1455: private final V[] elements;
> 1456: @Stable
> 1457: private final AuxiliaryArrays aux;

Is java.util not a trusted package, so that we need `@Stable`?

src/java.base/share/classes/java/util/ImmutableCollections.java line 1519:

> 1517: // Internal interface used to indicate the presence of
> 1518: // the computeIfUnset method that is unique to StableMap and StableEnumMap
> 1519: interface HasComputeIfUnset {

Suggestion:

    interface HasComputeIfUnset extends Map> {

So maybe we can use pattern matching like:

    Map> map = ...
    if (map instanceof HasComputeIfUnset hciu) {
        // stuff
    }

src/java.base/share/classes/java/util/ImmutableCollections.java line 1668:

> 1666: @Override
> 1667: public Set>> entrySet() {
> 1668: return new AbstractSet<>() {

Maybe we want to do `AbstractImmutableSet` like in #18522.

src/java.base/share/classes/java/util/ImmutableCollections.java line 1677:

> 1675: static final class StableEnumMap, V>
> 1676: extends AbstractImmutableMap>
> 1677: implements Map>, HasComputeIfUnset {

Note that this might be a navigable map, as enums are comparable.

src/java.base/share/classes/java/util/ImmutableCollections.java line 1855:

> 1853: @Override
> 1854: public boolean equals(Object o) {
> 1855: return o == this;

These implementations violate the Set contract: a Set's hash code should be the sum of its elements' hash codes (thus an entry set's hash code equals its map's hash code), and equals should check whether all elements are present. This also makes two entry sets from two `entrySet()` calls unequal (at least before Valhalla).

src/java.base/share/classes/jdk/internal/lang/StableValue.java line 223:

> 221: /**
> 222: * {@return an unmodifiable, shallowly immutable, thread-safe, value-s