From fyang at openjdk.org Fri Nov 1 00:15:31 2024
From: fyang at openjdk.org (Fei Yang)
Date: Fri, 1 Nov 2024 00:15:31 GMT
Subject: RFR: 8343121: RISC-V: More accurate max size for C2SafepointPollStub and C2EntryBarrierStub

On Thu, 31 Oct 2024 08:08:05 GMT, Robbin Ehn wrote:

>> Hi, please review this small change.
>>
>> The current max size of these two stubs is a bit overestimated and thus larger than needed.
>> Since the `la`, `far_call` and `far_jump` assembler routines used by these two stubs always
>> emit 2 instructions for an address inside the code cache, we can make the max size more accurate.
>>
>> Testing on linux-riscv64 platform:
>> - [x] tier1-tier3 (release)
>> - [x] hotspot:tier1 (fastdebug)
>
> Seems fine, thanks.

@robehn @feilongjiang : Thanks for the review!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21732#issuecomment-2451051688

From fyang at openjdk.org Fri Nov 1 00:15:32 2024
From: fyang at openjdk.org (Fei Yang)
Date: Fri, 1 Nov 2024 00:15:32 GMT
Subject: Integrated: 8343121: RISC-V: More accurate max size for C2SafepointPollStub and C2EntryBarrierStub

On Mon, 28 Oct 2024 04:09:57 GMT, Fei Yang wrote:

> Hi, please review this small change.
>
> The current max size of these two stubs is a bit overestimated and thus larger than needed.
> Since the `la`, `far_call` and `far_jump` assembler routines used by these two stubs always
> emit 2 instructions for an address inside the code cache, we can make the max size more accurate.
>
> Testing on linux-riscv64 platform:
> - [x] tier1-tier3 (release)
> - [x] hotspot:tier1 (fastdebug)

This pull request has now been integrated.
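The two-instruction count for `la`, `far_call` and `far_jump` rests on a standard RISC-V idiom. A PC-relative offset is split into a 20-bit upper immediate for `auipc` and a sign-extended 12-bit lower immediate for `addi`/`jalr`. The C++ sketch below shows that split. It assumes the target lies within the roughly ±2 GiB reach of the pair, which is what allows addresses inside the code cache to use the short form. The names `AuipcSplit`, `split_pc_relative` and `reassemble` are hypothetical and are not part of HotSpot's `MacroAssembler` API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper (not HotSpot code): the two immediates an
// auipc + addi/jalr pair would use to reach pc + offset.
struct AuipcSplit {
  int32_t hi20;  // immediate for auipc (ends up in bits 31..12)
  int32_t lo12;  // sign-extended 12-bit immediate for addi/jalr
};

// Returns the split if auipc + addi/jalr can reach the offset, else nullopt.
std::optional<AuipcSplit> split_pc_relative(int64_t offset) {
  // Adding 0x800 before the shift rounds hi up whenever the low 12 bits
  // would be negative after sign extension, keeping lo in [-2048, 2047].
  int64_t hi = (offset + 0x800) >> 12;
  int64_t lo = offset - hi * 4096;
  // auipc's immediate is a signed 20-bit value.
  if (hi < -(int64_t{1} << 19) || hi > (int64_t{1} << 19) - 1) {
    return std::nullopt;
  }
  return AuipcSplit{static_cast<int32_t>(hi), static_cast<int32_t>(lo)};
}

// The address computation the hardware performs: (hi20 << 12) + sext(lo12).
int64_t reassemble(const AuipcSplit& s) {
  return static_cast<int64_t>(s.hi20) * 4096 + s.lo12;
}
```

For example, an offset of `2048` splits into `hi20 = 1` and `lo12 = -2048`. The reachable window is therefore `[-2^31 - 2048, 2^31 - 2049]`.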
Changeset: 803612ee
Author: Fei Yang
URL: https://git.openjdk.org/jdk/commit/803612ee9377f7875d1b3ceb6f055048703e148c
Stats: 7 lines in 1 file changed: 3 ins; 0 del; 4 mod

8343121: RISC-V: More accurate max size for C2SafepointPollStub and C2EntryBarrierStub

Reviewed-by: rehn, fjiang

-------------

PR: https://git.openjdk.org/jdk/pull/21732

From mdoerr at openjdk.org Fri Nov 1 00:22:42 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Fri, 1 Nov 2024 00:22:42 GMT
Subject: RFR: 8343205: CompileBroker::possibly_add_compiler_threads excessively polls available memory [v2]

> This PR adds a quick check and bail-out in order to avoid excessive use of slow checks. In particular, it avoids querying the available memory so often.

Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:

  Minor improvements (review feedback).

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21812/files
  - new: https://git.openjdk.org/jdk/pull/21812/files/ea2fa546..c229422b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21812&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21812&range=00-01

  Stats: 12 lines in 2 files changed: 2 ins; 0 del; 10 mod
  Patch: https://git.openjdk.org/jdk/pull/21812.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21812/head:pull/21812

PR: https://git.openjdk.org/jdk/pull/21812

From mdoerr at openjdk.org Fri Nov 1 00:27:31 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Fri, 1 Nov 2024 00:27:31 GMT
Subject: RFR: 8343205: CompileBroker::possibly_add_compiler_threads excessively polls available memory [v2]

On Thu, 31 Oct 2024 17:46:17 GMT, Vladimir Kozlov wrote:

>> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Minor improvements (review feedback).
>
> src/hotspot/share/compiler/compileBroker.cpp line 1027:
>
>> 1025:
>> 1026: int old_c2_count = 0, new_c2_count = 0, old_c1_count = 0, new_c1_count = 0;
>> 1027: const int c2_tasks_per_thread = 2, c1_tasks_per_thread = 4;
>
> Is there any reason for these particular numbers (2 and 4)? Were any experiments done to select the best numbers?

Please note that these constants are not new. I have only given them names. I did some experiments when implementing [JDK-8198756](https://bugs.openjdk.org/browse/JDK-8198756) for JDK 11. C1 is faster than C2, so we can have more C1 tasks per C1 thread.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21812#discussion_r1825304691

From fyang at openjdk.org Fri Nov 1 00:57:35 2024
From: fyang at openjdk.org (Fei Yang)
Date: Fri, 1 Nov 2024 00:57:35 GMT
Subject: RFR: 8343122: RISC-V: C2: Small improvement for real runtime callouts
In-Reply-To: <3AbaT2SwHVxQcQRu82L8CWzKBhhAxukYOMT5Bjgjt_c=.197639f3-dc6b-46ec-9ecd-82569e7eb074@github.com>
References: <3AbaT2SwHVxQcQRu82L8CWzKBhhAxukYOMT5Bjgjt_c=.197639f3-dc6b-46ec-9ecd-82569e7eb074@github.com>

On Mon, 28 Oct 2024 04:39:17 GMT, Fei Yang wrote:

> Hi, please review this small improvement.
>
> Currently, we emit 11 instructions for real C2 runtime callouts (see `riscv_enc_java_to_runtime`).
> It seems we can materialize the pointer faster with `movptr2`, which saves 2 instructions.
> However, we need to reorder the original calling sequence a bit to make `t0` available for `movptr2`.
>
> Testing on linux-riscv64 platform:
> - [x] tier1-tier3 (release)
> - [x] hotspot:tier1 (fastdebug)

Thanks all for the review!
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21733#issuecomment-2451095523

From fyang at openjdk.org Fri Nov 1 00:57:35 2024
From: fyang at openjdk.org (Fei Yang)
Date: Fri, 1 Nov 2024 00:57:35 GMT
Subject: Integrated: 8343122: RISC-V: C2: Small improvement for real runtime callouts
In-Reply-To: <3AbaT2SwHVxQcQRu82L8CWzKBhhAxukYOMT5Bjgjt_c=.197639f3-dc6b-46ec-9ecd-82569e7eb074@github.com>
References: <3AbaT2SwHVxQcQRu82L8CWzKBhhAxukYOMT5Bjgjt_c=.197639f3-dc6b-46ec-9ecd-82569e7eb074@github.com>

On Mon, 28 Oct 2024 04:39:17 GMT, Fei Yang wrote:

> Hi, please review this small improvement.
>
> Currently, we emit 11 instructions for real C2 runtime callouts (see `riscv_enc_java_to_runtime`).
> It seems we can materialize the pointer faster with `movptr2`, which saves 2 instructions.
> However, we need to reorder the original calling sequence a bit to make `t0` available for `movptr2`.
>
> Testing on linux-riscv64 platform:
> - [x] tier1-tier3 (release)
> - [x] hotspot:tier1 (fastdebug)

This pull request has now been integrated.
Changeset: cbda7580
Author: Fei Yang
URL: https://git.openjdk.org/jdk/commit/cbda758010c22b0c1b9aec16004d4bfd24ab5c81
Stats: 11 lines in 1 file changed: 4 ins; 2 del; 5 mod

8343122: RISC-V: C2: Small improvement for real runtime callouts

Reviewed-by: rehn, fjiang

-------------

PR: https://git.openjdk.org/jdk/pull/21733

From jbhateja at openjdk.org Fri Nov 1 01:42:31 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 1 Nov 2024 01:42:31 GMT
Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support [v3]
References: <9lYvrMFQKW4Cq9zKFDjkNg9zKgLvdaCtVQTZgUGgpNs=.78ee059a-5f85-4d6e-828a-a71351d44763@github.com>

On Wed, 30 Oct 2024 22:11:31 GMT, Srinivas Vamsi Parasa wrote:

>> src/hotspot/cpu/x86/assembler_x86.cpp line 2632:
>>
>>> 2630: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false);
>>> 2631: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_32bit);
>>> 2632: evex_prefix_nf(src, 0, dst->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, no_flags);
>>
>> Could you also replace VEX_OPCODE_0F_3C with VEX_OPCODE_MAP4, the standard naming convention?
>> I added /*MAP4*/ in the comments after the prefix for the setzuCC instruction, but it's better to make this change consistently in all places.
>
> Hi Jatin,
>
> If I understand correctly, you are suggesting that I add a comment in front, like `/* MAP4 */VEX_OPCODE_0F_3C`, for all occurrences of VEX_OPCODE_0F_3C in this PR?
I would prefer using VEX_OPCODE_MAP4 directly, since it is the standard naming convention used by the [APX specification](https://cdrdv2.intel.com/v1/dl/getContent/784266).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21770#discussion_r1825338330

From fyang at openjdk.org Fri Nov 1 02:35:02 2024
From: fyang at openjdk.org (Fei Yang)
Date: Fri, 1 Nov 2024 02:35:02 GMT
Subject: RFR: 8343415: RISC-V: Increase maximum size of C2EntryBarrierStub by one

Hi, please consider this small change.

C2EntryBarrierStub contains one jump to the continuation after the nmethod entry barrier [1].
The current max_size setting assumes the distance is within 1MB, which allows a single `jal` instruction [2].
For that reason I counted only one instruction for this jump in [JDK-8343121](https://bugs.openjdk.org/browse/JDK-8343121). This has not broken any of the tests run so far, but there is no good reason to expect that assumption to hold. Instead, we should drop this constraint and assume an `auipc+jalr` pair for this jump.
[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_CodeStubs_riscv.cpp#L66
[2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L965

Testing on linux-riscv64:
- [x] tier1 (fastdebug build)

-------------

Commit messages:
 - 8343415: RISC-V: Increased maximum size of C2EntryBarrierStub by one

Changes: https://git.openjdk.org/jdk/pull/21818/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21818&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8343415
  Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/21818.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21818/head:pull/21818

PR: https://git.openjdk.org/jdk/pull/21818

From fjiang at openjdk.org Fri Nov 1 02:35:02 2024
From: fjiang at openjdk.org (Feilong Jiang)
Date: Fri, 1 Nov 2024 02:35:02 GMT
Subject: RFR: 8343415: RISC-V: Increase maximum size of C2EntryBarrierStub by one

On Fri, 1 Nov 2024 02:13:16 GMT, Fei Yang wrote:

> Hi, please consider this small change.
>
> C2EntryBarrierStub contains one jump to the continuation after the nmethod entry barrier [1].
> The current max_size setting assumes the distance is within 1MB, which allows a single `jal` instruction [2].
> For that reason I counted only one instruction for this jump in [JDK-8343121](https://bugs.openjdk.org/browse/JDK-8343121). This has not broken any of the tests run so far, but there is no good reason to expect that assumption to hold. Instead, we should drop this constraint and assume an `auipc+jalr` pair for this jump.
>
> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_CodeStubs_riscv.cpp#L66
> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L965
>
> Testing on linux-riscv64:
> - [x] tier1 (fastdebug build)

Looks reasonable, thanks.

-------------

Marked as reviewed by fjiang (Committer).
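The `jal` versus `auipc+jalr` choice discussed above depends on immediate range. `jal` encodes a signed 21-bit offset whose bit 0 is implicitly zero, giving a reach of roughly ±1 MiB. The `auipc+jalr` pair reaches roughly ±2 GiB. The sketch below shows how a worst-case size budget follows from those ranges. It assumes 4-byte instructions with no compressed encodings, and `fits_in_jal`, `jump_size_bytes` and `max_jump_size_bytes` are hypothetical helpers, not HotSpot code.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kInstructionSize = 4;  // assumption: no RVC (compressed) encodings

// True if a single jal can encode this PC-relative offset:
// signed 21-bit immediate with bit 0 implicitly zero.
constexpr bool fits_in_jal(int64_t offset) {
  return offset % 2 == 0 &&
         offset >= -(int64_t{1} << 20) &&
         offset <= (int64_t{1} << 20) - 2;
}

// Bytes emitted for a jump whose final offset is already known.
constexpr int jump_size_bytes(int64_t offset) {
  return fits_in_jal(offset) ? 1 * kInstructionSize   // jal
                             : 2 * kInstructionSize;  // auipc + jalr
}

// A stub's max size is fixed before the final distance is known, so the
// budget has to cover the longer auipc + jalr form.
constexpr int max_jump_size_bytes() {
  return 2 * kInstructionSize;
}
```

Because the budget is computed without knowing where the continuation will end up, counting one instruction holds only while the distance happens to stay within `jal` range.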
PR Review: https://git.openjdk.org/jdk/pull/21818#pullrequestreview-2409366643

From fyang at openjdk.org Fri Nov 1 02:43:06 2024
From: fyang at openjdk.org (Fei Yang)
Date: Fri, 1 Nov 2024 02:43:06 GMT
Subject: RFR: 8343415: RISC-V: Increase maximum size of C2EntryBarrierStub by one [v2]

> Hi, please consider this small change.
>
> C2EntryBarrierStub contains one jump to the continuation after the nmethod entry barrier [1].
> The current max_size setting assumes the distance is within 1MB, which allows a single `jal` instruction [2].
> For that reason I counted only one instruction for this jump in [JDK-8343121](https://bugs.openjdk.org/browse/JDK-8343121). This has not broken any of the tests run so far, but there is no good reason to expect that assumption to hold. Instead, we should drop this constraint and assume an `auipc+jalr` pair for this jump.
>
> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_CodeStubs_riscv.cpp#L66
> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L965
>
> Testing on linux-riscv64:
> - [x] tier1 (fastdebug build)

Fei Yang has updated the pull request incrementally with one additional commit since the last revision:

  Comment typo

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21818/files
  - new: https://git.openjdk.org/jdk/pull/21818/files/b133d081..e07f6e37

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21818&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21818&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/21818.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21818/head:pull/21818

PR: https://git.openjdk.org/jdk/pull/21818

From jbhateja at openjdk.org Fri Nov 1 03:50:37 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 1 Nov 2024 03:50:37 GMT
Subject: RFR: 8343297: Vector unsigned min/max test are failing with -Xcomp

This bugfix
patch fixes the incorrect predicated UMinV/UMaxV pattern.
All existing VectorAPI jtreg regressions are now passing with -Xcomp.

Best Regards,
Jatin

-------------

Commit messages:
 - 8343297: Vector unsigned min/max test are failing with -Xcomp

Changes: https://git.openjdk.org/jdk/pull/21819/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21819&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8343297
  Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/21819.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21819/head:pull/21819

PR: https://git.openjdk.org/jdk/pull/21819

From swen at openjdk.org Fri Nov 1 04:59:32 2024
From: swen at openjdk.org (Shaojin Wen)
Date: Fri, 1 Nov 2024 04:59:32 GMT
Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v11]
In-Reply-To: <6wnkmrDfLXJaccB6xsKXP2sgAG7tXVdEKrqE4vlTsdI=.4af0716c-ad1c-4148-8119-41722c5996d7@github.com>
References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> <6wnkmrDfLXJaccB6xsKXP2sgAG7tXVdEKrqE4vlTsdI=.4af0716c-ad1c-4148-8119-41722c5996d7@github.com>

On Tue, 29 Oct 2024 18:29:04 GMT, Emanuel Peter wrote:

>> **Background**
>> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather.
>>
>> **Details**
>>
>> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset.
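A hypothetical C++ model can make the linear form `pointer = con + sum_i(scale_i * variable_i)` concrete. It is not the actual `MemPointer` class from `mempointer.hpp`. In this model each pointer is a constant plus a map from variable ids to scales. Two pointers with identical summands always differ by a constant, and that constant distance is what an adjacency check needs. Overflow handling is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical model of the decomposed form:
//   pointer = con + sum_i(scale_i * variable_i)
// Variables are small integer ids standing in for C2 nodes.
struct LinearPointer {
  int64_t con = 0;
  std::map<int, int64_t> summands;  // variable id -> scale (zero scales never stored)
};

// If both pointers have exactly the same summands, they always differ by a
// constant, which is returned; otherwise no constant distance is known.
std::optional<int64_t> constant_distance(const LinearPointer& a, const LinearPointer& b) {
  if (a.summands != b.summands) {
    return std::nullopt;
  }
  return b.con - a.con;
}

// Two stores are adjacent if the second starts exactly where the first ends.
bool is_adjacent(const LinearPointer& first, int64_t first_size, const LinearPointer& second) {
  std::optional<int64_t> d = constant_distance(first, second);
  return d.has_value() && *d == first_size;
}
```

For example, `base + 4*i + 16` and `base + 4*i + 20` are 4 bytes apart, so two 4-byte stores through them are adjacent. With a different scale on `i` no constant distance exists, and the stores cannot be merged.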
>>
>> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them.
>>
>> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing!
>>
>> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`.
>>
>> **What this change enables**
>>
>> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`).
>>
>> Now we can do:
>> - Merging `Unsafe` stores to arrays, including "mismatched size": e.g. `putChar` to `byte[]`.
>> - Merging `Unsafe` stores to native memory.
>> - Merging `MemorySegment`: with array, native, ByteBuffer backing types.
>> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without the RCs smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better.
>>
>> **Dealing with Overflows**
>>
>> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check...
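The following sketches the idea of an int type that performs ordinary arithmetic while remembering whether an overflow ever happened. This simplified version is hypothetical and is not the HotSpot `NoOverflowInt` class. It computes each result exactly in 64 bits and sets a sticky flag once a result leaves the 32-bit range. Later operations then propagate the failure, so no explicit check is needed after each step.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical, simplified analogue of NoOverflowInt: a 32-bit value plus a
// sticky flag recording whether any operation that produced it overflowed.
class NoOverflowInt {
 public:
  explicit NoOverflowInt(int32_t v) : _value(v), _overflowed(false) {}

  static NoOverflowInt overflowed() {
    NoOverflowInt r(0);
    r._overflowed = true;
    return r;
  }

  bool has_overflowed() const { return _overflowed; }

  int32_t value() const {
    assert(!_overflowed && "value of an overflowed NoOverflowInt");
    return _value;
  }

  friend NoOverflowInt operator+(NoOverflowInt a, NoOverflowInt b) {
    if (a._overflowed || b._overflowed) return NoOverflowInt::overflowed();
    return NoOverflowInt::from_wide(int64_t{a._value} + int64_t{b._value});
  }

  friend NoOverflowInt operator-(NoOverflowInt a, NoOverflowInt b) {
    if (a._overflowed || b._overflowed) return NoOverflowInt::overflowed();
    return NoOverflowInt::from_wide(int64_t{a._value} - int64_t{b._value});
  }

  friend NoOverflowInt operator*(NoOverflowInt a, NoOverflowInt b) {
    if (a._overflowed || b._overflowed) return NoOverflowInt::overflowed();
    // The product of two 32-bit values always fits in 64 bits.
    return NoOverflowInt::from_wide(int64_t{a._value} * int64_t{b._value});
  }

 private:
  // Narrows an exact 64-bit result, flagging it if it does not fit in 32 bits.
  static NoOverflowInt from_wide(int64_t v) {
    if (v < std::numeric_limits<int32_t>::min() ||
        v > std::numeric_limits<int32_t>::max()) {
      return NoOverflowInt::overflowed();
    }
    return NoOverflowInt(static_cast<int32_t>(v));
  }

  int32_t _value;
  bool _overflowed;
};
```

Because the flag is sticky, a pointer-parsing routine can combine many terms and check `has_overflowed()` once at the end.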
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
>   fix distance assert

In the Integer/Long toString and StringBuilder.appendNull/appendBoolean scenarios, we can refactor the code to take advantage of MergeStores on `Unsafe` stores. I am waiting for this PR to be merged before completing PR #19626.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2451293546

From thartmann at openjdk.org Fri Nov 1 06:15:27 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Fri, 1 Nov 2024 06:15:27 GMT
Subject: RFR: 8343297: Vector unsigned min/max test are failing with -Xcomp

On Fri, 1 Nov 2024 03:45:27 GMT, Jatin Bhateja wrote:

> This bugfix patch fixes the incorrect predicated UMinV/UMaxV pattern.
> All existing VectorAPI jtreg regressions are now passing with -Xcomp.
>
> Best Regards,
> Jatin

src/hotspot/cpu/x86/x86.ad line 6567:

> 6565: %}
> 6566:
> 6567: instruct vector_uminmax_reg_masked(vec dst, vec src2, kReg mask) %{

Should `src2` be renamed to `src`?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21819#discussion_r1825463517

From jbhateja at openjdk.org Fri Nov 1 06:26:27 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 1 Nov 2024 06:26:27 GMT
Subject: RFR: 8343297: Vector unsigned min/max test are failing with -Xcomp

On Fri, 1 Nov 2024 06:13:02 GMT, Tobias Hartmann wrote:

>> This bugfix patch fixes the incorrect predicated UMinV/UMaxV pattern.
>> All existing VectorAPI jtreg regressions are now passing with -Xcomp.
>>
>> Best Regards,
>> Jatin
>
> src/hotspot/cpu/x86/x86.ad line 6567:
>
>> 6565: %}
>> 6566:
>> 6567: instruct vector_uminmax_reg_masked(vec dst, vec src2, kReg mask) %{
>
> Should `src2` be renamed to `src`?
For predicated vector operations, each destination vector lane receives the result of the operation if the corresponding mask bit is set, and otherwise keeps its original contents.

`vec1.lanewise(VectorOperators.UMIN, vec2)`

Here, the UMinVNode (vec1, vec2) IR has two source inputs, and the two-address matcher pattern aliases the first source with the destination operand. So `src2` is the appropriate name and is consistent with the other predicated operation patterns.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21819#discussion_r1825468346

From thartmann at openjdk.org Fri Nov 1 06:34:00 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Fri, 1 Nov 2024 06:34:00 GMT
Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v5]

> @fisk found this problematic optimization in final graph reshaping where we would convert a `CmpP` into a `CmpN` by converting a constant class pointer operand to a narrow class pointer. After [JDK-8338526](https://bugs.openjdk.org/browse/JDK-8338526), this is not valid if the class pointer refers to an interface or abstract class.
>
> I think it's not an issue in current code though. The only way we can get a dynamic narrow class is when loading from an object at `oopDesc::klass_offset_in_bytes()`:
> https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/type.cpp#L3521-L3522
>
> Comparisons of such loads with a constant class pointer of interface or abstract class type are always folded during GVN:
> https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/subnode.cpp#L1164-L1171
>
> And therefore, the code in `Compile::final_graph_reshaping_main_switch` will never trigger.
>
> I added a corresponding assert and bailout in product to be on the safe side. The assert never triggered in my testing.
>
> Thanks,
> Tobias

Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision:

  Use is_encodable instead

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21784/files
  - new: https://git.openjdk.org/jdk/pull/21784/files/3da09500..bab7c5df

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21784&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21784&range=03-04

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/21784.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21784/head:pull/21784

PR: https://git.openjdk.org/jdk/pull/21784

From thartmann at openjdk.org Fri Nov 1 06:34:00 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Fri, 1 Nov 2024 06:34:00 GMT
Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v4]

On Thu, 31 Oct 2024 10:10:47 GMT, Tobias Hartmann wrote:

>> @fisk found this problematic optimization in final graph reshaping where we would convert a `CmpP` into a `CmpN` by converting a constant class pointer operand to a narrow class pointer. After [JDK-8338526](https://bugs.openjdk.org/browse/JDK-8338526), this is not valid if the class pointer refers to an interface or abstract class.
>>
>> I think it's not an issue in current code though. The only way we can get a dynamic narrow class is when loading from an object at `oopDesc::klass_offset_in_bytes()`:
>> https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/type.cpp#L3521-L3522
>>
>> Comparisons of such loads with a constant class pointer of interface or abstract class type are always folded during GVN:
>> https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/subnode.cpp#L1164-L1171
>>
>> And therefore, the code in `Compile::final_graph_reshaping_main_switch` will never trigger.
>>
>> I added a corresponding assert and bailout in product to be on the safe side. The assert never triggered in my testing.
>>
>> Thanks,
>> Tobias
>
> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision:
>
>   Moved assert

Thanks for the review, Vladimir and Coleen. I updated the assert according to Coleen's suggestion.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21784#issuecomment-2451375945

From thartmann at openjdk.org Fri Nov 1 06:34:00 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Fri, 1 Nov 2024 06:34:00 GMT
Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v4]

On Thu, 31 Oct 2024 18:33:38 GMT, Coleen Phillimore wrote:

>> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Moved assert
>
> src/hotspot/share/opto/compile.cpp line 3789:
>
>> 3787: const TypePtr* tp = n->as_Type()->type()->make_ptr();
>> 3788: ciKlass* klass = tp->is_klassptr()->exact_klass();
>> 3789: assert(!klass->is_interface() && !klass->is_abstract(), "Interface or abstract class pointers should not be compressed");
>
> Can you make this assert be instead:
>
> #include "oops/compressedKlass.hpp"
> ...
> if debug
> Klass* k = klass->metadata(); // get the real klass
> assert(CompressedKlassPointers::is_encodable(k), "should be encodable");
> endif // debug

Sure, good point. I updated the code.
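Coleen's suggestion depends on whether a class pointer can be encoded as a narrow 32-bit value at all. The rough model below assumes the common base-plus-shift scheme `narrow = (addr - base) >> shift`. The struct and function names are hypothetical and much simpler than `CompressedKlassPointers`. In this model, a `CmpP` against a constant class pointer can be rewritten as a `CmpN` only when the constant is encodable. Per the PR description, a constant referring to an interface or abstract class may not be encodable after JDK-8338526.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of a compressed class-pointer encoding:
// narrow = (addr - base) >> shift, which must fit in 32 bits.
struct KlassEncoding {
  uint64_t base;
  unsigned shift;

  // True if addr lies inside the encodable range and is suitably aligned.
  bool is_encodable(uint64_t addr) const {
    if (addr < base) return false;
    uint64_t delta = addr - base;
    uint64_t align_mask = (uint64_t{1} << shift) - 1;
    return (delta & align_mask) == 0 && (delta >> shift) <= UINT32_MAX;
  }

  std::optional<uint32_t> encode(uint64_t addr) const {
    if (!is_encodable(addr)) return std::nullopt;
    return static_cast<uint32_t>((addr - base) >> shift);
  }
};

// Guard for the CmpP -> CmpN rewrite: only narrow the comparison when the
// constant class pointer has a narrow encoding; otherwise keep CmpP.
bool can_use_narrow_compare(const KlassEncoding& enc, uint64_t constant_klass_addr) {
  return enc.is_encodable(constant_klass_addr);
}
```

The assert in the patch asserts this precondition rather than silently narrowing an unencodable pointer.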
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21784#discussion_r1825473202

From thartmann at openjdk.org Fri Nov 1 06:40:03 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Fri, 1 Nov 2024 06:40:03 GMT
Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v6]

> @fisk found this problematic optimization in final graph reshaping where we would convert a `CmpP` into a `CmpN` by converting a constant class pointer operand to a narrow class pointer. After [JDK-8338526](https://bugs.openjdk.org/browse/JDK-8338526), this is not valid if the class pointer refers to an interface or abstract class.
>
> I think it's not an issue in current code though. The only way we can get a dynamic narrow class is when loading from an object at `oopDesc::klass_offset_in_bytes()`:
> https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/type.cpp#L3521-L3522
>
> Comparisons of such loads with a constant class pointer of interface or abstract class type are always folded during GVN:
> https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/subnode.cpp#L1164-L1171
>
> And therefore, the code in `Compile::final_graph_reshaping_main_switch` will never trigger.
>
> I added a corresponding assert and bailout in product to be on the safe side. The assert never triggered in my testing.
>
> Thanks,
> Tobias

Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision:

  Now using the right method ..
-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21784/files
  - new: https://git.openjdk.org/jdk/pull/21784/files/bab7c5df..b4f98bde

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21784&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21784&range=04-05

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/21784.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21784/head:pull/21784

PR: https://git.openjdk.org/jdk/pull/21784

From thartmann at openjdk.org Fri Nov 1 06:43:28 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Fri, 1 Nov 2024 06:43:28 GMT
Subject: RFR: 8343297: Vector unsigned min/max test are failing with -Xcomp
Message-ID: <7NebkcjNRckiqhWwo7Mpjucuf4nzi2KYE4bYL_WIMmM=.e3c7877d-1dc4-4434-a6b9-aabb5fddd86f@github.com>

On Fri, 1 Nov 2024 03:45:27 GMT, Jatin Bhateja wrote:

> This bugfix patch fixes the incorrect predicated UMinV/UMaxV pattern.
> All existing VectorAPI jtreg regressions are now passing with -Xcomp.
>
> Best Regards,
> Jatin

The fix looks reasonable to me.

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21819#pullrequestreview-2409549191

From thartmann at openjdk.org Fri Nov 1 06:43:29 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Fri, 1 Nov 2024 06:43:29 GMT
Subject: RFR: 8343297: Vector unsigned min/max test are failing with -Xcomp

On Fri, 1 Nov 2024 06:22:30 GMT, Jatin Bhateja wrote:

>> src/hotspot/cpu/x86/x86.ad line 6567:
>>
>>> 6565: %}
>>> 6566:
>>> 6567: instruct vector_uminmax_reg_masked(vec dst, vec src2, kReg mask) %{
>>
>> Should `src2` be renamed to `src`?
>
> For predicated vector operations, each destination vector lane receives the result of the operation if the corresponding mask bit is set, and otherwise keeps its original contents.
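The predicated, two-address semantics described here can be modelled with a scalar loop. In this model, `dst` doubles as the first source, lanes whose mask bit is set receive the result, and the remaining lanes keep their old contents. Below is a minimal reference model for 32-bit lanes, assuming fixed-size `std::array` vectors. It illustrates the semantics only and is not the x86 `.ad` pattern or the Vector API implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference model of a masked, two-address unsigned min:
//   dst[i] = mask[i] ? umin(dst[i], src2[i]) : dst[i]
template <std::size_t N>
void masked_umin(std::array<int32_t, N>& dst,
                 const std::array<int32_t, N>& src2,
                 const std::array<bool, N>& mask) {
  for (std::size_t i = 0; i < N; ++i) {
    if (!mask[i]) continue;  // inactive lane: keep the original contents of dst
    // Compare as unsigned: -1 (0xFFFFFFFF) is the largest unsigned value.
    uint32_t a = static_cast<uint32_t>(dst[i]);
    uint32_t b = static_cast<uint32_t>(src2[i]);
    dst[i] = static_cast<int32_t>(a < b ? a : b);
  }
}
```

With `dst = {-1, 5, 7, -1}`, `src2 = {3, -2, 1, 9}` and `mask = {true, true, false, false}`, the result is `{3, 5, 7, -1}`. Lane 0 shows why the comparison must be unsigned, since a signed min of -1 and 3 would keep -1.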
>
> `vec1.lanewise(VectorOperators.UMIN, vec2)`
>
> Here, the UMinVNode (vec1, vec2) IR has two source inputs, and the two-address matcher pattern aliases the first source with the destination operand. So `src2` is the appropriate name and is consistent with the other predicated operation patterns.

Thanks for the clarification, makes sense.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21819#discussion_r1825479298

From chagedorn at openjdk.org Fri Nov 1 06:54:34 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 1 Nov 2024 06:54:34 GMT
Subject: RFR: 8343380: C2: assert(iff->in(1)->is_OpaqueNotNull()) failed: must be OpaqueNotNull

On Thu, 31 Oct 2024 12:27:45 GMT, Christian Hagedorn wrote:

> The assert added in [JDK-8342043](https://bugs.openjdk.org/browse/JDK-8342043) turns out to be too strong, as shown by the test cases. I was unsure about that in the first place when I added it here:
>
> https://github.com/openjdk/jdk/pull/21608#discussion_r1808732859
>
> The assert was more of a best guess and just an additional guarantee that does not provide any benefit. I've found two cases that trigger the assert in `ConnectionGraph::can_reduce_check_users()`: one with an `OuterStripMinedLoopEnd` node and one with a `ParsePredicate`. The comments in the test cases explain how we end up with such a graph.
>
> I don't think it's worth tweaking the assert, as we simply bail out afterwards anyway. I therefore propose to simply get rid of the assert again.
>
> Thanks,
> Christian

Thanks Vladimir for your review!
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21805#issuecomment-2451395078

From chagedorn at openjdk.org Fri Nov 1 06:54:35 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 1 Nov 2024 06:54:35 GMT
Subject: Integrated: 8343380: C2: assert(iff->in(1)->is_OpaqueNotNull()) failed: must be OpaqueNotNull

On Thu, 31 Oct 2024 12:27:45 GMT, Christian Hagedorn wrote:

> The assert added in [JDK-8342043](https://bugs.openjdk.org/browse/JDK-8342043) turns out to be too strong, as shown by the test cases. I was unsure about that in the first place when I added it here:
>
> https://github.com/openjdk/jdk/pull/21608#discussion_r1808732859
>
> The assert was more of a best guess and just an additional guarantee that does not provide any benefit. I've found two cases that trigger the assert in `ConnectionGraph::can_reduce_check_users()`: one with an `OuterStripMinedLoopEnd` node and one with a `ParsePredicate`. The comments in the test cases explain how we end up with such a graph.
>
> I don't think it's worth tweaking the assert, as we simply bail out afterwards anyway. I therefore propose to simply get rid of the assert again.
>
> Thanks,
> Christian

This pull request has now been integrated.
Changeset: 6f6cfe64
Author: Christian Hagedorn
URL: https://git.openjdk.org/jdk/commit/6f6cfe643b48c21c9b7349b584d31b813c025abd
Stats: 111 lines in 2 files changed: 108 ins; 2 del; 1 mod

8343380: C2: assert(iff->in(1)->is_OpaqueNotNull()) failed: must be OpaqueNotNull

Reviewed-by: thartmann, kvn

-------------

PR: https://git.openjdk.org/jdk/pull/21805

From thartmann at openjdk.org Fri Nov 1 06:58:29 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Fri, 1 Nov 2024 06:58:29 GMT
Subject: RFR: 8343205: CompileBroker::possibly_add_compiler_threads excessively polls available memory [v2]

On Fri, 1 Nov 2024 00:22:42 GMT, Martin Doerr wrote:

>> This PR adds a quick check and bail-out in order to avoid excessive use of slow checks. In particular, it avoids querying the available memory so often.
>
> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>
>   Minor improvements (review feedback).

That looks reasonable to me.

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21812#pullrequestreview-2409562451

From epeter at openjdk.org Fri Nov 1 07:13:31 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 1 Nov 2024 07:13:31 GMT
Subject: RFR: 8341781: Improve Min/Max node identities [v3]
In-Reply-To: <3A-W4pcQj_I0QNWlUU3qibf6SQbNnZyO1JxeH1ym9Lw=.d343a0a6-10f4-4a3a-89fc-06e4cef04d02@github.com>
References: <3A-W4pcQj_I0QNWlUU3qibf6SQbNnZyO1JxeH1ym9Lw=.d343a0a6-10f4-4a3a-89fc-06e4cef04d02@github.com>

On Thu, 31 Oct 2024 16:53:57 GMT, Jasmine Karthikeyan wrote:

>> Hi all,
>> This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, like the one MinL/MaxL already use. In addition, it simplifies patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`.
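These two rewrites apply the lattice absorption laws `max(a, max(a, b)) = max(a, b)` and `max(a, min(a, b)) = a`, together with their duals where `min` and `max` are swapped. The sketch below shows how an `Identity`-style rewrite can return an already existing node for these shapes. It is hypothetical, uses a toy integer-node type in which pointer equality stands in for C2 node identity, and omits the type-based operand choosing. It is not the code from this PR.

```cpp
#include <cassert>
#include <memory>
#include <utility>

enum class Op { Leaf, Min, Max };

// Toy IR node for integer Min/Max: leaves have no inputs, Min/Max have two.
struct Node {
  Op op;
  std::shared_ptr<Node> in1, in2;
};
using NodeP = std::shared_ptr<Node>;

NodeP leaf() { return std::make_shared<Node>(Node{Op::Leaf, nullptr, nullptr}); }
NodeP make(Op op, NodeP a, NodeP b) {
  return std::make_shared<Node>(Node{op, std::move(a), std::move(b)});
}

// Returns an existing node equivalent to n, or n itself if no rule applies.
NodeP identity(const NodeP& n) {
  if (n->op == Op::Leaf) return n;
  const Op self = n->op;
  const Op dual = (self == Op::Max) ? Op::Min : Op::Max;
  if (n->in1 == n->in2) return n->in1;  // idempotence: max(a, a) = a
  // Try (x, y) = (in1, in2) and the commuted order (in2, in1).
  const NodeP pairs[2][2] = {{n->in1, n->in2}, {n->in2, n->in1}};
  for (const auto& p : pairs) {
    const NodeP& x = p[0];
    const NodeP& y = p[1];
    bool y_uses_x = y->op != Op::Leaf && (y->in1 == x || y->in2 == x);
    if (y_uses_x && y->op == self) return y;  // max(a, max(a, b)) = max(a, b)
    if (y_uses_x && y->op == dual) return x;  // max(a, min(a, b)) = a
  }
  return n;
}
```

Returning an existing node rather than building a new one matches how such identities remove a node from the graph.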
These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction and regex code. I've run some of the existing benchmarks and found some nice improvements:
>>
>> Benchmark                                 Mode  Cnt    Baseline (ns/op)       Patch (ns/op)   Improvement
>> BigIntegers.testAdd                       avgt   15    25.096 ±   3.936     19.214 ±  0.521   (+ 26.5%)
>> PatternBench.charPatternCompile           avgt    8   453.727 ± 117.265    370.054 ± 26.106   (+ 20.3%)
>> PatternBench.charPatternMatch             avgt    8   917.604 ± 121.766    810.560 ± 38.437   (+ 12.3%)
>> PatternBench.charPatternMatchWithCompile  avgt    8  1477.703 ± 255.783   1224.460 ± 28.220   (+ 18.7%)
>> PatternBench.longStringGraphemeMatches    avgt    8   860.909 ± 124.661    743.729 ± 22.877   (+ 14.6%)
>> PatternBench.splitFlags                   avgt    8   420.506 ±  76.252    321.911 ± 11.661   (+ 26.6%)
>>
>> I've added some IR tests, and tier 1 testing passes on my Linux machine. Reviews would be appreciated! > > Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Add platform checks to IR > - Merge branch 'master' into minmax_identities > - Suggestions from review > - Min/Max identities The IR rules look ok to me. Nice progress :) test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxIdentities.java line 120: > 118: > 119: @Test > 120: // @IR(applyIfPlatform = { "riscv64", "false" }, phase = { CompilePhase.BEFORE_MACRO_EXPANSION }, counts = { IRNode.MIN_L, "1" }) I would say you should make them negative for now, i.e. make them `failOn`.
Otherwise we won't catch these cases when JDK-8307513 gets integrated ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21439#issuecomment-2451413223 PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1825495196 From epeter at openjdk.org Fri Nov 1 07:14:34 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 1 Nov 2024 07:14:34 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v11] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> <6wnkmrDfLXJaccB6xsKXP2sgAG7tXVdEKrqE4vlTsdI=.4af0716c-ad1c-4148-8119-41722c5996d7@github.com> Message-ID: On Fri, 1 Nov 2024 04:56:46 GMT, Shaojin Wen wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix distance assert > > In the toString scenario of Integer/Long and the StringBuilder.appendNull/appendBoolean scenario, we can refactor the code to benefit from MergeStores on `Unsafe` stores. I am waiting for this PR to be merged before continuing with PR #19626. @wenshao Thanks for your patience. @chhagedorn is doing a thorough review right now, so I hope we are only a few days away from integration ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2451414734 From jbhateja at openjdk.org Fri Nov 1 07:37:34 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 1 Nov 2024 07:37:34 GMT Subject: Integrated: 8343297: Vector unsigned min/max test are failing with -Xcomp In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 03:45:27 GMT, Jatin Bhateja wrote: > This bugfix patch fixes the incorrect predicated UMinV/UMaxV pattern. > All existing VectorAPI jtreg regressions are now passing with -Xcomp. > > Best Regards, > Jatin This pull request has now been integrated.
Changeset: 8d4d589f Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/8d4d589fc5895f328c7db93bae72048e8711d727 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod 8343297: Vector unsigned min/max test are failing with -Xcomp Reviewed-by: thartmann ------------- PR: https://git.openjdk.org/jdk/pull/21819 From dlong at openjdk.org Fri Nov 1 08:26:27 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 1 Nov 2024 08:26:27 GMT Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v3] In-Reply-To: References: <4_84pZqk5-pV1iTUdpf5wmVczTdHq-9-Re1qjbGU7Eo=.0fb46e18-883f-45f8-827d-567602373431@github.com> Message-ID: On Thu, 31 Oct 2024 10:01:17 GMT, Tobias Hartmann wrote: >> src/hotspot/share/opto/compile.cpp line 3498: >> >>> 3496: assert(false, "Interface or abstract class pointers should not be compressed"); >>> 3497: } else { >>> 3498: new_in2 = ConNode::make(t->make_narrowklass()); >> >> When I was looking through this code, I was hoping there'd be some sort of assert in the make_narrowklass function so any caller would assert but maybe you don't have that info? > > Right, I was hoping for that too and tried to move the assert into `TypeNarrowKlass::make`. We do have all the information there but we hit false positives in rare cases like this when `MyAbstract` does not have any subtypes at compile time (mostly with `-Xcomp`): > > MyAbstract obj = ...; > obj.getClass(); > > C2 will add a dependency that will invalidate the code once a subclass is loaded and then optimizes the narrow class load from `obj` to be of constant narrow class type `MyAbstract`. The assert will trigger but we will never emit a compressed class pointer because the narrow class load + decode is folded to a non-narrow constant. > > We could move the assert to a later stage though. I'll give that a try. Do we actually generate an nmethod for the above example? 
It seems like it could never execute the getClass() because the line above setting `obj` would have to throw an exception if there can be no concrete instances. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21784#discussion_r1825573872 From aph at openjdk.org Fri Nov 1 09:07:29 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 1 Nov 2024 09:07:29 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: <033UQ-960JvpQbpMGHPKyoIv43XWj2bGzdnmCSoZrac=.3ffd7d94-2eb4-44e3-b91b-967d58aa01f7@github.com> References: <033UQ-960JvpQbpMGHPKyoIv43XWj2bGzdnmCSoZrac=.3ffd7d94-2eb4-44e3-b91b-967d58aa01f7@github.com> Message-ID: <8gAEXgKSgry8yzUkhw9c3sNC1FjKkBxUNBUaKe6RgS4=.d622b381-2be4-4944-b4ca-2d860fd93379@github.com> On Tue, 22 Oct 2024 09:28:36 GMT, Hamlin Li wrote: >> Hi, >> Can you help review the patch? It was previously https://github.com/openjdk/jdk/pull/18605. >> This pr is based on https://github.com/openjdk/jdk/pull/20781. >> >> Thanks! >> >> ## Test >> ### tests: >> * test/jdk/jdk/incubator/vector/ >> * test/hotspot/jtreg/compiler/vectorapi/ >> >> ### options: >> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:+EnableVectorSupport -XX:-UseVectorStubs >> >> ## Performance >> >> ### Tests >> The JMH tests are test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from the vectorIntrinsics branch in panama-vector. It would be good to have these tests in the jdk mainline; I will do that in a separate pr later. (These tests are auto-generated from a script&template. It would be good to also have that script&template in the jdk mainline, but it generates more tests than we need here, so it is better to add these tests and the script&template in another pr.)
>> >> ### Options >> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' >> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs' >> >> ### Performance data >> I have re-tested, and there is not much difference from https://github.com/openjdk/jdk/pull/18605, so please check the performance data in that pr. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > add comment for tanh Do we really have to wait for JMH tests? PRs should be reasonably reviewable, and this doesn't help. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21502#issuecomment-2451553100 From chagedorn at openjdk.org Fri Nov 1 10:04:35 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 1 Nov 2024 10:04:35 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v11] In-Reply-To: <6wnkmrDfLXJaccB6xsKXP2sgAG7tXVdEKrqE4vlTsdI=.4af0716c-ad1c-4148-8119-41722c5996d7@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> <6wnkmrDfLXJaccB6xsKXP2sgAG7tXVdEKrqE4vlTsdI=.4af0716c-ad1c-4148-8119-41722c5996d7@github.com> Message-ID: On Tue, 29 Oct 2024 18:29:04 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather.
>> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. >> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to arrays, including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without the RCs smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow.
This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix distance assert src/hotspot/share/opto/mempointer.cpp line 44: > 42: int traversal_count = 0; > 43: while (_worklist.is_nonempty()) { > 44: if (traversal_count++ > 1000) { return MemPointerDecomposedForm(pointer); } Maybe also add a comment, as below, that we bail out if the graph is too complex. src/hotspot/share/opto/mempointer.cpp line 48: > 46: } > 47: > 48: // Check for constant overflow. To match the bail-out message below for scale: Suggestion: // Bail out if there is a constant overflow. src/hotspot/share/opto/mempointer.cpp line 52: > 50: > 51: // Sort summands by variable->_idx > 52: _summands.sort(MemPointerSummand::cmp_for_sort); When you name the method something like `cmp_by_variable_idx`, then you could remove the comment. src/hotspot/share/opto/mempointer.cpp line 58: > 56: int pos_get = 0; > 57: while (pos_get < _summands.length()) { > 58: MemPointerSummand summand = _summands.at(pos_get++); Won't this create a new local object? So, if you were to change `summand`, then the `MemPointerSummand` inside `_summands` won't be updated (not the case here, though). Since you only read from the object, I suggest using a reference instead to avoid creating a new local object. Suggestion: const MemPointerSummand& summand = _summands.at(pos_get++); src/hotspot/share/opto/mempointer.cpp line 304: > 302: // Pre-Condition: > 303: // We assume that both pointers are in-bounds of their respective memory object. > 304: // Suggestion: // Pre-Condition: // We assume that both pointers are in-bounds of their respective memory object. If this does // not hold, for example, with the use of Unsafe, then we would already have undefined behavior, // and we are allowed to do anything.
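The `NoOverflowInt` idea from the quoted description, which tracks overflow implicitly instead of checking after every operation, can be sketched as follows. This is a minimal sketch under stated assumptions: `CheckedInt` and its methods are hypothetical names, not HotSpot's API, and 32-bit results are checked by computing them exactly in 64 bits. The overflow state is sticky, so a caller can check once at the end and bail out, in the spirit of the "Bail out if there is a constant overflow" suggestion above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the NoOverflowInt idea (names are illustrative, not
// HotSpot's API): a 32-bit int that remembers whether any operation producing
// it overflowed. The flag is sticky, so callers check once at the end.
class CheckedInt {
 public:
  explicit CheckedInt(int32_t v) : _value(v), _overflowed(false) {}

  static CheckedInt overflow() {
    CheckedInt r(0);
    r._overflowed = true;
    return r;
  }

  bool has_overflowed() const { return _overflowed; }
  int32_t value() const {
    assert(!_overflowed && "value() of an overflowed CheckedInt");
    return _value;
  }

  friend CheckedInt operator+(CheckedInt a, CheckedInt b) {
    if (a._overflowed || b._overflowed) return overflow();
    return from_wide(int64_t(a._value) + int64_t(b._value));
  }
  friend CheckedInt operator-(CheckedInt a, CheckedInt b) {
    if (a._overflowed || b._overflowed) return overflow();
    return from_wide(int64_t(a._value) - int64_t(b._value));
  }
  friend CheckedInt operator*(CheckedInt a, CheckedInt b) {
    if (a._overflowed || b._overflowed) return overflow();
    // |a|, |b| <= 2^31, so the exact product fits in 64 bits.
    return from_wide(int64_t(a._value) * int64_t(b._value));
  }

 private:
  // Narrow an exact 64-bit result back to 32 bits, recording overflow.
  static CheckedInt from_wide(int64_t r) {
    if (r < INT32_MIN || r > INT32_MAX) return overflow();
    return CheckedInt(static_cast<int32_t>(r));
  }

  int32_t _value;
  bool _overflowed;
};
```

For instance, `CheckedInt(INT32_MAX) + CheckedInt(1)` reports `has_overflowed()`, and the flag survives any further arithmetic on the result.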
src/hotspot/share/opto/mempointer.hpp line 39: > 37: // We parse / decompose pointers into a linear form: > 38: // > 39: // pointer = sum_i(scale_i * variable_i) + con Maybe also change this to `SUM()` with a short explanation. Something like that: Suggestion: // We parse / decompose pointers into a linear form: // // pointer = SUM(scale_i * variable_i) + con // // where SUM() adds all "scale_i * variable_i" for each i together. src/hotspot/share/opto/mempointer.hpp line 403: > 401: // > 402: // summand = scale * variable > 403: // For completeness: Suggestion: // Summand of a MemPointerDecomposedForm: // // summand = scale * variable // // where variable is a C2 node. src/hotspot/share/opto/mempointer.hpp line 458: > 456: // Decomposed form of the pointer sub-expression of "pointer". > 457: // > 458: // pointer = sum(summands) + con Suggestion: // pointer = SUM(summands) + con ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825550519 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825551652 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825556318 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825570996 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825650181 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1824667743 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825554063 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825543969 From chagedorn at openjdk.org Fri Nov 1 10:04:36 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 1 Nov 2024 10:04:36 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v11] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> <6wnkmrDfLXJaccB6xsKXP2sgAG7tXVdEKrqE4vlTsdI=.4af0716c-ad1c-4148-8119-41722c5996d7@github.com> Message-ID: On Fri, 1 Nov
2024 07:59:32 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix distance assert > > src/hotspot/share/opto/mempointer.cpp line 52: > >> 50: >> 51: // Sort summands by variable->_idx >> 52: _summands.sort(MemPointerSummand::cmp_for_sort); > > When you name the method something like `cmp_by_variable_idx`, then you could remove the comment. Can you also add a comment that sorting it like that enables walking over the summands and combining the scales for the same nodes below? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825556736 From epeter at openjdk.org Fri Nov 1 10:22:50 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 1 Nov 2024 10:22:50 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v12] In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. > > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. 
`MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. > > More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! > > `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. > > **What this change enables** > > Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). > > Now we can do: > - Merging `Unsafe` stores to arrays, including "mismatched size": e.g. `putChar` to `byte[]`. > - Merging `Unsafe` stores to native memory. > - Merging `MemorySegment`: with array, native, ByteBuffer backing types. > - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without the RCs smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. > > **Dealing with Overflows** > > We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. > > **Bench...
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19970/files - new: https://git.openjdk.org/jdk/pull/19970/files/9f442d27..63496f33 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=10-11 Stats: 10 lines in 2 files changed: 4 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19970/head:pull/19970 PR: https://git.openjdk.org/jdk/pull/19970 From epeter at openjdk.org Fri Nov 1 10:29:10 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 1 Nov 2024 10:29:10 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v13] In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. > > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. 
This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. > > More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! > > `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. > > **What this change enables** > > Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). > > Now we can do: > - Merging `Unsafe` stores to arrays, including "mismatched size": e.g. `putChar` to `byte[]`. > - Merging `Unsafe` stores to native memory. > - Merging `MemorySegment`: with array, native, ByteBuffer backing types. > - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without the RCs smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. > > **Dealing with Overflows** > > We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. > > **Bench...
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: apply more suggestions from Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19970/files - new: https://git.openjdk.org/jdk/pull/19970/files/63496f33..3ca647e6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=11-12 Stats: 73 lines in 2 files changed: 24 ins; 9 del; 40 mod Patch: https://git.openjdk.org/jdk/pull/19970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19970/head:pull/19970 PR: https://git.openjdk.org/jdk/pull/19970 From epeter at openjdk.org Fri Nov 1 10:29:10 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 1 Nov 2024 10:29:10 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v11] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> <6wnkmrDfLXJaccB6xsKXP2sgAG7tXVdEKrqE4vlTsdI=.4af0716c-ad1c-4148-8119-41722c5996d7@github.com> Message-ID: On Fri, 1 Nov 2024 08:00:06 GMT, Christian Hagedorn wrote: >> src/hotspot/share/opto/mempointer.cpp line 52: >> >>> 50: >>> 51: // Sort summands by variable->_idx >>> 52: _summands.sort(MemPointerSummand::cmp_for_sort); >> >> When you name the method something like `cmp_by_variable_idx`, then you could remove the comment. > > Can you also add a comment that sorting it like that enables walking over the summands and combining the scales for the same nodes below? good idea! 
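Christian's point above, that sorting by variable index enables walking over the summands and combining the scales for the same nodes, can be illustrated with a small model of the linear form `pointer = SUM(scale_i * variable_i) + con`. The C++ below is a sketch under assumptions, not the code under review: `Summand`, `DecomposedForm`, `canonicalize` and `constant_distance` are hypothetical names, integer ids stand in for C2 nodes, and overflow checking (which `MemPointer` handles via `NoOverflowInt`) is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical model of the linear form discussed in this thread:
//   pointer = SUM(scale_i * variable_i) + con
// An integer id stands in for a C2 node and its _idx.
struct Summand {
  int variable_id;
  int64_t scale;
};

struct DecomposedForm {
  std::vector<Summand> summands;
  int64_t con;
};

// Sort by variable id so that equal variables become adjacent, then combine
// their scales in one linear walk and drop summands whose scales cancel out.
// (Overflow checking is omitted in this sketch.)
DecomposedForm canonicalize(DecomposedForm f) {
  std::sort(f.summands.begin(), f.summands.end(),
            [](const Summand& x, const Summand& y) { return x.variable_id < y.variable_id; });
  std::vector<Summand> combined;
  for (const Summand& s : f.summands) {
    if (!combined.empty() && combined.back().variable_id == s.variable_id) {
      combined.back().scale += s.scale;
    } else {
      combined.push_back(s);
    }
  }
  combined.erase(std::remove_if(combined.begin(), combined.end(),
                                [](const Summand& s) { return s.scale == 0; }),
                 combined.end());
  f.summands = std::move(combined);
  return f;
}

// If both canonical forms have identical summands, the two pointers always
// differ by the constant (b.con - a.con); otherwise no constant distance is known.
std::optional<int64_t> constant_distance(const DecomposedForm& a, const DecomposedForm& b) {
  const DecomposedForm ca = canonicalize(a);
  const DecomposedForm cb = canonicalize(b);
  if (ca.summands.size() != cb.summands.size()) return std::nullopt;
  for (std::size_t i = 0; i < ca.summands.size(); i++) {
    if (ca.summands[i].variable_id != cb.summands[i].variable_id ||
        ca.summands[i].scale != cb.summands[i].scale) {
      return std::nullopt;
    }
  }
  return cb.con - ca.con;
}
```

For instance, `4*i + 8*j + 4*i + 16` and `8*j + 8*i + 20` both canonicalize to the summands `8*i + 8*j`, so `constant_distance` reports 4, which is exactly the kind of adjacency fact `MergeStores` needs.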
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825674083 From aph at openjdk.org Fri Nov 1 11:04:28 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 1 Nov 2024 11:04:28 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: <033UQ-960JvpQbpMGHPKyoIv43XWj2bGzdnmCSoZrac=.3ffd7d94-2eb4-44e3-b91b-967d58aa01f7@github.com> References: <033UQ-960JvpQbpMGHPKyoIv43XWj2bGzdnmCSoZrac=.3ffd7d94-2eb4-44e3-b91b-967d58aa01f7@github.com> Message-ID: <1CBxrIcc1nOhl-xlgLDw2qjDt4JFIlOC1kbWXJSTt5w=.cd18419f-40b6-44d6-bce0-5a06e494d9eb@github.com> On Tue, 22 Oct 2024 09:28:36 GMT, Hamlin Li wrote: >> Hi, >> Can you help review the patch? It was previously https://github.com/openjdk/jdk/pull/18605. >> This pr is based on https://github.com/openjdk/jdk/pull/20781. >> >> Thanks! >> >> ## Test >> ### tests: >> * test/jdk/jdk/incubator/vector/ >> * test/hotspot/jtreg/compiler/vectorapi/ >> >> ### options: >> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:+EnableVectorSupport -XX:-UseVectorStubs >> >> ## Performance >> >> ### Tests >> The JMH tests are test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from the vectorIntrinsics branch in panama-vector. It would be good to have these tests in the jdk mainline; I will do that in a separate pr later. (These tests are auto-generated from a script&template. It would be good to also have that script&template in the jdk mainline, but it generates more tests than we need here, so it is better to add these tests and the script&template in another pr.)
>> >> ### Options >> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' >> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs' >> >> ### Performance data >> I have re-tested, and there is not much difference from https://github.com/openjdk/jdk/pull/18605, so please check the performance data in that pr. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > add comment for tanh Here are my results, Apple M1. Pretty similar to what we've seen, but no SVE. Looks good.

Benchmark             (size)  Mode  Cnt  Stubs (us)  No stubs (us)  Relative (no stubs / stubs)
DoubleMaxVector.ACOS    1024  avgt    5       3.962          5.523  1.39
DoubleMaxVector.ASIN    1024  avgt    5       3.236          5.460  1.69
DoubleMaxVector.ATAN    1024  avgt    5       4.856         10.117  2.08
DoubleMaxVector.ATAN2   1024  avgt    5       7.144         18.977  2.66
DoubleMaxVector.CBRT    1024  avgt    5       8.802          9.837  1.12
DoubleMaxVector.COS     1024  avgt    5       6.281          8.789  1.40
DoubleMaxVector.COSH    1024  avgt    5       6.431          8.044  1.25
DoubleMaxVector.EXP     1024  avgt    5       1.939          6.417  3.31
DoubleMaxVector.EXPM1   1024  avgt    5       5.412          9.002  1.66
DoubleMaxVector.HYPOT   1024  avgt    5       4.269         12.323  2.89
DoubleMaxVector.LOG     1024  avgt    5       4.165          8.533  2.05
DoubleMaxVector.LOG10   1024  avgt    5       4.381         11.738  2.68
DoubleMaxVector.LOG1P   1024  avgt    5       4.383         12.135  2.77
DoubleMaxVector.POW     1024  avgt    5      14.060         22.053  1.57
DoubleMaxVector.SIN     1024  avgt    5       5.423          8.652  1.60
DoubleMaxVector.SINH    1024  avgt    5       6.251          8.168  1.31
DoubleMaxVector.TAN     1024  avgt    5       9.271         22.238  2.40
DoubleMaxVector.TANH    1024  avgt    5       4.515          4.499  1.00
Float64Vector.ACOS      1024  avgt    5       3.600          5.472  1.52
Float64Vector.ASIN      1024  avgt    5       2.776          5.547  2.00
Float64Vector.ATAN      1024  avgt    5       3.932         10.129  2.58
Float64Vector.ATAN2     1024  avgt    5       5.913         15.960  2.70
Float64Vector.CBRT      1024  avgt    5       7.464         10.078  1.35
Float64Vector.COS       1024  avgt    5      10.620          9.058  0.85
Float64Vector.COSH      1024  avgt    5       5.899          8.268  1.40
Float64Vector.EXP       1024  avgt    5       1.444          6.642  4.60
Float64Vector.EXPM1     1024  avgt    5       5.467          9.108  1.67
Float64Vector.HYPOT     1024  avgt    5       4.133          9.833  2.38
Float64Vector.LOG       1024  avgt    5       3.172          8.820  2.78
Float64Vector.LOG10     1024  avgt    5       3.346         12.142  3.63
Float64Vector.LOG1P     1024  avgt    5       3.216         12.507  3.89
Float64Vector.POW       1024  avgt    5      13.841         22.105  1.60
Float64Vector.SIN       1024  avgt    5      10.464          8.796  0.84
Float64Vector.SINH      1024  avgt    5       6.680          8.243  1.23
Float64Vector.TAN       1024  avgt    5      10.967         26.275  2.40
Float64Vector.TANH      1024  avgt    5       4.516          4.561  1.01
FloatMaxVector.ACOS     1024  avgt    5       1.819          3.752  2.06
FloatMaxVector.ASIN     1024  avgt    5       1.395          3.682  2.64
FloatMaxVector.ATAN     1024  avgt    5       1.970          7.003  3.55
FloatMaxVector.ATAN2    1024  avgt    5       2.951         12.313  4.17
FloatMaxVector.CBRT     1024  avgt    5       3.733          6.510  1.74
FloatMaxVector.COS      1024  avgt    5       5.405          7.363  1.36
FloatMaxVector.COSH     1024  avgt    5       2.951          5.741  1.95
FloatMaxVector.EXP      1024  avgt    5       0.725          4.745  6.54
FloatMaxVector.EXPM1    1024  avgt    5       2.732          6.490  2.38
FloatMaxVector.HYPOT    1024  avgt    5       2.062          6.328  3.07
FloatMaxVector.LOG      1024  avgt    5       1.587          6.847  4.31
FloatMaxVector.LOG10    1024  avgt    5       1.679         10.035  5.98
FloatMaxVector.LOG1P    1024  avgt    5       1.608          8.616  5.36
FloatMaxVector.POW      1024  avgt    5       6.916         19.432  2.81
FloatMaxVector.SIN      1024  avgt    5       5.239          7.202  1.37
FloatMaxVector.SINH     1024  avgt    5       2.992          5.681  1.90
FloatMaxVector.TAN      1024  avgt    5       5.562         17.419  3.13
FloatMaxVector.TANH     1024  avgt    5       2.788          2.791  1.00

------------- PR Comment: https://git.openjdk.org/jdk/pull/21502#issuecomment-2451695886 From ihse at openjdk.org Fri Nov 1 11:50:32 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 1 Nov 2024 11:50:32 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: <033UQ-960JvpQbpMGHPKyoIv43XWj2bGzdnmCSoZrac=.3ffd7d94-2eb4-44e3-b91b-967d58aa01f7@github.com> References: <033UQ-960JvpQbpMGHPKyoIv43XWj2bGzdnmCSoZrac=.3ffd7d94-2eb4-44e3-b91b-967d58aa01f7@github.com> Message-ID:
<2q-xho4lerOP-u38nkEG0T62NXtjQ8iM0b3AnVf_mPU=.df4c5282-cc36-4fd1-ab9c-f7fbc4208b95@github.com> On Tue, 22 Oct 2024 09:28:36 GMT, Hamlin Li wrote: >> Hi, >> Can you help review the patch? It was previously https://github.com/openjdk/jdk/pull/18605. >> This pr is based on https://github.com/openjdk/jdk/pull/20781. >> >> Thanks! >> >> ## Test >> ### tests: >> * test/jdk/jdk/incubator/vector/ >> * test/hotspot/jtreg/compiler/vectorapi/ >> >> ### options: >> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:+EnableVectorSupport -XX:-UseVectorStubs >> >> ## Performance >> >> ### Tests >> The JMH tests are test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from the vectorIntrinsics branch in panama-vector. It would be good to have these tests in the jdk mainline; I will do that in a separate pr later. (These tests are auto-generated from a script&template. It would be good to also have that script&template in the jdk mainline, but it generates more tests than we need here, so it is better to add these tests and the script&template in another pr.) >> >> ### Options >> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' >> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs' >> >> ### Performance data >> I have re-tested, and there is not much difference from https://github.com/openjdk/jdk/pull/18605, so please check the performance data in that pr. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > add comment for tanh Marked as reviewed by ihse (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/21502#pullrequestreview-2409931030 From coleenp at openjdk.org Fri Nov 1 11:52:29 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 1 Nov 2024 11:52:29 GMT Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v6] In-Reply-To: References: Message-ID: <3erXIe3rpiqZ1E2ScCZU7JHkutKydhdaroKfGM_vlFQ=.dd8a6bf4-e030-4161-8a4b-78499936e985@github.com> On Fri, 1 Nov 2024 06:40:03 GMT, Tobias Hartmann wrote: >> @fisk found this problematic optimization in final graph reshaping where we would convert a `CmpP` into a `CmpN` by converting a constant class pointer operand to a narrow class pointer. After [JDK-8338526](https://bugs.openjdk.org/browse/JDK-8338526), this is not valid if the class pointer refers to an interface or abstract class. >> >> I think it's not an issue in current code though. The only way we can get a dynamic narrow class is when loading from an object at `oopDesc::klass_offset_in_bytes()`: >> https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/type.cpp#L3521-L3522 >> >> Comparisons of such loads with a constant class pointer of interface or abstract class type are always folded during GVN: >> https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/subnode.cpp#L1164-L1171 >> >> And therefore, the code in `Compile::final_graph_reshaping_main_switch` will never trigger. >> >> I added a corresponding assert and bailout in product to be on the safe side. The assert never triggered in my testing. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Now using the right method .. Yes, this looks great. I didn't realize you had a nice function for this in ci. Thank you! ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21784#pullrequestreview-2409933132 From dfenacci at openjdk.org Fri Nov 1 12:07:04 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 1 Nov 2024 12:07:04 GMT Subject: RFR: 8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 [v2] In-Reply-To: References: Message-ID: > # Issue > > The third test of `compiler/codecache/CheckLargePages.java` checks that non-segmented 1GB code cache can be allocated with 1GB large pages. > > On linux (the only supported platform) in order to allocate them, 1GB huge pages have to be enabled (checkable from `/proc/meminfo` and `/sys/kernel/mm/hugepages/hugepages-xxxx`) and their number has to be set to >0 (checkable from `/sys/kernel/mm/hugepages/hugepages-xxxx/nr_hugepages`). > > If 1GB huge pages are enabled but their number is 0, the test fails because it looks for a string that matches `CodeCache: min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] page_size=1[gG]` but the actual output is `CodeCache: min=1G max=1G base=0x00007f4040000000 size=1G page_size=2M`. This happens because the VM tries to allocate 1GB huge pages but it fails because the number of allocatable ones is 0 and the VM reverts to smaller large page sizes (2MB). > > # Solution > > The problem might be attributed to the VM only checking for 1GB huge pages to be supported, not how many there currently are. Nevertheless, this seems to be the correct behaviour, not least because their number can be changed dynamically. > So, the correct thing to do seems to be to "relax" the check made by the test to include both cases: > * when 1GB huge pages are supported and can be allocated correctly > * when 1GB huge pages are supported but cannot be allocated correctly (because there are none available) and the VM reverts to 2MB huge pages (if there are no 2MB pages available the test doesn't run at all).
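The skip condition the test ends up using after review (at least one 1GB huge page configured, read from `/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages`) can be sketched as below. The real test is Java (`CheckLargePages.java`); this C++ sketch only illustrates the decision, and the function names and the `sysfs_root` parameter are hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Reads nr_hugepages for the given page size under sysfs_root
// (normally "/sys/kernel/mm/hugepages"). Returns -1 if the file is missing
// (page size not supported) or cannot be parsed.
long configured_huge_pages(const std::string& sysfs_root, long page_size_kb) {
  const std::string path = sysfs_root + "/hugepages-" + std::to_string(page_size_kb) +
                           "kB/nr_hugepages";
  std::ifstream in(path);
  long count = -1;
  if (!(in >> count)) {
    return -1;  // failed extraction would otherwise leave count == 0
  }
  return count;
}

// Run the 1GB-large-page checks only if 1GB pages are supported AND at least one
// is configured; otherwise the test should be skipped rather than relaxed.
bool should_run_1g_tests(const std::string& sysfs_root) {
  return configured_huge_pages(sysfs_root, 1048576) >= 1;
}
```

On a real system the call would be `should_run_1g_tests("/sys/kernel/mm/hugepages")`. Skipping (instead of accepting a 2M page size) keeps a genuine VM failure to use available 1GB pages visible.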
Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: - JDK-8343153: add missing import - JDK-8343153: check number of huge pages from file ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21757/files - new: https://git.openjdk.org/jdk/pull/21757/files/5cdd78dc..9670eef6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21757&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21757&range=00-01 Stats: 31 lines in 1 file changed: 17 ins; 11 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21757.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21757/head:pull/21757 PR: https://git.openjdk.org/jdk/pull/21757 From jbhateja at openjdk.org Fri Nov 1 12:11:01 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 1 Nov 2024 12:11:01 GMT Subject: RFR: 8343419: Assertion failure in long vector unsigned min/max with -XX:+UseKNLSetting Message-ID: KNL supports AVX512F but not AVX512VL, so vector operations on vectors of 256 bits or less are generally emulated using AVX2 instructions. This bugfix patch covers the following scenarios for LongVector unsigned min/max on KNL targets: 1. Long species < 512 bits and non-predicated operation. - Operate at the full vector width of 512 bits using VPMINUQ/VPMAXUQ instructions. 2. Long species < 512 bits with memory operands and non-predicated operation. - Load memory into a vector of exactly matching size. - Operate at the full vector width of 512 bits. 3. Long species < 512 bits and predicated operation. - Emulate the operation using AVX2 instructions. - Blend the result with the first source vector using the predication mask. - The existing opmask population mechanism expects AVX512BW/DQ features, which are missing on KNL targets. 4. Long species == 512 bits, both predicated and non-predicated operation. - Directly use 512-bit VPMINUQ/VPMAXUQ instructions.
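Case 3 above (predicated operation on species below 512 bits) amounts to computing the unsigned min/max on every lane and then blending with the first source under the mask. The scalar C++ model below illustrates those lane semantics only; it is not the x86 backend code. The sign-bit flip in `unsigned_less` is one common way to derive an unsigned 64-bit compare from a signed one (AVX2 only provides the signed VPCMPGTQ), not necessarily what the patch emits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Unsigned 64-bit "less than" from a signed compare: flipping the sign bit maps
// unsigned order onto signed order (uint64 -> int64 cast is modular on mainstream compilers).
inline bool unsigned_less(int64_t a, int64_t b) {
  const uint64_t sign_bit = uint64_t{1} << 63;
  const int64_t fa = static_cast<int64_t>(static_cast<uint64_t>(a) ^ sign_bit);
  const int64_t fb = static_cast<int64_t>(static_cast<uint64_t>(b) ^ sign_bit);
  return fa < fb;
}

// Predicated lane-wise unsigned min (is_max == false) or max (is_max == true):
// active lanes receive the result, inactive lanes keep the first source `a`
// (the "blend with the first source vector" step).
template <std::size_t N>
std::array<int64_t, N> masked_umin_umax(const std::array<int64_t, N>& a,
                                        const std::array<int64_t, N>& b,
                                        const std::array<bool, N>& mask,
                                        bool is_max) {
  std::array<int64_t, N> result{};
  for (std::size_t i = 0; i < N; i++) {
    const bool a_lt_b = unsigned_less(a[i], b[i]);
    const int64_t computed = is_max ? (a_lt_b ? b[i] : a[i]) : (a_lt_b ? a[i] : b[i]);
    result[i] = mask[i] ? computed : a[i];
  }
  return result;
}
```

For example, with lanes `a = {-1, 5}` and `b = {1, -2}` and both lanes active, the unsigned min is `{1, 5}` because `-1` and `-2` are the largest values when read as unsigned.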
All existing jtreg regression tests pass with the -XX:+UseKNLSetting and -Xcomp flags. Kindly review. Best Regards, Jatin ------------- Commit messages: - 8343419: Assertion failure in long vector unsigned min/max with -XX:+UseKNLSetting Changes: https://git.openjdk.org/jdk/pull/21821/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21821&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343419 Stats: 37 lines in 2 files changed: 20 ins; 6 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/21821.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21821/head:pull/21821 PR: https://git.openjdk.org/jdk/pull/21821 From dfenacci at openjdk.org Fri Nov 1 12:14:06 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 1 Nov 2024 12:14:06 GMT Subject: RFR: 8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 [v3] In-Reply-To: References: Message-ID: > # Issue > > The third test of `compiler/codecache/CheckLargePages.java` checks that non-segmented 1GB code cache can be allocated with 1GB large pages. > > On linux (the only supported platform) in order to allocate them, 1GB huge pages have to be enabled (checkable from `/proc/meminfo` and `/sys/kernel/mm/hugepages/hugepages-xxxx`) and their number has to be set to >0 (checkable from `/sys/kernel/mm/hugepages/hugepages-xxxx/nr_hugepages`). > > If 1GB huge pages are enabled but their number is 0, the test fails because it looks for a string that matches `CodeCache: min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] page_size=1[gG]` but the actual output is `CodeCache: min=1G max=1G base=0x00007f4040000000 size=1G page_size=2M`. This happens because the VM tries to allocate 1GB huge pages but it fails because the number of allocatable ones is 0 and the VM reverts to smaller large page sizes (2MB). > > # Solution > > The problem might be attributed to the VM only checking for 1GB huge pages to be supported, not how many there currently are.
Nevertheless, this seems to be the correct behaviour, not least because their number can be changed dynamically. > So, the correct thing to do seems to be to "relax" the check made by the test to include both cases: > * when 1GB huge pages are supported and can be allocated correctly > * when 1GB huge pages are supported but cannot be allocated correctly (because there are none available) and the VM reverts to 2MB huge pages (if there are no 2MB pages available the test doesn't run at all). Damon Fenacci has updated the pull request incrementally with three additional commits since the last revision: - JDK-8343153: add missing import - JDK-8343153: add missing brackets - JDK-8343153: add missing try-catch ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21757/files - new: https://git.openjdk.org/jdk/pull/21757/files/9670eef6..989ef945 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21757&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21757&range=01-02 Stats: 6 lines in 1 file changed: 2 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21757.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21757/head:pull/21757 PR: https://git.openjdk.org/jdk/pull/21757 From aph at openjdk.org Fri Nov 1 12:39:29 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 1 Nov 2024 12:39:29 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: <033UQ-960JvpQbpMGHPKyoIv43XWj2bGzdnmCSoZrac=.3ffd7d94-2eb4-44e3-b91b-967d58aa01f7@github.com> References: <033UQ-960JvpQbpMGHPKyoIv43XWj2bGzdnmCSoZrac=.3ffd7d94-2eb4-44e3-b91b-967d58aa01f7@github.com> Message-ID: <0UYENq3WDrMtFHJtLQzV8wo7SHVsgyAqKh7JPewdB7w=.5402fe2d-cd2b-49d4-8219-48d639fbaa16@github.com> On Tue, 22 Oct 2024 09:28:36 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? Previously it's https://github.com/openjdk/jdk/pull/18605. >> This pr is based on https://github.com/openjdk/jdk/pull/20781. >> >> Thanks! 
>> >> ## Test >> ### tests: >> * test/jdk/jdk/incubator/vector/ >> * test/hotspot/jtreg/compiler/vectorapi/ >> >> ### options: >> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:+EnableVectorSupport -XX:-UseVectorStubs >> >> ## Performance >> >> ### Tests >> jmh tests are test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from the vectorIntrinsics branch in panama-vector. It would be good to have these tests in the jdk mainline; I will add them in a separate pr later. (These tests are auto-generated from a script and template. It would also be good to have the script and template in the jdk mainline, but they generate more tests than we need here, so it is better to add these tests together with the script and template in another pr.) >> >> ### Options >> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' >> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs' >> >> ### Performance data >> I have re-tested, and there is not much difference from https://github.com/openjdk/jdk/pull/18605, so please check the performance data in that pr. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > add comment for tanh Marked as reviewed by aph (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/21502#pullrequestreview-2409994503 From epeter at openjdk.org Fri Nov 1 12:54:13 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 1 Nov 2024 12:54:13 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v14] In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. > > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. > > More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! > > `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`.
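As a rough illustration of why the decomposed form helps `MergeStores`: if two pointers have identical summands, their difference is just the difference of their constants, so adjacency reduces to comparing that difference with the size of the first store. The C++ below is a toy model, not the `MemPointer` code in `mempointer.hpp`. Its assumptions: variables are plain integer ids (C2 uses `Node*`), summands are canonical (sorted, no zero scales), and overflow is ignored, which is exactly what `NoOverflowInt` guards against in the real implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Toy decomposed pointer: pointer = con + sum_i(scale_i * variable_i).
// Variables are identified by hypothetical integer ids.
struct DecomposedPointer {
  std::map<int, int64_t> summands;  // variable id -> scale (assumed canonical)
  int64_t con = 0;
};

// With identical summands, p2 - p1 == p2.con - p1.con for every runtime value of
// the variables. Otherwise the distance is unknown in this toy model.
std::optional<int64_t> constant_distance(const DecomposedPointer& p1,
                                         const DecomposedPointer& p2) {
  if (p1.summands != p2.summands) {
    return std::nullopt;
  }
  return p2.con - p1.con;  // overflow ignored here (see lead-in)
}

// Two accesses are adjacent if p2 starts exactly where the size1-byte access at p1 ends.
bool is_adjacent(const DecomposedPointer& p1, int64_t size1, const DecomposedPointer& p2) {
  const std::optional<int64_t> d = constant_distance(p1, p2);
  return d.has_value() && *d == size1;
}
```

For instance, two 4-byte stores at `16 + 1*i + 4*j` and `20 + 1*i + 4*j` are adjacent regardless of `i` and `j`, while changing the scale of `j` in one of them makes the distance unknown.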
> > Now we can do: > - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. > - Merging `Unsafe` stores to native memory. > - Merging `MemorySegment`: with array, native, ByteBuffer backing types. > - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduce `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. > > **Dealing with Overflows** > > We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. > > **Bench... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 92 commits: - Merge branch 'master' into JDK-8335392-MemPointer - apply more suggestions from Christian - Apply suggestions from code review Co-authored-by: Christian Hagedorn - fix distance assert - whitespace - more updates for Christian - Apply suggestions from code review Co-authored-by: Christian Hagedorn - Apply suggestions from code review Co-authored-by: Christian Hagedorn - Apply suggestions from code review Co-authored-by: Christian Hagedorn - Apply suggestions from code review Co-authored-by: Christian Hagedorn - ... 
and 82 more: https://git.openjdk.org/jdk/compare/f77a5144...e8ad2757 ------------- Changes: https://git.openjdk.org/jdk/pull/19970/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=13 Stats: 2682 lines in 16 files changed: 2415 ins; 213 del; 54 mod Patch: https://git.openjdk.org/jdk/pull/19970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19970/head:pull/19970 PR: https://git.openjdk.org/jdk/pull/19970 From duke at openjdk.org Fri Nov 1 12:59:32 2024 From: duke at openjdk.org (duke) Date: Fri, 1 Nov 2024 12:59:32 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v6] In-Reply-To: References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> Message-ID: On Thu, 31 Oct 2024 12:38:17 GMT, Tomáš Zezula wrote: >> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate the risk of deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. >> >> However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). >> >> This PR provides the escape hatch such that Truffle-specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. > > Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: > > Improved a comment in CompilerThread. @tzezula Your change (at version 7e0f1a4227f388dc8e22e6200dc026f056d26eed) is now ready to be sponsored by a Committer.
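One plausible shape for such an escape hatch is a scoped (RAII) guard: it switches Java upcalls on for the current compiler thread and restores the previous state when the scope ends, so nested scopes compose. The C++ sketch below uses hypothetical names (`ToyCompilerThread`, `ToyCallJavaScope`) and does not reproduce the classes added by the integrated change.

```cpp
#include <cassert>

// Hypothetical model of a compiler thread that normally may not call into Java.
struct ToyCompilerThread {
  bool can_call_java = false;
};

// RAII scope: sets the upcall permission for its lifetime and restores the
// previous value in the destructor, so nested scopes unwind correctly.
class ToyCallJavaScope {
 public:
  ToyCallJavaScope(ToyCompilerThread* thread, bool allow)
      : _thread(thread), _previous(thread->can_call_java) {
    _thread->can_call_java = allow;
  }
  ~ToyCallJavaScope() { _thread->can_call_java = _previous; }

  // Copying a scope guard would restore the state twice; forbid it.
  ToyCallJavaScope(const ToyCallJavaScope&) = delete;
  ToyCallJavaScope& operator=(const ToyCallJavaScope&) = delete;

 private:
  ToyCompilerThread* _thread;
  bool _previous;
};
```

Tying the permission to a C++ scope means the thread cannot accidentally stay in "allow Java upcall" mode after the Truffle-specific logic returns, including on early returns.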
------------- PR Comment: https://git.openjdk.org/jdk/pull/21285#issuecomment-2451829766 From dfenacci at openjdk.org Fri Nov 1 13:03:42 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 1 Nov 2024 13:03:42 GMT Subject: RFR: 8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 [v4] In-Reply-To: References: Message-ID: > # Issue > > The third test of `compiler/codecache/CheckLargePages.java` checks that non-segmented 1GB code cache can be allocated with 1GB large pages. > > On linux (the only supported platform) in order to allocate them, 1GB huge pages have to be enabled (checkable from `/proc/meminfo` and `/sys/kernel/mm/hugepages/hugepages-xxxx`) and their number has to be set to >0 (checkable from `/sys/kernel/mm/hugepages/hugepages-xxxx/nr_hugepages`). > > If 1GB huge pages are enabled but their number is 0, the test fails because it looks for a string that matches `CodeCache: min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] page_size=1[gG]` but the actual output is `CodeCache: min=1G max=1G base=0x00007f4040000000 size=1G page_size=2M`. This happens because the VM tries to allocate 1GB huge pages but it fails because the number of allocatable ones is 0 and the VM reverts to smaller large page sizes (2MB). > > # Solution > > The problem might be attributed to the VM only checking for 1GB huge pages to be supported, not how many there currently are. Nevertheless, this seems to be the correct behaviour, not least because their number can be changed dynamically. > So, the correct thing to do seems to be to "relax" the check made by the test to include both cases: > * when 1GB huge pages are supported and can be allocated correctly > * when 1GB huge pages are supported but cannot be allocated correctly (because there are none available) and the VM reverts to 2MB huge pages (if there are no 2MB pages available the test doesn't run at all).
Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8343153: use >= 1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21757/files - new: https://git.openjdk.org/jdk/pull/21757/files/989ef945..70dfa263 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21757&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21757&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21757.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21757/head:pull/21757 PR: https://git.openjdk.org/jdk/pull/21757 From dfenacci at openjdk.org Fri Nov 1 13:08:29 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 1 Nov 2024 13:08:29 GMT Subject: RFR: 8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 In-Reply-To: <0tK_3KUqMNg0R5YdFFUlxsSeYZvF57UP_U0b6wdDhG8=.084bfe90-911f-4de2-aa5f-19ed208657b4@github.com> References: <0tK_3KUqMNg0R5YdFFUlxsSeYZvF57UP_U0b6wdDhG8=.084bfe90-911f-4de2-aa5f-19ed208657b4@github.com> Message-ID: On Thu, 31 Oct 2024 13:16:08 GMT, Evgeny Astigeevich wrote: >>> The problem might be attributed to the VM only checking for 1GB huge pages to be supported, not how many there currently are. >> >> https://bugs.openjdk.org/browse/JDK-8321526 > >> @eastig I noticed that you are the author of the original `testNonSegmented1GbCodeCacheWith1GbLargePages` test. Could I ask you to have a look at this change? Thanks a lot! > > `testDefaultCodeCacheWith1GbLargePages` and `testNonSegmented1GbCodeCacheWith1GbLargePages` should only be run if a system provides 1Gb pages. This is mentioned in their names: `...With1GbLargePages`. If there are no 1Gb pages available, the test should not be run. > > I suggest checking `/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages >= 1`.
If not, output "Skipping testDefaultCodeCacheWith1GbLargePages and testNonSegmented1GbCodeCacheWith1GbLargePages, no 1Gb pages available". > > With your change, if a system provides 1Gb pages but the JVM fails to use them because of a bug, the tests will pass and the bug will go unnoticed. Thanks for looking into it @eastig. I've changed the test to check for the content of `/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages >= 1` as you suggested. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21757#issuecomment-2451841004 From eastigeevich at openjdk.org Fri Nov 1 13:12:33 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Fri, 1 Nov 2024 13:12:33 GMT Subject: RFR: 8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 [v4] In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 13:03:42 GMT, Damon Fenacci wrote: >> # Issue >> >> The third test of `compiler/codecache/CheckLargePages.java` checks that non-segmented 1GB code cache can be allocated with 1GB large pages. >> >> On linux (the only supported platform) in order to allocate them, 1GB huge pages have to be enabled (checkable from `/proc/meminfo` and `/sys/kernel/mm/hugepages/hugepages-xxxx`) and their number has to be set to >0 (checkable from `/sys/kernel/mm/hugepages/hugepages-xxxx/nr_hugepages`). >> >> If 1GB huge pages are enabled but their number is 0, the test fails because it looks for a string that matches `CodeCache: min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] page_size=1[gG]` but the actual output is `CodeCache: min=1G max=1G base=0x00007f4040000000 size=1G page_size=2M`. This happens because the VM tries to allocate 1GB huge pages but it fails because the number of allocatable ones is 0 and the VM reverts to smaller large page sizes (2MB). >> >> # Solution >> >> The problem might be attributed to the VM only checking for 1GB huge pages to be supported, not how many there currently are.
Nevertheless, this seems to be the correct behaviour, not least because their number can be changed dynamically. >> So, the correct thing to do seems to be to "relax" the check made by the test to include both cases: >> * when 1GB huge pages are supported and can be allocated correctly >> * when 1GB huge pages are supported but cannot be allocated correctly (because there are none available) and the VM reverts to 2MB huge pages (if there are no 2MB pages available the test doesn't run at all). > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8343153: use >= 1 Looks good to me. ------------- Marked as reviewed by eastigeevich (Committer). PR Review: https://git.openjdk.org/jdk/pull/21757#pullrequestreview-2410039933 From dfenacci at openjdk.org Fri Nov 1 13:12:34 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 1 Nov 2024 13:12:34 GMT Subject: RFR: 8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 [v4] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 07:40:27 GMT, Tobias Hartmann wrote: >> test/hotspot/jtreg/compiler/codecache/CheckLargePages.java line 120: >> >>> 118: // 1GB large pages configured but none available >>> 119: "Failed to reserve and commit memory with given page size\\. " + >>> 120: "req_addr: [^ ]+ size: 1[gG], page size: 1[gG], \\(errno = 12\\)"); >> >> Took me a while to figure that these are `OR` matches due to the `|` hiding at the end of the first line. Would it make sense to update the comment to something like this? >> >> // 1GB large pages configured and available >> "CodeCache:\\s+min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] page_size=1[gG]|" + >> // or 1GB large pages configured but none available > > Also, isn't there a `CodeCache:\` line in the output in the failing case as well that should be added here in the OR part? Thanks @TobiHartmann for looking at it. 
I've actually changed the test to follow @eastig's suggestion below and reverted these lines to their original state. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21757#discussion_r1825800157 From chagedorn at openjdk.org Fri Nov 1 13:21:47 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 1 Nov 2024 13:21:47 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v13] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Fri, 1 Nov 2024 10:29:10 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`.
>> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to arrays, including "mismatched size" stores: e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still a problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without the RCs being smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > apply more suggestions from Christian src/hotspot/share/opto/mempointer.hpp line 343: > 341: // This shows that p1 and p2 have a distance greater than the array size, and hence at least one of the two > 342: // pointers must be out of bounds. This contradicts our assumption (S1) and we are done. > 343: // Maybe add some separation here since this comment does not belong to `TraceMemPointer` but is rather a file header comment.
Suggestion: src/hotspot/share/opto/mempointer.hpp line 385: > 383: _distance(distance) > 384: { > 385: assert(_distance != min_jint, "given by condition S3 of MemPointer Lemma"); Suggestion: assert(_distance != min_jint, "given by condition (S3) of MemPointer Lemma"); src/hotspot/share/opto/mempointer.hpp line 389: > 387: > 388: public: > 389: MemPointerAliasing() : MemPointerAliasing(Unknown, 0) {} It does not look like you call this constructor directly. You can therefore make it private as well: Suggestion: MemPointerAliasing() : MemPointerAliasing(Unknown, 0) {} public: src/hotspot/share/opto/mempointer.hpp line 393: > 391: static MemPointerAliasing make_unknown() { > 392: return MemPointerAliasing(); > 393: } Thinking about the comment above again, you can probably just remove the no-arg-constructor and simply do the following which I think is expressive enough: Suggestion: static MemPointerAliasing make_unknown() { return MemPointerAliasing(Unknown, 0); } src/hotspot/share/opto/mempointer.hpp line 400: > 398: > 399: // Use case: exact aliasing and adjacency. > 400: bool is_always_at_distance(const jint distance) const { The "always" seems to refer to the `Always` but it reads like we are just curious about the distance. Is `is_always_and_at_distance()` clearer? src/hotspot/share/opto/mempointer.hpp line 429: > 427: _variable(nullptr), > 428: _scale(NoOverflowInt::make_NaN()) {} > 429: MemPointerSummand(Node* variable, const NoOverflowInt scale) : Can `scale` be passed as const reference? You will make a copy anyway when assigning it to `_scale`. The compiler would probably optimize this anyway but I guess it does not hurt to use a reference here directly. src/hotspot/share/opto/mempointer.hpp line 438: > 436: > 437: Node* variable() const { return _variable; } > 438: NoOverflowInt scale() const { return _scale; } Not sure if you really need to create a new object here or if you could just pass it by const reference.
The usages are only in `parse_decomposed_form()`. There you either add it together, from which you create a new `NoOverflowInt` anyway, or you use it to create a new `MemPointerSummand` which will create its own `scale` copy anyway. But maybe I'm also missing something here. src/hotspot/share/opto/mempointer.hpp line 480: > 478: // We limit the number of summands to 10. Usually, a pointer contains a base pointer > 479: // (e.g. array pointer or null for native memory) and a few variables. > 480: static const int SUMMANDS_SIZE = 10; Looks like a best guess. Maybe you can also explicitly mention that here. Otherwise, it's unclear how you came up with the value 10. src/hotspot/share/opto/mempointer.hpp line 497: > 495: > 496: private: > 497: MemPointerDecomposedForm(Node* pointer, const GrowableArray& summands, const NoOverflowInt con) Same here, could `con` be passed by const reference since you create a copy from it anyway? src/hotspot/share/opto/mempointer.hpp line 498: > 496: private: > 497: MemPointerDecomposedForm(Node* pointer, const GrowableArray& summands, const NoOverflowInt con) > 498: :_pointer(pointer), _con(con) { Suggestion: : _pointer(pointer), _con(con) { src/hotspot/share/opto/noOverflowInt.hpp line 28: > 26: #define SHARE_OPTO_NOOVERFLOWINT_HPP > 27: > 28: #include "utilities/globalDefinitions.hpp" You do not seem to need this, so it could be removed. Suggestion: src/hotspot/share/opto/noOverflowInt.hpp line 57: > 55: bool is_zero() const { return !is_NaN() && value() == 0; } > 56: > 57: friend NoOverflowInt operator+(const NoOverflowInt a, const NoOverflowInt b) { Is it required to pass the arguments by value for the overloaded operators or would it be sufficient to pass them by reference (i.e. `const NoOverflowInt& a, const NoOverflowInt& b`)? src/hotspot/share/opto/noOverflowInt.hpp line 90: > 88: > 89: NoOverflowInt abs() const { > 90: if (is_NaN()) { return make_NaN(); } Why do you require a new `NaN` here and not simply return `*this`?
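For readers new to `NoOverflowInt`, here is a toy C++ version of the idea described in the PR: do the arithmetic normally, but remember whether any step overflowed, so the check happens once at the end. It is a hypothetical sketch, not the class in `noOverflowInt.hpp`. It passes operands by const reference and lets `abs()` return `*this` when the value is already NaN, i.e. the alternatives raised in the review comments above, without implying that the PR adopted them.

```cpp
#include <cassert>
#include <cstdint>

// Toy overflow-tracking int (hypothetical, not HotSpot's NoOverflowInt).
// Each operation is computed in 64 bits and checked against the 32-bit range;
// any overflow, or any NaN input, yields NaN.
class ToyNoOverflowInt {
 public:
  explicit ToyNoOverflowInt(int64_t v)
      : _is_NaN(v < INT32_MIN || v > INT32_MAX),
        _value(_is_NaN ? 0 : static_cast<int32_t>(v)) {}

  static ToyNoOverflowInt make_NaN() {
    ToyNoOverflowInt r(0);
    r._is_NaN = true;
    return r;
  }

  bool is_NaN() const { return _is_NaN; }
  int32_t value() const {
    assert(!is_NaN());
    return _value;
  }

  friend ToyNoOverflowInt operator+(const ToyNoOverflowInt& a, const ToyNoOverflowInt& b) {
    if (a.is_NaN() || b.is_NaN()) { return make_NaN(); }
    return ToyNoOverflowInt(int64_t{a._value} + int64_t{b._value});
  }

  friend ToyNoOverflowInt operator*(const ToyNoOverflowInt& a, const ToyNoOverflowInt& b) {
    if (a.is_NaN() || b.is_NaN()) { return make_NaN(); }
    return ToyNoOverflowInt(int64_t{a._value} * int64_t{b._value});
  }

  ToyNoOverflowInt abs() const {
    if (is_NaN()) { return *this; }           // NaN stays NaN
    const int64_t v = _value;
    return ToyNoOverflowInt(v < 0 ? -v : v);  // abs(INT32_MIN) overflows -> NaN
  }

 private:
  bool _is_NaN;    // declared first: _value's initializer reads it
  int32_t _value;
};
```

Because NaN propagates through every operation, a chain like `(a * b) + c` needs a single `is_NaN()` check at the end instead of one check per step.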
src/hotspot/share/opto/noOverflowInt.hpp line 95: > 93: } > 94: > 95: bool is_multiple_of(const NoOverflowInt other) const { I think you can also pass `other` here by reference since you only query it: Suggestion: bool is_multiple_of(const NoOverflowInt& other) const { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825759524 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825760604 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825763396 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825765294 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825768349 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825774196 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825778656 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825779509 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825781796 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825781191 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825758486 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825746670 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825749382 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825747742 From duke at openjdk.org Fri Nov 1 13:39:36 2024 From: duke at openjdk.org (Tomáš Zezula) Date: Fri, 1 Nov 2024 13:39:36 GMT Subject: Integrated: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread In-Reply-To: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> Message-ID: On Tue, 1 Oct 2024 10:57:58 GMT, Tomáš Zezula wrote:
> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate the risk of deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. > > However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). > > This PR provides the escape hatch such that Truffle-specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. This pull request has now been integrated. Changeset: 751a914b Author: Tomas Zezula URL: https://git.openjdk.org/jdk/commit/751a914b0a377d4e1dd30d2501f0ab4e327dea34 Stats: 124 lines in 6 files changed: 108 ins; 4 del; 12 mod 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread Reviewed-by: dnsimon, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21285 From epeter at openjdk.org Fri Nov 1 13:56:53 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 1 Nov 2024 13:56:53 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v15] In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc.
I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. > > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. > > More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! > > `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. > > **What this change enables** > > Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). > > Now we can do: > - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. > - Merging `Unsafe` stores to native memory. > - Merging `MemorySegment`: with array, native, ByteBuffer backing types. > - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RCs smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. > > **Dealing with Overflows** > > We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`.
It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. > > **Bench... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19970/files - new: https://git.openjdk.org/jdk/pull/19970/files/e8ad2757..e2550c9b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=13-14 Stats: 6 lines in 2 files changed: 1 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19970/head:pull/19970 PR: https://git.openjdk.org/jdk/pull/19970 From epeter at openjdk.org Fri Nov 1 13:56:53 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 1 Nov 2024 13:56:53 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v13] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Fri, 1 Nov 2024 12:31:50 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> apply more suggestions from Christian > > src/hotspot/share/opto/mempointer.hpp line 400: > >> 398: >> 399: // Use case: exact aliasing and adjacency. >> 400: bool is_always_at_distance(const jint distance) const { > > The "always" seems to refer to the `Always` but it reads like we are just curious about the distance. Is `is_always_and_at_distance()` more clear? Hmm. Maybe I can call it `is_always_with_distance`? Because this would imply that the two pointers always have an aliasing with this exact distance.... 
so that would be fitting in its **meaning**. But yours is more exactly what it **does**.... hmm.. what do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825844577 From epeter at openjdk.org Fri Nov 1 14:38:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 1 Nov 2024 14:38:36 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v13] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: <7i07uECc-y3b1y4bxbl8OvxmYxgvj0VUnonJNbU22RY=.7bdcb41c-abe4-40fc-a83d-19c4966de4d9@github.com> On Fri, 1 Nov 2024 12:25:43 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> apply more suggestions from Christian > > src/hotspot/share/opto/mempointer.hpp line 389: > >> 387: >> 388: public: >> 389: MemPointerAliasing() : MemPointerAliasing(Unknown, 0) {} > > Does not look like you call this constructor directly. You can therefore make it private as well: > Suggestion: > > MemPointerAliasing() : MemPointerAliasing(Unknown, 0) {} > > public: I just removed this constructor! > src/hotspot/share/opto/mempointer.hpp line 393: > >> 391: static MemPointerAliasing make_unknown() { >> 392: return MemPointerAliasing(); >> 393: } > > Thinking about the comment above again, you can probably just remove the no-arg-constructor and simply do the following which I think is expressive enough: > Suggestion: > > static MemPointerAliasing make_unknown() { > return MemPointerAliasing(Unknown, 0); > } Yes, this seems better, I'm doing this :) > src/hotspot/share/opto/mempointer.hpp line 429: > >> 427: _variable(nullptr), >> 428: _scale(NoOverflowInt::make_NaN()) {} >> 429: MemPointerSummand(Node* variable, const NoOverflowInt scale) : > > Can `scale` be passed as const reference? You will make a copy anyway when assigning it to `_scale`.
The compiler would probably optimize this anyway but I guess it does not hurt to use a reference here directly. Will do that, and similarly elsewhere! > src/hotspot/share/opto/mempointer.hpp line 438: > >> 436: >> 437: Node* variable() const { return _variable; } >> 438: NoOverflowInt scale() const { return _scale; } > > Not sure if you really require to create a new object here or if you could just pass it by const reference. The usages are only in `parse_decomposed_form()`. There you either add it together, from which you create a new `NoOverflowInt` anyway, or you use it to create a new `MemPointerSummand` which will create its own `scale` copy anyway. But maybe I'm also missing something here. Passing out constant references makes me a little nervous, honestly. What if the MemPointer does not outlive the use of the reference outside? I think creating a `NoOverflowInt` is very very cheap, and not really worth that risk... > src/hotspot/share/opto/noOverflowInt.hpp line 57: > >> 55: bool is_zero() const { return !is_NaN() && value() == 0; } >> 56: >> 57: friend NoOverflowInt operator+(const NoOverflowInt a, const NoOverflowInt b) { > > Is it required to pass the arguments by value for the overloaded operators or would it be sufficient to pass them by reference (i.e. `const NoOverflowInt& a, const NoOverflowInt& b`)? Good idea! > src/hotspot/share/opto/noOverflowInt.hpp line 90: > >> 88: >> 89: NoOverflowInt abs() const { >> 90: if (is_NaN()) { return make_NaN(); } > > Why do you require a new `NaN` here and not simply return `*this`? Yes, I changed it!
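The parameter and return conventions agreed on in this exchange can be sketched with a simplified, hypothetical stand-in for `NoOverflowInt` (not the class in noOverflowInt.hpp): operands are taken by const reference, results are returned by value because the object is tiny and a copy cannot dangle, overflow turns a value into a sticky NaN, and `abs()` on a NaN just returns `*this`. Treating a zero divisor in `is_multiple_of` as "not a multiple" is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical, simplified NoOverflowInt: results are computed in 64 bits and any value
// outside the 32-bit int range becomes NaN. NaN is sticky: every later result is NaN too.
class NoOverflowInt {
 private:
  bool _is_NaN;
  int32_t _value;

 public:
  explicit NoOverflowInt(int64_t v) : _is_NaN(false), _value(0) {
    if (v < std::numeric_limits<int32_t>::min() || v > std::numeric_limits<int32_t>::max()) {
      _is_NaN = true;
    } else {
      _value = static_cast<int32_t>(v);
    }
  }
  static NoOverflowInt make_NaN() { NoOverflowInt n(0); n._is_NaN = true; return n; }

  bool is_NaN() const { return _is_NaN; }
  int32_t value() const { assert(!is_NaN()); return _value; }

  // Inputs by const reference (read-only), result by value (a fresh, cheap object).
  friend NoOverflowInt operator+(const NoOverflowInt& a, const NoOverflowInt& b) {
    if (a.is_NaN() || b.is_NaN()) { return make_NaN(); }
    return NoOverflowInt(static_cast<int64_t>(a.value()) + b.value());
  }
  friend NoOverflowInt operator*(const NoOverflowInt& a, const NoOverflowInt& b) {
    if (a.is_NaN() || b.is_NaN()) { return make_NaN(); }
    return NoOverflowInt(static_cast<int64_t>(a.value()) * b.value());
  }

  NoOverflowInt abs() const {
    if (is_NaN()) { return *this; }  // already NaN: no need to build a new one
    // abs(INT_MIN) does not fit in 32 bits, so the constructor turns it into NaN.
    return NoOverflowInt(value() < 0 ? -static_cast<int64_t>(value()) : value());
  }

  // `other` is only queried, so a const reference suffices.
  bool is_multiple_of(const NoOverflowInt& other) const {
    if (is_NaN() || other.is_NaN() || other.value() == 0) { return false; }
    return static_cast<int64_t>(value()) % other.value() == 0;
  }
};
```

Returning such a value from an accessor like `scale()` copies only a bool and an int, which is why returning by value is a cheap way to avoid the lifetime risk raised above.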
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825890781 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825891236 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825892269 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825894476 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825889312 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825890329 From dnsimon at openjdk.org Fri Nov 1 14:40:57 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 1 Nov 2024 14:40:57 GMT Subject: RFR: 8343439: [JVMCI] Fix javadoc of Services.getSavedProperties Message-ID: The javadoc of `jdk.vm.ci.services.Services.getSavedProperties` is currently: /** * Gets an unmodifiable copy of the system properties parsed by {@code arguments.cpp} * plus {@code java.specification.version}, {@code os.name} and {@code os.arch}. * The latter two are forced to be the real OS and architecture. That is, values * for these two properties set on the command line are ignored. */ The details about how the copy is initialized are specific to the HotSpot VM. On SVM, the semantics can be different. This PR separates out the HotSpot specific part. 
------------- Commit messages: - separate out HotSpot specific semantics of getSavedProperties Changes: https://git.openjdk.org/jdk/pull/21832/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21832&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343439 Stats: 6 lines in 1 file changed: 2 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21832.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21832/head:pull/21832 PR: https://git.openjdk.org/jdk/pull/21832 From epeter at openjdk.org Fri Nov 1 14:42:37 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 1 Nov 2024 14:42:37 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v13] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: <9MSWBGyO2BisLPYCBiz2clMeEPgA4f3lUrUNHjJ41Tg=.f8adee1b-266c-4611-86cb-5e18287ec820@github.com> On Fri, 1 Nov 2024 12:45:56 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> apply more suggestions from Christian > > src/hotspot/share/opto/mempointer.hpp line 480: > >> 478: // We limit the number of summands to 10. Usually, a pointer contains a base pointer >> 479: // (e.g. array pointer or null for native memory) and a few variables. >> 480: static const int SUMMANDS_SIZE = 10; > > Looks like a best guess. Maybe you can also explicitly mention that here. Otherwise, it's unclear how you came up with the value 10. Ok, will do > src/hotspot/share/opto/mempointer.hpp line 497: > >> 495: >> 496: private: >> 497: MemPointerDecomposedForm(Node* pointer, const GrowableArray& summands, const NoOverflowInt con) > > Same here, could `con` be passed by const reference since you create a copy from it anyway? 
did that ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825897645 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825898587 From epeter at openjdk.org Fri Nov 1 14:49:07 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 1 Nov 2024 14:49:07 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v16] In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. > > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. > > More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! > > `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`.
> > **What this change enables** > > Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). > > Now we can do: > - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. > - Merging `Unsafe` stores to native memory. > - Merging `MemorySegment`: with array, native, ByteBuffer backing types. > - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RCs smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. > > **Dealing with Overflows** > > We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. > > **Bench...
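The aliasing and adjacency check from the **Details** paragraph can be sketched as follows, assuming both pointers are already decomposed into `con + sum_i(scale_i * variable_i)` with zero-scale terms removed. Variables are integer ids instead of `Node*`, the constant is a plain `int64_t` rather than a `NoOverflowInt`, and `DecomposedForm`, `always_distance`, and `is_adjacent` are hypothetical names, not the HotSpot API.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical decomposed form: pointer = con + sum_i(scale_i * variable_i).
// Assumes summands with scale 0 have already been dropped, so equal maps mean equal sums.
struct DecomposedForm {
  std::map<int, int32_t> summands;  // variable id -> scale; keyed, so order does not matter
  int64_t con = 0;                  // constant offset (overflow checking omitted here)
};

// If both pointers have exactly the same summands, they always differ by the same
// constant, so the distance is b.con - a.con. Otherwise nothing is known.
std::optional<int64_t> always_distance(const DecomposedForm& a, const DecomposedForm& b) {
  if (a.summands != b.summands) { return std::nullopt; }
  return b.con - a.con;
}

// MergeStores-style adjacency: store b starts exactly where store a (a_size bytes) ends.
bool is_adjacent(const DecomposedForm& a, int a_size, const DecomposedForm& b) {
  std::optional<int64_t> d = always_distance(a, b);
  return d.has_value() && *d == a_size;
}
```

For two byte stores `a[i]` and `a[i+1]` into the same array, both forms share the summands for the base and `i` and differ only in `con`, so the distance is 1 and the two 1-byte stores count as adjacent.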
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more review applications ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19970/files - new: https://git.openjdk.org/jdk/pull/19970/files/e2550c9b..d10b76ff Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=14-15 Stats: 20 lines in 3 files changed: 2 ins; 2 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/19970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19970/head:pull/19970 PR: https://git.openjdk.org/jdk/pull/19970 From epeter at openjdk.org Fri Nov 1 14:49:08 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 1 Nov 2024 14:49:08 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v13] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Fri, 1 Nov 2024 13:52:48 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/mempointer.hpp line 400: >> >>> 398: >>> 399: // Use case: exact aliasing and adjacency. >>> 400: bool is_always_at_distance(const jint distance) const { >> >> The "always" seems to refer to the `Always` but it reads like we are just curious about the distance. Is `is_always_and_at_distance()` more clear? > > Hmm. Maybe I can call it `is_always_with_distance`? Because this would imply that the two pointers always have an aliasing with this exact distance.... so that would be fitting in its **meaning**. But yours is more exactly what id **does**.... hmm.. what do you think? Hmm. No I think I really like the original, because it reads like this: `aliasing.is_always_at_distance(d)`. 
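A minimal sketch of the `MemPointerAliasing` shape discussed here, assuming construction only through named factories. `make_unknown()` follows the suggestion in this thread; `make_always()` and the field layout are assumptions of this sketch, not necessarily what mempointer.hpp contains.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of an aliasing result type (not the HotSpot class).
class MemPointerAliasing {
 public:
  enum Aliasing { Unknown, Always };

 private:
  const Aliasing _aliasing;
  const int32_t _distance;  // only meaningful when _aliasing == Always

  // Private: callers must use the named factories, so no "default" state can sneak in.
  MemPointerAliasing(Aliasing aliasing, int32_t distance)
      : _aliasing(aliasing), _distance(distance) {}

 public:
  static MemPointerAliasing make_unknown() { return MemPointerAliasing(Unknown, 0); }
  static MemPointerAliasing make_always(int32_t distance) { return MemPointerAliasing(Always, distance); }

  // Reads naturally at the call site: aliasing.is_always_at_distance(d)
  bool is_always_at_distance(int32_t distance) const {
    return _aliasing == Always && _distance == distance;
  }
};
```

An `Unknown` result never claims any distance, so callers such as an adjacency check only proceed when the relation is proven.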
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825905560 From thartmann at openjdk.org Fri Nov 1 15:12:30 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 1 Nov 2024 15:12:30 GMT Subject: RFR: 8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 [v4] In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 13:03:42 GMT, Damon Fenacci wrote: >> # Issue >> >> The third test of `compiler/codecache/CheckLargePages.java` checks that non-segmented 1GB code cache can be allocated with 1GB large pages. >> >> On linux (the only supported platform) in order to allocate them, 1GB huge pages have to be enabled (checkable from `/proc/meminfo` and `/sys/kernel/mm/hugepages/hugepages-xxxx`) and their number has to be set to >0 (checkable from `/sys/kernel/mm/hugepages/hugepages-xxxx/nr_hugepages`). >> >> If 1GB huge pages are enabled but their number is 0, the test fails because it looks for a string that matches `CodeCache: min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] page_size=1[gG]` but the actual output is `CodeCache: min=1G max=1G base=0x00007f4040000000 size=1G page_size=2M`. This happens because the VM tries to allocate 1GB huge pages but it fails because the number of allocatable ones is 0 and the VM reverts to smaller large page sizes (2MB). >> >> # Solution >> >> The problem might be attributed to the VM only checking for 1GB huge pages to be supported, not how many there currently are. Nevertheless, this seems to be the correct behaviour, not least because their number can be changed dynamically.
>> So, the correct thing to do seems to be to "relax" the check made by the test to include both cases: >> * when 1GB huge pages are supported and can be allocated correctly >> * when 1GB huge pages are supported but cannot be allocated correctly (because there are none available) and the VM reverts to 2MB huge pages (if there are no 2MB pages available the test doesn't run at all). > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8343153: use >= 1 Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21757#pullrequestreview-2410261375 From never at openjdk.org Fri Nov 1 17:03:27 2024 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 1 Nov 2024 17:03:27 GMT Subject: RFR: 8343439: [JVMCI] Fix javadoc of Services.getSavedProperties In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 14:36:01 GMT, Doug Simon wrote: > The javadoc of `jdk.vm.ci.services.Services.getSavedProperties` is currently: > > /** > * Gets an unmodifiable copy of the system properties parsed by {@code arguments.cpp} > * plus {@code java.specification.version}, {@code os.name} and {@code os.arch}. > * The latter two are forced to be the real OS and architecture. That is, values > * for these two properties set on the command line are ignored. > */ > > The details about how the copy is initialized are specific to the HotSpot VM. On SVM, the semantics can be different. This PR separates out the HotSpot specific part. Marked as reviewed by never (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21832#pullrequestreview-2410497098 From dnsimon at openjdk.org Fri Nov 1 17:07:31 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 1 Nov 2024 17:07:31 GMT Subject: RFR: 8343439: [JVMCI] Fix javadoc of Services.getSavedProperties In-Reply-To: References: Message-ID: <3ysaTjj1gA2FAlTBZ74Z3NREDdsOrkjCoxiJMA8Tzmk=.313ae18d-565e-41a0-83f4-7df3a2c1746b@github.com> On Fri, 1 Nov 2024 14:36:01 GMT, Doug Simon wrote: > The javadoc of `jdk.vm.ci.services.Services.getSavedProperties` is currently: > > /** > * Gets an unmodifiable copy of the system properties parsed by {@code arguments.cpp} > * plus {@code java.specification.version}, {@code os.name} and {@code os.arch}. > * The latter two are forced to be the real OS and architecture. That is, values > * for these two properties set on the command line are ignored. > */ > > The details about how the copy is initialized are specific to the HotSpot VM. On SVM, the semantics can be different. This PR separates out the HotSpot specific part. Thanks for the review Tom. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21832#issuecomment-2452244006 From dnsimon at openjdk.org Fri Nov 1 17:07:32 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 1 Nov 2024 17:07:32 GMT Subject: Integrated: 8343439: [JVMCI] Fix javadoc of Services.getSavedProperties In-Reply-To: References: Message-ID: <19Lx0iaxn_59ty9sWRMKM7ftO8MX-ZHlbfr33jARKQY=.64cdeefd-d155-4b5d-9aeb-4abd6a0de49a@github.com> On Fri, 1 Nov 2024 14:36:01 GMT, Doug Simon wrote: > The javadoc of `jdk.vm.ci.services.Services.getSavedProperties` is currently: > > /** > * Gets an unmodifiable copy of the system properties parsed by {@code arguments.cpp} > * plus {@code java.specification.version}, {@code os.name} and {@code os.arch}. > * The latter two are forced to be the real OS and architecture. That is, values > * for these two properties set on the command line are ignored. 
> */ > > The details about how the copy is initialized are specific to the HotSpot VM. On SVM, the semantics can be different. This PR separates out the HotSpot specific part. This pull request has now been integrated. Changeset: 1eccdfc6 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/1eccdfc62288b8baff950b7293ee931eab896298 Stats: 6 lines in 1 file changed: 2 ins; 0 del; 4 mod 8343439: [JVMCI] Fix javadoc of Services.getSavedProperties Reviewed-by: never ------------- PR: https://git.openjdk.org/jdk/pull/21832 From kvn at openjdk.org Fri Nov 1 17:30:29 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 1 Nov 2024 17:30:29 GMT Subject: RFR: 8343205: CompileBroker::possibly_add_compiler_threads excessively polls available memory [v2] In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 00:22:42 GMT, Martin Doerr wrote: >> This PR adds a quick check + bail out in order to avoid excessive usage of slow checks. Especially, it avoids querying the available memory so often. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Minor improvements (review feedback). Looks good. src/hotspot/share/compiler/compileBroker.hpp line 90: > 88: CompileTask* _first_stale; > 89: > 90: volatile int _size; Right. I was concerned about concurrent access to this field. ------------- Marked as reviewed by kvn (Reviewer).
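The "quick check + bail out" idea can be sketched as follows: derive the wanted thread count from the queue length alone (cheap), and query available memory (expensive) only when more threads are wanted than are already active. It uses the tasks-per-thread ratios from this review (2 for C2, 4 for C1). Every name here (`QueueState`, `desired_threads`, `threads_to_add`, the memory callback) is hypothetical, and the real `CompileBroker::possibly_add_compiler_threads` weighs additional limits that this sketch omits.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical queue snapshot. The size is atomic because it may be read concurrently
// (cf. the `volatile int _size` field discussed above).
struct QueueState {
  std::atomic<int> size{0};
  int active_threads = 0;
  int max_threads = 0;
};

// Cheap: integer division rounds down, e.g. 3 queued C2 tasks at 2 tasks/thread -> 1 thread.
int desired_threads(const QueueState& q, int tasks_per_thread) {
  int by_queue = q.size.load(std::memory_order_relaxed) / tasks_per_thread;
  return std::min(by_queue, q.max_threads);
}

// Returns how many threads to add; `query_available_memory` is the slow call to avoid.
int threads_to_add(const QueueState& q, int tasks_per_thread,
                   const std::function<int64_t()>& query_available_memory,
                   int64_t bytes_per_thread) {
  int wanted = desired_threads(q, tasks_per_thread);
  if (wanted <= q.active_threads) {
    return 0;  // quick bail-out: enough threads already, memory is never queried
  }
  int64_t affordable = query_available_memory() / bytes_per_thread;
  return static_cast<int>(std::min<int64_t>(wanted - q.active_threads, affordable));
}
```

In the common steady state the queue is short, so the function returns after the cheap check and the memory query is skipped entirely.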
PR Review: https://git.openjdk.org/jdk/pull/21812#pullrequestreview-2410562706 PR Review Comment: https://git.openjdk.org/jdk/pull/21812#discussion_r1826100677 From kvn at openjdk.org Fri Nov 1 17:30:30 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 1 Nov 2024 17:30:30 GMT Subject: RFR: 8343205: CompileBroker::possibly_add_compiler_threads excessively polls available memory [v2] In-Reply-To: References: Message-ID: <9yYTrS8f-0f1Bi_YUaCFEb3JhwLEhxk1mX8_3nvIv98=.d152b2e5-aae5-4d95-803d-91995ca3367e@github.com> On Fri, 1 Nov 2024 00:24:47 GMT, Martin Doerr wrote: >> src/hotspot/share/compiler/compileBroker.cpp line 1027: >> >>> 1025: >>> 1026: int old_c2_count = 0, new_c2_count = 0, old_c1_count = 0, new_c1_count = 0; >>> 1027: const int c2_tasks_per_thread = 2, c1_tasks_per_thread = 4; >> >> Any reason to have such numbers (2 and 4)? Were any experiments done to select the best numbers? > > Please note that these constants are not new. I have only given them names. I had done some experiments when implementing [JDK-8198756](https://bugs.openjdk.org/browse/JDK-8198756) for JDK11. C1 is faster than C2. Therefore, we can have more C1 tasks per C1 thread. Good. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21812#discussion_r1826098757 From sviswanathan at openjdk.org Sat Nov 2 00:10:27 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Sat, 2 Nov 2024 00:10:27 GMT Subject: RFR: 8343419: Assertion failure in long vector unsigned min/max with -XX:+UseKNLSetting In-Reply-To: <3Mk5nm6pbSQzZECZnGXurTdkTDHOe99zGcwfRgr-ec0=.66858eec-9c11-4e67-b5d4-cb683816f231@github.com> References: Message-ID: <3Mk5nm6pbSQzZECZnGXurTdkTDHOe99zGcwfRgr-ec0=.66858eec-9c11-4e67-b5d4-cb683816f231@github.com> On Fri, 1 Nov 2024 12:06:27 GMT, Jatin Bhateja wrote: > KNL only supports AVX512F but not AVX512VL feature, thus vector operations with vector size less than or equal to 256 bits are generally emulated using AVX2 instructions. > > This bugfix patch covers the following scenarios for LongVector unsigned min/ max over KNL targets:- > 1.
Long species < 512 bits and non-predicated operation. > - Operate at full vector width of 512 bits using VPMINUQ/VPMAXUQ instructions. > 2. Long species < 512 bits with memory operands and non-predicated operations. > - Load memory into exactly matching vector size. > - Operate at full vector width of 512 bits > 3. Long species < 512 bits and predicated operation. > - Emulate operation using AVX2 instructions > - Blend the result with the first source vector using the predication mask. > - Existing opmask population mechanism expects the existence of AVX512BW/DQ features missing on KNL target. > 4. Long species == 512 bits, both predicated and non-predicated operations. > - Directly uses 512 bits VPMINUQ/VPMAXUQ instructions. > > All existing jtreg regressions are passing with -XX:+UseKNLSetting and -Xcomp flags. > > Kindly review. > > Best Regards, > Jatin Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21821#pullrequestreview-2411015230 From jbhateja at openjdk.org Sat Nov 2 01:10:47 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 2 Nov 2024 01:10:47 GMT Subject: RFR: 8343419: Assertion failure in long vector unsigned min/max with -XX:+UseKNLSetting In-Reply-To: <3Mk5nm6pbSQzZECZnGXurTdkTDHOe99zGcwfRgr-ec0=.66858eec-9c11-4e67-b5d4-cb683816f231@github.com> References: <3Mk5nm6pbSQzZECZnGXurTdkTDHOe99zGcwfRgr-ec0=.66858eec-9c11-4e67-b5d4-cb683816f231@github.com> Message-ID: On Sat, 2 Nov 2024 00:08:21 GMT, Sandhya Viswanathan wrote: >> KNL only supports AVX512F but not AVX512VL feature, thus vector operations with vector size less than or equal to 256 bits are generally emulated using AVX2 instructions. >> >> This bugfix patch covers the following scenarios for LongVector unsigned min/ max over KNL targets:- >> 1. Long species < 512 bits and non-predicated operation. >> - Operate at full vector width of 512 bits using VPMINUQ/VPMAXUQ instructions. >> 2. 
Long species < 512 bits with memory operands and non-predicated operations. >> - Load memory into exactly matching vector size. >> - Operate at full vector width of 512 bits >> 3. Long species < 512 bits and predicated operation. >> - Emulate operation using AVX2 instructions >> - Blend the result with the first source vector using the predication mask. >> - Existing opmask population mechanism expects the existence of AVX512BW/DQ features missing on KNL target. >> 4. Long species == 512 bits, both predicated and non-predicated operations. >> - Directly uses 512 bits VPMINUQ/VPMAXUQ instructions. >> >> All existing jtreg regressions are passing with -XX:+UseKNLSetting and -Xcomp flags. >> >> Kindly review. >> >> Best Regards, >> Jatin > > Looks good to me. Thanks @sviswa7 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21821#issuecomment-2452777794 From jbhateja at openjdk.org Sat Nov 2 01:10:47 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 2 Nov 2024 01:10:47 GMT Subject: Integrated: 8343419: Assertion failure in long vector unsigned min/max with -XX:+UseKNLSetting In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 12:06:27 GMT, Jatin Bhateja wrote: > KNL only supports AVX512F but not AVX512VL feature, thus vector operations with vector size less than or equal to 256 bits are generally emulated using AVX2 instructions. > > This bugfix patch covers the following scenarios for LongVector unsigned min/ max over KNL targets:- > 1. Long species < 512 bits and non-predicated operation. > - Operate at full vector width of 512 bits using VPMINUQ/VPMAXUQ instructions. > 2. Long species < 512 bits with memory operands and non-predicated operations. > - Load memory into exactly matching vector size. > - Operate at full vector width of 512 bits > 3. Long species < 512 bits and predicated operation. > - Emulate operation using AVX2 instructions > - Blend the result with the first source vector using the predication mask. 
> - Existing opmask population mechanism expects the existence of AVX512BW/DQ features missing on KNL target. > 4. Long species == 512 bits, both predicated and non-predicated operations. > - Directly uses 512 bits VPMINUQ/VPMAXUQ instructions. > > All existing jtreg regressions are passing with -XX:+UseKNLSetting and -Xcomp flags. > > Kindly review. > > Best Regards, > Jatin This pull request has now been integrated. Changeset: 3c7082a6 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/3c7082a633037c19066c36be2520487b0bed4e79 Stats: 37 lines in 2 files changed: 20 ins; 6 del; 11 mod 8343419: Assertion failure in long vector unsigned min/max with -XX:+UseKNLSetting Reviewed-by: sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/21821 From syan at openjdk.org Sat Nov 2 13:37:02 2024 From: syan at openjdk.org (SendaoYan) Date: Sat, 2 Nov 2024 13:37:02 GMT Subject: RFR: 8343475: RISC-V: Test TestAESIntrinsicsOnUnsupportedConfig.java fails Message-ID: Hi all, Test `test/hotspot/jtreg/compiler/cpuflags/TestAESIntrinsicsOnUnsupportedConfig.java` fails on linux-riscv64. The expected output is: `warning: AES instructions are not available on this CPU` or: `warning: AES intrinsics are not available on this CPU` But on linux-riscv64 the actual output in both cases is: `warning: AES intrinsics require Zvkn extension (not available on this CPU).` This PR adapts the expected output for linux-riscv64. The change has been verified locally, test-fix only, no risk.
------------- Commit messages: - 8343475: RISC-V: Test TestAESIntrinsicsOnUnsupportedConfig.java fails Changes: https://git.openjdk.org/jdk/pull/21849/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21849&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343475 Stats: 4 lines in 1 file changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21849.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21849/head:pull/21849 PR: https://git.openjdk.org/jdk/pull/21849 From syan at openjdk.org Sat Nov 2 14:28:32 2024 From: syan at openjdk.org (SendaoYan) Date: Sat, 2 Nov 2024 14:28:32 GMT Subject: RFR: 8343475: RISC-V: Test TestAESIntrinsicsOnUnsupportedConfig.java fails In-Reply-To: References: Message-ID: On Sat, 2 Nov 2024 13:31:35 GMT, SendaoYan wrote: > Hi all, > Test `test/hotspot/jtreg/compiler/cpuflags/TestAESIntrinsicsOnUnsupportedConfig.java` fails on linux-riscv64. The expected output is: > `warning: AES instructions are not available on this CPU` > or: > `warning: AES intrinsics are not available on this CPU` > But on linux-riscv64 the actual output in both cases is: > `warning: AES intrinsics require Zvkn extension (not available on this CPU).` > > This PR adapts the expected output for linux-riscv64. The change has been verified locally, test-fix only, no risk. Duplicate of https://github.com/openjdk/jdk/pull/21847; closing this PR.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21849#issuecomment-2453007058 From syan at openjdk.org Sat Nov 2 14:28:32 2024 From: syan at openjdk.org (SendaoYan) Date: Sat, 2 Nov 2024 14:28:32 GMT Subject: Withdrawn: 8343475: RISC-V: Test TestAESIntrinsicsOnUnsupportedConfig.java fails In-Reply-To: References: Message-ID: <7wsRnK2bL7Tae0F_D1jJXTEGbMFVlCp3mVK-UG421OY=.658ed2f7-9c72-4a6a-8fd0-02a940b541e7@github.com> On Sat, 2 Nov 2024 13:31:35 GMT, SendaoYan wrote: > Hi all, > Test `test/hotspot/jtreg/compiler/cpuflags/TestAESIntrinsicsOnUnsupportedConfig.java` fails on linux-riscv64. The expected output is: > `warning: AES instructions are not available on this CPU` > or: > `warning: AES intrinsics are not available on this CPU` > But on linux-riscv64 the actual output in both cases is: > `warning: AES intrinsics require Zvkn extension (not available on this CPU).` > > This PR adapts the expected output for linux-riscv64. The change has been verified locally; it is a test-only fix with no risk. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/21849 From acobbs at openjdk.org Sat Nov 2 15:55:57 2024 From: acobbs at openjdk.org (Archie Cobbs) Date: Sat, 2 Nov 2024 15:55:57 GMT Subject: RFR: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) Message-ID: Please review this patch which removes unnecessary `@SuppressWarnings` annotations. ------------- Commit messages: - Merge branch 'master' into SuppressWarningsCleanup-graal - Remove unnecessary @SuppressWarnings annotations.
Changes: https://git.openjdk.org/jdk/pull/21853/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21853&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343479 Stats: 6 lines in 3 files changed: 0 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21853.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21853/head:pull/21853 PR: https://git.openjdk.org/jdk/pull/21853 From acobbs at openjdk.org Sun Nov 3 03:10:24 2024 From: acobbs at openjdk.org (Archie Cobbs) Date: Sun, 3 Nov 2024 03:10:24 GMT Subject: RFR: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) [v2] In-Reply-To: References: Message-ID: > Please review this patch which removes unnecessary `@SuppressWarnings` annotations. Archie Cobbs has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Update copyright years. - Merge branch 'master' into SuppressWarningsCleanup-hotspot - Merge branch 'master' into SuppressWarningsCleanup-graal - Remove unnecessary @SuppressWarnings annotations. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21853/files - new: https://git.openjdk.org/jdk/pull/21853/files/8eab41ca..21c83e93 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21853&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21853&range=00-01 Stats: 592 lines in 18 files changed: 420 ins; 93 del; 79 mod Patch: https://git.openjdk.org/jdk/pull/21853.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21853/head:pull/21853 PR: https://git.openjdk.org/jdk/pull/21853 From sparasa at openjdk.org Mon Nov 4 01:53:26 2024 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 4 Nov 2024 01:53:26 GMT Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support [v4] In-Reply-To: References: Message-ID: > The goal of this PR is to fix instruction encoding errors for some of the instructions which support the APX New Data Destination (NDD) and No Flags (NF) features (added in [JDK-8329035](https://bugs.openjdk.org/browse/JDK-8329035)) > > The correctness of the encoding is verified by comparing to the reference encoding by GCC using an automated tool (https://github.com/openjdk/jdk/pull/21795) Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: updated opcode 0F_3C to MAP4 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21770/files - new: https://git.openjdk.org/jdk/pull/21770/files/5049d3aa..0f404dbd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21770&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21770&range=02-03 Stats: 24 lines in 2 files changed: 1 ins; 2 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/21770.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21770/head:pull/21770 PR: https://git.openjdk.org/jdk/pull/21770 From sparasa at openjdk.org Mon Nov 4 01:59:34 2024 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 4 Nov 2024 01:59:34 GMT Subject: 
RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support [v4] In-Reply-To: References: <9lYvrMFQKW4Cq9zKFDjkNg9zKgLvdaCtVQTZgUGgpNs=.78ee059a-5f85-4d6e-828a-a71351d44763@github.com> Message-ID: On Wed, 30 Oct 2024 21:55:11 GMT, Srinivas Vamsi Parasa wrote: > I think we should first check-in extended gtest asm validation script detecting these issues either before or along with this patch. Once this PR is integrated, the immediate next step is to integrate Hank's extended verification tool https://github.com/openjdk/jdk/pull/21795. Those tests won't pass without the changes in this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21770#issuecomment-2453699811 From jkarthikeyan at openjdk.org Mon Nov 4 03:36:12 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 4 Nov 2024 03:36:12 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v4] In-Reply-To: References: Message-ID: > Hi all, > This patch adds a new pass to consolidate lowering of complex backend-specific code patterns, such as `MacroLogicV` and the optimization proposed by #21244. Moving these optimizations to backend code can simplify shared code, while also making it easier to develop more in-depth optimizations. The linked bug has an example of a new optimization this could enable. The new phase does GVN to de-duplicate nodes and calls nodes' `Value()` method, but it does not call `Identity()` or `Ideal()` to avoid undoing any changes done during lowering. It also reuses the IGVN worklist to avoid needing to re-create the notification mechanism. > > In this PR only the skeleton code for the pass is added, moving `MacroLogicV` to this system will be done separately in a future patch. Tier 1 tests pass on my linux x64 machine. Feedback on this patch would be greatly appreciated! 
Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Re-use optimize() and add backend-specific should_lower() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21599/files - new: https://git.openjdk.org/jdk/pull/21599/files/c7ceec71..fc8fa245 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21599&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21599&range=02-03 Stats: 49 lines in 8 files changed: 36 ins; 11 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21599.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21599/head:pull/21599 PR: https://git.openjdk.org/jdk/pull/21599 From jkarthikeyan at openjdk.org Mon Nov 4 03:36:12 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 4 Nov 2024 03:36:12 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v3] In-Reply-To: <3kQ-4gSCJWVed41_y2EvHcqxX1tDLYSTGeBL_QTfPn8=.55f7ce6e-e209-465a-97af-257770e13a65@github.com> References: <6ABTGpRWisFfAgR9R6gCqxJMasj8pEYnMRsXCIes9Tc=.b3495a73-aacc-4b7e-9f3a-1e0428cc539a@github.com> <3kQ-4gSCJWVed41_y2EvHcqxX1tDLYSTGeBL_QTfPn8=.55f7ce6e-e209-465a-97af-257770e13a65@github.com> Message-ID: On Thu, 31 Oct 2024 02:48:03 GMT, Quan Anh Mai wrote: >> I would prefer to keep it as-is because `PhaseIterGVN::optimize` does a lot of logic that may not be relevant here (such as IGVN verification and IGV printing). This way we can avoid changes to IGVN in the future accidentally impacting lowering in unexpected ways. > > I actually think it is a good idea to have verification and printing. Since Lowering does IGVN-like transformations, they should behave in generally the same way. If it turns out that we actually need a separate entry then we can create it then. I see, I can understand the benefit there. 
I think we'll still need to have a custom entry to collect the nodes to place on the worklist and filter based on platform, but we can use `optimize()` to replace the main loop. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1827154474 From amitkumar at openjdk.org Mon Nov 4 03:38:38 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 4 Nov 2024 03:38:38 GMT Subject: RFR: 8341068: [s390x] intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v3] In-Reply-To: References: Message-ID: <-4CHTIyNpEsG27Y57OXnNbJqKuFf-BMxfV45xU6QtCw=.3b6a45f6-a288-40bd-a10d-e91f2c1d85d7@github.com> On Mon, 21 Oct 2024 07:45:27 GMT, Lutz Schmidt wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> removes extra whitespaces > > Still looks good. Thanks @RealLucy @theRealAph for the suggestions & approval. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21559#issuecomment-2453768936 From amitkumar at openjdk.org Mon Nov 4 03:38:39 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 4 Nov 2024 03:38:39 GMT Subject: Integrated: 8341068: [s390x] intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 09:45:19 GMT, Amit Kumar wrote: > Add match rules for UDivI, UModI, UDivL, UModL. And also adds `dlr` and `dlgr` instruction. > > Tier1 test are clean for fastdebug vm; > > Before this patch, `compiler/c2/TestDivModNodes.java` was failing (see jbs issue) but with this patch test is passing. > > Without Patch: > > > Benchmark (BUFFER_SIZE) (divisorType) Mode Cnt Score Error Units > IntegerDivMod.testDivideRemainderUnsigned 1024 mixed avgt 15 1935.176 ? 2.191 ns/op > IntegerDivMod.testDivideRemainderUnsigned 1024 positive avgt 15 1934.915 ? 
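For readers unfamiliar with the Java methods being intrinsified in this thread: `Integer.divideUnsigned`/`remainderUnsigned` and their `Long` counterparts reinterpret the operand bits as unsigned values. These are the semantics that the C2 `UDivI`/`UModI`/`UDivL`/`UModL` nodes named in the PR represent. The following is a small plain-Java illustration; the class name is only illustrative and is not part of the patch.

```java
public class UnsignedDivDemo {
    public static void main(String[] args) {
        int x = -1; // bit pattern 0xFFFFFFFF, i.e. 4294967295 when read as unsigned
        System.out.println(x / 2);                           // signed division: 0
        System.out.println(Integer.divideUnsigned(x, 2));    // 2147483647
        System.out.println(Integer.remainderUnsigned(x, 2)); // 1

        long y = -1L; // 2^64 - 1 when read as unsigned
        System.out.println(Long.divideUnsigned(y, 10));      // 1844674407370955161
        System.out.println(Long.remainderUnsigned(y, 10));   // 5
    }
}
```

The same bit pattern yields very different quotients depending on the interpretation. Without an intrinsic, a platform must emulate the unsigned variants with several signed operations.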
3.207 ns/op > IntegerDivMod.testDivideRemainderUnsigned 1024 negative avgt 15 1934.325 ? 1.108 ns/op > IntegerDivMod.testDivideUnsigned 1024 mixed avgt 15 1809.782 ? 49.341 ns/op > IntegerDivMod.testDivideUnsigned 1024 positive avgt 15 1769.326 ? 2.607 ns/op > IntegerDivMod.testDivideUnsigned 1024 negative avgt 15 1784.053 ? 71.190 ns/op > IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 15 2026.978 ? 1.534 ns/op > IntegerDivMod.testRemainderUnsigned 1024 positive avgt 15 2028.039 ? 3.812 ns/op > IntegerDivMod.testRemainderUnsigned 1024 negative avgt 15 2437.843 ? 636.808 ns/op > Finished running test 'micro:java.lang.IntegerDivMod' > > > Benchmark (BUFFER_SIZE) (divisorType) Mode Cnt Score Error Units > LongDivMod.testDivideRemainderUnsigned 1024 mixed avgt 15 4524.897 ? 16.566 ns/op > LongDivMod.testDivideRemainderUnsigned 1024 positive avgt 15 4373.714 ? 9.514 ns/op > LongDivMod.testDivideRemainderUnsigned 1024 negative avgt 15 2018.309 ? 1.788 ns/op > LongDivMod.testDivideUnsigned 1024 mixed avgt 15 4320.382 ? 19.055 ns/op > LongDivMod.testDivideUnsigned 1024 positive avgt 15 3988.953 ? 8.770 ns/op > LongDivMod.testDivideUnsigned 1024 negative avgt 15 1069.703 ? 1.525 ns/op > LongDivMod.testRemainderUnsigned 1024 mixed avgt 15 5589.319 ? 4.247 ns/op > LongDivMod.testRemainderUnsigned 1024 positive avgt 15 3904.555 ? 3.191 ns/op > LongDivMod.testRemainderUnsigned 1024 negative avgt 15 1765.761 ? 1.539 ns/op > Finished ... This pull request has now been integrated. 
Changeset: c1251780 Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/c125178065664fdf96c42dfc6dcfa2431e6011a4 Stats: 101 lines in 3 files changed: 99 ins; 0 del; 2 mod 8341068: [s390x] intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long Reviewed-by: lucy, aph ------------- PR: https://git.openjdk.org/jdk/pull/21559 From jkarthikeyan at openjdk.org Mon Nov 4 04:25:07 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 4 Nov 2024 04:25:07 GMT Subject: RFR: 8341781: Improve Min/Max node identities [v4] In-Reply-To: References: Message-ID: > Hi all, > This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements: > > Baseline Patch > Benchmark Mode Cnt Score Error Units Score Error Units Improvement > BigIntegers.testAdd avgt 15 25.096 ? 3.936 ns/op 19.214 ? 0.521 ns/op (+ 26.5%) > PatternBench.charPatternCompile avgt 8 453.727 ? 117.265 ns/op 370.054 ? 26.106 ns/op (+ 20.3%) > PatternBench.charPatternMatch avgt 8 917.604 ? 121.766 ns/op 810.560 ? 38.437 ns/op (+ 12.3%) > PatternBench.charPatternMatchWithCompile avgt 8 1477.703 ? 255.783 ns/op 1224.460 ? 28.220 ns/op (+ 18.7%) > PatternBench.longStringGraphemeMatches avgt 8 860.909 ? 124.661 ns/op 743.729 ? 22.877 ns/op (+ 14.6%) > PatternBench.splitFlags avgt 8 420.506 ? 76.252 ns/op 321.911 ? 
11.661 ns/op (+ 26.6%) > > I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated! Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Make long tests check IR ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21439/files - new: https://git.openjdk.org/jdk/pull/21439/files/39f7d047..fc484f6e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21439&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21439&range=02-03 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21439.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21439/head:pull/21439 PR: https://git.openjdk.org/jdk/pull/21439 From jkarthikeyan at openjdk.org Mon Nov 4 04:25:10 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 4 Nov 2024 04:25:10 GMT Subject: RFR: 8341781: Improve Min/Max node identities [v3] In-Reply-To: References: <3A-W4pcQj_I0QNWlUU3qibf6SQbNnZyO1JxeH1ym9Lw=.d343a0a6-10f4-4a3a-89fc-06e4cef04d02@github.com> Message-ID: <9UXnatwgwzVK3JhV2nBG4qaIFg1aBJTP9Ti9vFbKHuY=.aa113a75-30df-4629-8f3f-47e5266e882f@github.com> On Fri, 1 Nov 2024 07:08:31 GMT, Emanuel Peter wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Add platform checks to IR >> - Merge branch 'master' into minmax_identities >> - Suggestions from review >> - Min/Max identities > > test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxIdentities.java line 120: > >> 118: >> 119: @Test >> 120: // @IR(applyIfPlatform = { "riscv64", "false" }, phase = { CompilePhase.BEFORE_MACRO_EXPANSION }, counts = { IRNode.MIN_L, "1" }) > > I would say you should make them negative for now, i.e. make them `failOn`. 
Otherwise we won't catch these cases when JDK-8307513 gets integrated ;) Sounds good, I've pushed a commit that makes the tests pass now but fail when 8307513 is integrated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1827178087 From thartmann at openjdk.org Mon Nov 4 06:30:32 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 4 Nov 2024 06:30:32 GMT Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v3] In-Reply-To: References: <4_84pZqk5-pV1iTUdpf5wmVczTdHq-9-Re1qjbGU7Eo=.0fb46e18-883f-45f8-827d-567602373431@github.com> Message-ID: On Fri, 1 Nov 2024 08:23:42 GMT, Dean Long wrote: >> Right, I was hoping for that too and tried to move the assert into `TypeNarrowKlass::make`. We do have all the information there but we hit false positives in rare cases like this when `MyAbstract` does not have any subtypes at compile time (mostly with `-Xcomp`): >> >> MyAbstract obj = ...; >> obj.getClass(); >> >> C2 will add a dependency that will invalidate the code once a subclass is loaded and then optimizes the narrow class load from `obj` to be of constant narrow class type `MyAbstract`. The assert will trigger but we will never emit a compressed class pointer because the narrow class load + decode is folded to a non-narrow constant. >> >> We could move the assert to a later stage though. I'll give that a try. > > Do we actually generate an nmethod for the above example? It seems like it could never execute the getClass() because the line above setting `obj` would have to throw an exception if there can be no concrete instances. Right, this was an oversimplified example. I used this code: Class test(MyAbstract obj, boolean b) { if (b) { return obj.getClass(); } return null; } We pass `null` for `obj` and `false` for `b`. Usually, the branch is then only compiled with Xcomp. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21784#discussion_r1827238258 From thartmann at openjdk.org Mon Nov 4 06:30:33 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 4 Nov 2024 06:30:33 GMT Subject: Integrated: 8343206: Final graph reshaping should not compress abstract or interface class pointers In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 11:38:53 GMT, Tobias Hartmann wrote: > @fisk found this problematic optimization in final graph reshaping where we would convert a `CmpP` into a `CmpN` by converting a constant class pointer operand to a narrow class pointer. After [JDK-8338526](https://bugs.openjdk.org/browse/JDK-8338526), this is not valid if the class pointer refers to an interface or abstract class. > > I think it's not an issue in current code though. The only way we can get a dynamic narrow class is when loading from an object at `oopDesc::klass_offset_in_bytes()`: > https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/type.cpp#L3521-L3522 > > Comparisons of such loads with a constant class pointer of interface or abstract class type are always folded during GVN: > https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/subnode.cpp#L1164-L1171 > > And therefore, the code in `Compile::final_graph_reshaping_main_switch` will never trigger. > > I added a corresponding assert and bailout in product to be on the safe side. The assert never triggered in my testing. > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: 2432c4f8 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/2432c4f862e66e91c60e75ccc43b376020d80a1f Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod 8343206: Final graph reshaping should not compress abstract or interface class pointers Reviewed-by: coleenp, eosterlund, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21784 From thartmann at openjdk.org Mon Nov 4 06:37:30 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 4 Nov 2024 06:37:30 GMT Subject: RFR: 8340454: C2 EA asserts with "previous reducible Phi is no longer reducible before SUT" [v3] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 21:53:45 GMT, Cesar Soares Lucas wrote: >> Please, consider this patch to fix an issue that happens when a Phi previously considered reducible become later irreducible. The overall situation that causes the problem is like so: >> >> - Consider that there are at least 2 scalar replaceable objects (Obj1 and Obj2; Obj2 is stored in a field of Obj1) when we start iterating the loop at escape.cpp:301 >> >> - In the first iteration of the loop the call chain starting with `adjust_scalar_replaceable_state` ends up calling `can_reduce_phi` and considering Phi1 as reducible. This Phi has only Obj1 as *SR* input. >> >> - In another iteration of the loop Obj2 is flagged as NSR. For instance, because we are storing Obj2 in an unknown position of an array. This will cause `found_nsr_alloc` to be set to `true`. >> >> After the loop finishes, the execution will go to `find_scalar_replaceable_allocs`. The code will process Obj1, because it's still scalar replaceable, but will find that this object is stored in a field of a - **now** - NSR object. Therefore, correctly, Obj1 will also be marked as NSR. When Obj1 is marked as NSR Phi1 becomes irreducible because it doesn't have any more scalar replaceable input. >> >> The solution I'm proposing is simply revisit the "reducibility" of the Phis when an object is marked as NSR. 
>> >> --------- >> >> ### Tests >> >> Win, Mac & Linux tier1-4 on x64 & Aarch64. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Address PR feedback: include test execution options. Marked as reviewed by thartmann (Reviewer). Thanks Cesar, that looks good to me. I'll run a final round of testing and report back once it has passed. ------------- PR Review: https://git.openjdk.org/jdk/pull/21778#pullrequestreview-2412246708 PR Comment: https://git.openjdk.org/jdk/pull/21778#issuecomment-2453917034 From amitkumar at openjdk.org Mon Nov 4 07:04:57 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 4 Nov 2024 07:04:57 GMT Subject: RFR: 8343506: [s390x] multiple test failures with ubsan Message-ID: This is a trivial patch which fixes the error I am seeing on s390x while running tier1 with ubsan enabled. Please see JBS for more details. ------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/21864/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21864&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343506 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21864.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21864/head:pull/21864 PR: https://git.openjdk.org/jdk/pull/21864 From dfenacci at openjdk.org Mon Nov 4 07:36:35 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 4 Nov 2024 07:36:35 GMT Subject: RFR: 8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 In-Reply-To: <0tK_3KUqMNg0R5YdFFUlxsSeYZvF57UP_U0b6wdDhG8=.084bfe90-911f-4de2-aa5f-19ed208657b4@github.com> References: <0tK_3KUqMNg0R5YdFFUlxsSeYZvF57UP_U0b6wdDhG8=.084bfe90-911f-4de2-aa5f-19ed208657b4@github.com> Message-ID: On Thu, 31 Oct 2024 13:16:08 GMT, Evgeny Astigeevich wrote: >>> The problem might be attributed to the VM only checking for 1GB huge pages to be supported, not
how many there currently are. >> >> https://bugs.openjdk.org/browse/JDK-8321526 > >> @eastig I noticed that you are the author of the original `testNonSegmented1GbCodeCacheWith1GbLargePages` test. Could I ask you to have a look at this change? Thanks a lot! > > `testDefaultCodeCacheWith1GbLargePages` and `testNonSegmented1GbCodeCacheWith1GbLargePages` should only be run if a system provides 1Gb pages. This is mentioned in their names: `...With1GbLargePages`. If there are no 1Gb pages available, the test should not be run. > > I suggest to check `/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages >= 1`. If not, output "Skipping testDefaultCodeCacheWith1GbLargePages and testDefaultCodeCacheWith1GbLargePages, no 1Gb pages available" . > > With your change, if a system provides 1Gb pages but JVM fails to use them because of a bug, the tests will pass and the bug will be unknown. Thank you for your reviews @eastig @TobiHartmann. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21757#issuecomment-2453985227 From dfenacci at openjdk.org Mon Nov 4 07:36:36 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 4 Nov 2024 07:36:36 GMT Subject: Integrated: 8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 In-Reply-To: References: Message-ID: <5Uqdx6inJiDqBconhAcsyD1QCVTGhjQEEI3BvDmqVew=.e4af7332-c171-40fb-9875-055f13cb00c9@github.com> On Tue, 29 Oct 2024 10:54:31 GMT, Damon Fenacci wrote: > # Issue > > The third test of `compiler/codecache/CheckLargePages.java` checks that non-segmented 1GB code cache can be allocated with 1GB large pages. > > On linux (the only supported platform) in order to allocate them, 1GB huge pages have to be enabled (checkable from `/proc/meminfo` and `/sys/kernel/mm/hugepages/hugepages-xxxx`) and their number has to be set to >0 (checkable from `/sys/kernel/mm/hugepages/hugepages-xxxx/nr_hugepages`). 
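The availability condition described here, and the skip Evgeny suggests in review, can be sketched in plain Java. This is a minimal sketch, assuming a standalone check rather than the jtreg test library. The class and method names are hypothetical, and this is not the code that was integrated; the integrated fix relaxed the output match instead of skipping.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class HugePageProbe {
    // sysfs counter for 1 GB huge pages (1048576 kB), as quoted in the review.
    static final Path NR_1G_PAGES =
            Path.of("/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages");

    // Parses the counter's text content; anything unparsable counts as zero pages.
    static long parseCount(String text) {
        try {
            return Long.parseLong(text.trim());
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    // Returns the configured number of pages, or 0 if the counter file is
    // missing (e.g. the kernel does not offer this page size).
    static long configuredPages(Path counter) {
        try {
            return parseCount(Files.readString(counter));
        } catch (IOException e) {
            return 0;
        }
    }

    public static void main(String[] args) {
        if (configuredPages(NR_1G_PAGES) < 1) {
            System.out.println("Skipping 1 GB large page tests: no 1 GB pages configured");
            return;
        }
        System.out.println("1 GB large pages configured; running 1 GB tests");
    }
}
```

Skipping keeps the strict `page_size=1G` match meaningful when pages are configured, which addresses the review concern that a relaxed match could hide a real VM bug.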
> > If 1GB huge pages are enabled but their number is 0, the test fails because it looks for a string that matches `CodeCache: min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] page_size=1[gG]` but the actual output is `CodeCache: min=1G max=1G base=0x00007f4040000000 size=1G page_size=2M`. This happens because the VM tries to allocate 1GB huge pages but fails because the number of allocatable ones is 0, and the VM reverts to smaller large page sizes (2MB). > > # Solution > > The problem might be attributed to the VM only checking for 1GB huge pages to be supported, not how many there currently are. Nevertheless, this seems to be the correct behaviour, not least because their number can be changed dynamically. > So, the correct thing to do seems to be to "relax" the check made by the test to include both cases: > * when 1GB huge pages are supported and can be allocated correctly > * when 1GB huge pages are supported but cannot be allocated correctly (because there are none available) and the VM reverts to 2MB huge pages (if there are no 2MB pages available the test doesn't run at all). This pull request has now been integrated.
Changeset: e7f0bf11 Author: Damon Fenacci URL: https://git.openjdk.org/jdk/commit/e7f0bf11ff0e89b6b156d5e88ca3771c706aa46a Stats: 24 lines in 2 files changed: 20 ins; 1 del; 3 mod 8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 Reviewed-by: eastigeevich, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/21757 From mli at openjdk.org Mon Nov 4 09:22:37 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 4 Nov 2024 09:22:37 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: <1CBxrIcc1nOhl-xlgLDw2qjDt4JFIlOC1kbWXJSTt5w=.cd18419f-40b6-44d6-bce0-5a06e494d9eb@github.com> References: <033UQ-960JvpQbpMGHPKyoIv43XWj2bGzdnmCSoZrac=.3ffd7d94-2eb4-44e3-b91b-967d58aa01f7@github.com> <1CBxrIcc1nOhl-xlgLDw2qjDt4JFIlOC1kbWXJSTt5w=.cd18419f-40b6-44d6-bce0-5a06e494d9eb@github.com> Message-ID: On Fri, 1 Nov 2024 11:01:24 GMT, Andrew Haley wrote: > Here are my results, Apple M1. Pretty similar to what we've seen, but no SVE. > > Looks good. Thank you so much for testing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21502#issuecomment-2454181484 From mli at openjdk.org Mon Nov 4 09:22:38 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 4 Nov 2024 09:22:38 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: <033UQ-960JvpQbpMGHPKyoIv43XWj2bGzdnmCSoZrac=.3ffd7d94-2eb4-44e3-b91b-967d58aa01f7@github.com> References: <033UQ-960JvpQbpMGHPKyoIv43XWj2bGzdnmCSoZrac=.3ffd7d94-2eb4-44e3-b91b-967d58aa01f7@github.com> Message-ID: On Tue, 22 Oct 2024 09:28:36 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? Previously it's https://github.com/openjdk/jdk/pull/18605. >> This pr is based on https://github.com/openjdk/jdk/pull/20781. >> >> Thanks! 
>> >> ## Test >> ### tests: >> * test/jdk/jdk/incubator/vector/ >> * test/hotspot/jtreg/compiler/vectorapi/ >> >> ### options: >> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:+EnableVectorSupport -XX:-UseVectorStubs >> >> ## Performance >> >> ### Tests >> jmh tests are test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from vectorIntrinsics branch in panama-vector. It's good to have these tests in jdk main stream, I will do it in a separate pr later. (These tests are auto-generated tests from some script&template, it's good to also have those scrip&template in jdk main stream, but those scrip&template generates more other tests than what we need here, so better to add these tests and script&template in another pr). >> >> ### Options >> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' >> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs' >> >> ### Performance data >> I have re-tested, there is no much difference from https://github.com/openjdk/jdk/pull/18605, so please check performance data in that pr. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > add comment for tanh Thanks all for reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21502#issuecomment-2454182427 From mli at openjdk.org Mon Nov 4 09:22:40 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 4 Nov 2024 09:22:40 GMT Subject: Integrated: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 14:57:46 GMT, Hamlin Li wrote: > Hi, > Can you help to review the patch? Previously it's https://github.com/openjdk/jdk/pull/18605. 
> This pr is based on https://github.com/openjdk/jdk/pull/20781. > > Thanks! > > ## Test > ### tests: > * test/jdk/jdk/incubator/vector/ > * test/hotspot/jtreg/compiler/vectorapi/ > > ### options: > * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs > * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs > * -XX:+EnableVectorSupport -XX:-UseVectorStubs > > ## Performance > > ### Tests > jmh tests are test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from vectorIntrinsics branch in panama-vector. It's good to have these tests in jdk main stream, I will do it in a separate pr later. (These tests are auto-generated from some script&template. It's good to also have that script&template in jdk main stream, but it generates more tests than we need here, so it's better to add these tests and the script&template in another pr). > > ### Options > * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' > * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs' > > ### Performance data > I have re-tested; there is not much difference from https://github.com/openjdk/jdk/pull/18605, so please check the performance data in that pr. This pull request has now been integrated.
Changeset: df08a9ec Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/df08a9ec0d813fcd4ea88a3773c230af6d65e045 Stats: 343 lines in 8 files changed: 338 ins; 1 del; 4 mod 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF Co-authored-by: Xiaohong Gong Reviewed-by: ihse, fgao, aph ------------- PR: https://git.openjdk.org/jdk/pull/21502 From rcastanedalo at openjdk.org Mon Nov 4 09:44:52 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 4 Nov 2024 09:44:52 GMT Subject: RFR: 8339303: C2: dead node after failing to match cloned address expression Message-ID: This changeset prevents the x86 platform-specific logic from cloning address expressions consisting of two chained `AddP` nodes with a small constant offset each, such as in the following example: ![example](https://github.com/user-attachments/assets/86c143a1-3895-4e0c-936b-0d22b7c80e73) Such patterns cannot be fully subsumed into x86 complex addressing modes, and cloning them can cause the matcher to introduce dead nodes that trigger a segmentation fault in the subsequent global code motion phase. See a detailed analysis of the failure in the [JBS issue description](https://bugs.openjdk.org/browse/JDK-8339303). The changeset additionally extends the post-matching verification logic to check that no old node is reachable by traversing both node inputs and outputs. This extension would have caused the original test case to fail directly after matching with an informative assertion message rather than an opaque segmentation fault in an unrelated code generation phase. Note that the pattern causing the failure should in general be optimized by `AddPNode::Ideal` into a single `AddP` node with the constant sum of the offsets.
While [JDK-8343067](https://bugs.openjdk.org/browse/JDK-8343067) should address the missing optimization, this changeset proposes a complementary solution that is easily backportable and avoids relying on specific optimizations for correctness. #### Testing ##### Functionality - tier1-5 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64; release and debug mode). ##### Performance - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. ------------- Commit messages: - Remove fix condition - Add regression test - Verify that there are no out references to dead nodes after matching - Do not clone two pointer adds using each an immediate on x86 Changes: https://git.openjdk.org/jdk/pull/21829/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21829&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339303 Stats: 73 lines in 3 files changed: 67 ins; 3 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21829.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21829/head:pull/21829 PR: https://git.openjdk.org/jdk/pull/21829 From mdoerr at openjdk.org Mon Nov 4 10:01:37 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 4 Nov 2024 10:01:37 GMT Subject: RFR: 8343205: CompileBroker::possibly_add_compiler_threads excessively polls available memory [v2] In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 00:22:42 GMT, Martin Doerr wrote: >> This PR adds a quick check + bail out in order to avoid excessive usage of slow checks. Especially, it avoids querying the available memory so often. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Minor improvements (review feedback). Thanks for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21812#issuecomment-2454262340 From mdoerr at openjdk.org Mon Nov 4 10:01:38 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 4 Nov 2024 10:01:38 GMT Subject: Integrated: 8343205: CompileBroker::possibly_add_compiler_threads excessively polls available memory In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 17:03:33 GMT, Martin Doerr wrote: > This PR adds a quick check + bail out in order to avoid excessive usage of slow checks. Especially, it avoids querying the available memory so often. This pull request has now been integrated. Changeset: 75801992 Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/75801992a7c626d409f66e2491082dba84c6fe45 Stats: 27 lines in 2 files changed: 16 ins; 0 del; 11 mod 8343205: CompileBroker::possibly_add_compiler_threads excessively polls available memory Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21812 From chagedorn at openjdk.org Mon Nov 4 10:17:45 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 4 Nov 2024 10:17:45 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v16] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Fri, 1 Nov 2024 14:49:07 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. 
>> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. >> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RCs smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow.
This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > more review applications Some final last mostly minor comments but otherwise, it looks good to me now! I like the summaries and how you worked out the proofs. They are now easy to understand. src/hotspot/share/opto/memnode.cpp line 2943: > 2941: return false; > 2942: } > 2943: return true; You could directly return (I cannot create a code suggestion as it says "Applying suggestions on deleted lines is not supported"): return pointer_def.is_adjacent_to_and_before(pointer_use); src/hotspot/share/opto/mempointer.cpp line 45: > 43: while (_worklist.is_nonempty()) { > 44: // Bail out if the graph is too complex. > 45: if (traversal_count++ > 1000) { return MemPointerDecomposedForm(pointer); } Might be easier to read/understand when we also have `MemPointerDecomposedForm::make_trivial(pointer)` method. What do you think? Then `MemPointerDecomposedForm(pointer)` can also be made private. src/hotspot/share/opto/mempointer.cpp line 199: > 197: #else > 198: > 199: switch(opc) { Suggestion: switch (opc) { src/hotspot/share/opto/mempointer.cpp line 265: > 263: // > 264: // Thus, for AddI and SubI, we get: > 265: // summand = new_summand1 + new_summand2 + scale * y * 2^32 Took me a moment to understand `new_summands`. Maybe we can give a hint like that? 
Suggestion: // scale * ConvI2L(a << con) = scale * (1 << con) * ConvI2L(a) + scale * y * 2^32 // _______________________/ _____________________________________/ ______________/ // before decomposition after decomposition ("new_summands") overflow correction // // Thus, for AddI and SubI, we get: // summand = new_summand1 + new_summand2 + scale * y * 2^32 src/hotspot/share/opto/mempointer.cpp line 283: > 281: // z * array_element_size_in_bytes = scale > 282: // > 283: // And hence, with "x = y * z": Maybe add here: Suggestion: // And hence, with "x = y * z", the decomposition is (SAFE2) under assumed condition: src/hotspot/share/opto/mempointer.cpp line 318: > 316: #endif > 317: > 318: // "MemPointer Lemma" condition S2: check if all summands are the same: Suggestion: // "MemPointer Lemma" condition (S2): check if all summands are the same: src/hotspot/share/opto/mempointer.cpp line 332: > 330: } > 331: > 332: // "MemPointer Lemma" condition S3: check that the constants do not differ too much: Suggestion: // "MemPointer Lemma" condition (S3): check that the constants do not differ too much: src/hotspot/share/opto/mempointer.cpp line 347: > 345: } > 346: > 347: // "MemPointer Lemma" condition S1: Suggestion: // "MemPointer Lemma" condition (S1): src/hotspot/share/opto/mempointer.cpp line 352: > 350: // bounds of that same memory object. > 351: > 352: // Hence, all 3 conditions of the "MemoryPointer Lemma" are established, and hence Since we also have added `(S0)` recently, we might need to add a word here about it and then update this to "all 4 conditions". src/hotspot/share/opto/mempointer.cpp line 382: > 380: return is_adjacent; > 381: } > 382: Two new lines: Suggestion: src/hotspot/share/opto/mempointer.hpp line 46: > 44: // compile-time variables (C2 nodes). > 45: // > 46: // For the MemPointer, we do not explicitly track base address. For Java heap pointers, the Suggestion: // For the MemPointer, we do not explicitly track the base address. 
For Java heap pointers, the src/hotspot/share/opto/mempointer.hpp line 232: > 230: // We decompose summand in: > 231: // mp_i = con + summand + SUM(other_summands) > 232: // Resulting in: +-------------------------+ Suggestion: // resulting in: +-------------------------+ src/hotspot/share/opto/mempointer.hpp line 258: > 256: // (S3) All summands of mp1 and mp2 are identical (i.e. only the constants are possibly different). > 257: // > 258: // Then the pointer difference between p1 and p2 is identical to the difference between Suggestion: // then the pointer difference between p1 and p2 is identical to the difference between src/hotspot/share/opto/mempointer.hpp line 332: > 330: // -- apply x != 0 -- > 331: // >= array_element_size_in_bytes * 2^32 - abs(mp1 - mp2) > 332: // -- apply (S3) -- Suggestion: // -- apply (S3) -- src/hotspot/share/opto/mempointer.hpp line 334: > 332: // -- apply (S3) -- > 333: // = array_element_size_in_bytes * 2^32 - abs(mp1.con - mp2.con) > 334: // -- apply (S2) -- Suggestion: // -- apply (S2) -- test/hotspot/jtreg/compiler/c2/TestMergeStoresMemorySegment.java line 114: > 112: */ > 113: > 114: // FAILS: mixed providers currently do not merge stores. Maybe there is some inlining issue. Is there a tracking bug/RFE to make this work? test/hotspot/jtreg/compiler/c2/TestMergeStoresMemorySegment.java line 196: > 194: Map tests = new HashMap<>(); > 195: > 196: // List of gold, the results from the first run before compilation Sounds funny :-) Maybe: Suggestion: // List of golden values, the results from the first run before compilation test/hotspot/jtreg/compiler/c2/TestMergeStoresMemorySegment.java line 375: > 373: applyIf = {"UseUnalignedAccesses", "true"}) > 374: static Object[] test_xxx(MemorySegment a, int xI, int yI, int zI) { > 375: // All RangeChecks remain -> RC smearing not good enough? Is there a tracking bug to further investigate at some point? 
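The `ConvI2L` decomposition discussed in the `mempointer.cpp` review comments relies on the identity `ConvI2L(a << con) = (1 << con) * ConvI2L(a) + y * 2^32` for some integer `y` (the review states it multiplied through by `scale`). The sketch below, with hypothetical helper names, computes `y` for concrete inputs using Java's 32-bit shift semantics, which makes the "overflow correction" term visible. It assumes C++20 (or a two's-complement compiler) for the `uint32_t` to `int32_t` conversion.

```cpp
#include <cassert>
#include <cstdint>

// Java/C2 int shift: computed on the 32-bit pattern, then reinterpreted as a
// signed int (two's complement wrap-around, well-defined since C++20).
int32_t java_ishl(int32_t a, int con) {
  return static_cast<int32_t>(static_cast<uint32_t>(a) << (con & 31));
}

// Returns y such that ConvI2L(a << con) == (1 << con) * ConvI2L(a) + y * 2^32.
int64_t overflow_correction(int32_t a, int con) {
  int64_t before = static_cast<int64_t>(java_ishl(a, con));              // ConvI2L(a << con)
  int64_t after  = (int64_t{1} << (con & 31)) * static_cast<int64_t>(a);  // no 64-bit overflow
  int64_t diff = before - after;
  // Both sides agree in their low 32 bits, so the difference is a multiple of 2^32.
  assert(diff % (int64_t{1} << 32) == 0);
  return diff / (int64_t{1} << 32);
}
```

For instance, `a = 0x40000000` shifted by `con = 2` wraps to 0 in 32 bits, so `y = -1`: this is the kind of case the decomposition has to account for.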
------------- PR Review: https://git.openjdk.org/jdk/pull/19970#pullrequestreview-2412305011 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827276306 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827447237 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827412672 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827463435 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827469254 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827482695 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827483621 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827484278 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827487031 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827488991 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827338243 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827381411 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827382642 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827390691 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827390901 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827300978 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827302506 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827310975 From epeter at openjdk.org Mon Nov 4 10:20:39 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 10:20:39 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v25] In-Reply-To: References: Message-ID: On Sun, 20 Oct 2024 16:41:19 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. 
This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 32 commits: > > - Merge branch 'master' into unsignedbounds > - address reviews > - comment adjust_lo empty case > - formality > - address reviews > - add comments, refactor functions to helper class > - refine comments > - remove leftover code > - add doc to TypeInt, rename parameters, remove unused methods > - change (~v & ones) == 0 to (v & ones) == ones > - ... and 22 more: https://git.openjdk.org/jdk/compare/309b9291...7f3316fa Sorry, I've been very slow on this. A few more comments before lunch. 
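The constraint set described in the PR (signed bounds, unsigned bounds, known-zero and known-one bits) can be explored on a small domain. The following is a brute-force 8-bit sketch with hypothetical names. It is not the PR's canonicalization algorithm, which cannot enumerate 2^32 or 2^64 values, but it computes the tightest constraints by definition, which is handy for checking examples such as the `[0, 3]` case from the description.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical 8-bit analogue of TypeInt's constraints, for illustration only.
struct SmallIntType {
  int8_t lo, hi;        // signed bounds
  uint8_t ulo, uhi;     // unsigned bounds
  uint8_t zeros, ones;  // bits known to be 0 / known to be 1
};

bool contains(const SmallIntType& t, uint8_t bits) {
  int8_t s = static_cast<int8_t>(bits);
  return s >= t.lo && s <= t.hi && bits >= t.ulo && bits <= t.uhi &&
         (bits & t.zeros) == 0 && (bits & t.ones) == t.ones;
}

// Brute-force canonicalization: tighten every constraint to the exact
// extremes and common bits of the values the type admits. Empty -> nullopt.
std::optional<SmallIntType> canonicalize(const SmallIntType& t) {
  bool empty = true;
  SmallIntType r{INT8_MAX, INT8_MIN, UINT8_MAX, 0, 0xFF, 0xFF};
  for (int v = 0; v < 256; v++) {
    uint8_t bits = static_cast<uint8_t>(v);
    if (!contains(t, bits)) continue;
    empty = false;
    int8_t s = static_cast<int8_t>(bits);
    if (s < r.lo) r.lo = s;
    if (s > r.hi) r.hi = s;
    if (bits < r.ulo) r.ulo = bits;
    if (bits > r.uhi) r.uhi = bits;
    r.zeros &= static_cast<uint8_t>(~bits);  // stays set only if 0 in every member
    r.ones &= bits;                           // stays set only if 1 in every member
  }
  if (empty) return std::nullopt;
  return r;
}
```

A signed range of `[-2, 1]` illustrates point 2 of the `type.hpp` comment under review: the unsigned bounds widen to `[0, 255]`, and the members split into `[lo, (int8)uhi] = [-2, -1]` and `[(int8)ulo, hi] = [0, 1]`.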
src/hotspot/share/opto/rangeinference.hpp line 159: > 157: > 158: template > 159: static bool int_type_subset(const CT* super, const CT* sub) { Suggestion: static bool is_int_type_equal(const CT* t1, const CT* t2) { return t1->_lo == t2->_lo && t1->_hi == t2->_hi && t1->_ulo == t2->_ulo && t1->_uhi == t2->_uhi && t1->_bits._zeros == t2->_bits._zeros && t1->_bits._ones == t2->_bits._ones; } template static bool is_int_type_subset(const CT* super, const CT* sub) { I think these should be `is_...` names. src/hotspot/share/opto/type.hpp line 616: > 614: * > 615: * 1. Since every TypeInt instance is canonicalized, all the bounds must also > 616: * be elements of such TypeInt. Or else, we can tighted the bounds by narrowing Suggestion: * be elements of such TypeInt. Or else, we can tighten the bounds by narrowing src/hotspot/share/opto/type.hpp line 620: > 618: * > 619: * 2. Either _lo == jint(_ulo) and _hi == jint(_uhi), or all elements of a > 620: * TypeInt lie in the intervals [_lo, jint(_uhi)] or [jint(_ulo), _hi] The `[_lo, jint(_uhi)] or [jint(_ulo), _hi]` in english is not precise enough. - Is it a mathematical `OR`: the element can also be in both? In that case I would add "or both". - Is it a mathematical `XOR`? Then I would write "either ... or .. but not both" src/hotspot/share/opto/type.hpp line 622: > 620: * TypeInt lie in the intervals [_lo, jint(_uhi)] or [jint(_ulo), _hi] > 621: * > 622: * Proof: For 2 jint value x, y such that they are both >= 0 or < 0. Then: Suggestion: * Proof: For 2 jint value x, y such that they are both >= 0 or both < 0. Then: Or are you allowing them to one be positive and one negative? src/hotspot/share/opto/type.hpp line 645: > 643: * can be seen that _lo and jint(_uhi) are both < 0 or >= 0, and the same > 644: * applies to jint(_ulo) and _hi. > 645: */ I would appreciate some indentation: it would make it easier to see points 1, 2, ... 
And to see what is part of the proof, and what is part of a case distinction and each case in it. ------------- PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2412481589 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1827379995 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1827388140 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1827391449 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1827393271 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1827397025 From rehn at openjdk.org Mon Nov 4 10:27:31 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 4 Nov 2024 10:27:31 GMT Subject: RFR: 8343415: RISC-V: Increase maximum size of C2EntryBarrierStub by four [v2] In-Reply-To: References: Message-ID: <29u1mOEK-Bw7KZZscik5rrpmZAjreO6JK4IQ2JA0mUg=.47443722-b683-4112-90ec-8473915f560d@github.com> On Fri, 1 Nov 2024 02:43:06 GMT, Fei Yang wrote: >> Hi, please consider this small change. >> >> There is one jump to continuation (after nmethod entry barriers) in C2EntryBarrierStub [1]. >> The current max_size setting assumes the distance is within 1MB, which means a simple `jal` instruction [2]. >> So I just count one for this jump in [JDK-8343121](https://bugs.openjdk.org/browse/JDK-8343121). This doesn't seem to break for various tests. But I don't think there is a good reason for that assumption to stand. Instead, we should remove this constraint assuming a `auipc+jalr` pair for this jump. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_CodeStubs_riscv.cpp#L66 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L965 >> >> Testing on linux-riscv64: >> - [x] tier1 (fastdebug build) > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Comment typo Thanks! ------------- Marked as reviewed by rehn (Reviewer). 
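As a rough illustration of the RISC-V reach argument in this thread: `jal` encodes a signed, 2-byte-aligned offset of about ±1 MiB, while an `auipc`+`jalr` pair covers about ±2 GiB at the cost of one more 4-byte instruction. The helper names below are hypothetical, and the sketch only models offset ranges, not the actual `MacroAssembler` code paths.

```cpp
#include <cassert>
#include <cstdint>

// jal encodes a signed 21-bit, 2-byte-aligned PC-relative offset: +-1 MiB.
bool fits_jal(int64_t offset) {
  return offset % 2 == 0 && offset >= -(int64_t{1} << 20) && offset < (int64_t{1} << 20);
}

// auipc supplies a signed 20-bit upper immediate (shifted left by 12) and jalr
// a signed 12-bit lower immediate; rounding the upper part this way keeps the
// lower part within [-2048, 2047].
bool fits_auipc_jalr(int64_t offset) {
  int64_t hi20 = (offset + 0x800) >> 12;
  return hi20 >= -(int64_t{1} << 19) && hi20 < (int64_t{1} << 19);
}

// Size in bytes of a jump to `offset` (base RISC-V instructions are 4 bytes);
// returns -1 if neither form reaches.
int jump_size_bytes(int64_t offset) {
  if (fits_jal(offset)) return 4;
  if (fits_auipc_jalr(offset)) return 8;
  return -1;
}
```

Under this model, reserving space only for the `jal` form undercounts by exactly one 4-byte instruction whenever the continuation is out of `jal` range, which is consistent with the "by four" in the 8343415 title.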
PR Review: https://git.openjdk.org/jdk/pull/21818#pullrequestreview-2412682449 From epeter at openjdk.org Mon Nov 4 10:35:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 10:35:15 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v17] In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. > > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. > > More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! > > `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. > > **What this change enables** > > Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`).
> > Now we can do: > - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. > - Merging `Unsafe` stores to native memory. > - Merging `MemorySegment`: with array, native, ByteBuffer backing types. > - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RCs smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. > > **Dealing with Overflows** > > We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. > > **Bench...
Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - Update test/hotspot/jtreg/compiler/c2/TestMergeStoresMemorySegment.java Co-authored-by: Christian Hagedorn - Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19970/files - new: https://git.openjdk.org/jdk/pull/19970/files/d10b76ff..823bed75 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=15-16 Stats: 13 lines in 3 files changed: 0 ins; 1 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/19970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19970/head:pull/19970 PR: https://git.openjdk.org/jdk/pull/19970 From epeter at openjdk.org Mon Nov 4 10:35:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 10:35:15 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v16] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Fri, 1 Nov 2024 14:49:07 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. 
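A much simplified sketch of the linear form and the adjacency check it enables follows. The names are hypothetical (they mirror, but are not, the `MemPointer` API), variables are plain strings instead of C2 nodes, and the base address is treated as just another summand. Crucially, it skips the overflow-safety conditions of the "MemPointer Lemma", so it only captures the shape of the idea: identical summands, with constants differing by the access size.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified analogue of the decomposed form:
//   pointer = con + sum_i(scale_i * variable_i)
struct LinearPointer {
  int64_t con = 0;
  std::map<std::string, int64_t> summands;  // variable -> scale

  void add_summand(const std::string& var, int64_t scale) {
    summands[var] += scale;
    if (summands[var] == 0) summands.erase(var);  // keep the form canonical
  }
};

// The access at `a` (of size a_size bytes) is directly before the one at `b`
// when all summands are identical and only the constants differ, by a_size.
// Unlike the real MemPointer Lemma, this ignores any overflow conditions.
bool is_adjacent_to_and_before(const LinearPointer& a, int64_t a_size,
                               const LinearPointer& b) {
  return a.summands == b.summands && b.con - a.con == a_size;
}
```

In the checks below, two byte stores whose forms differ only in `con` (the values 16 and 17 are arbitrary) are adjacent; an extra summand on one side breaks adjacency.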
>> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. >> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RCs smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check...
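The overflow-tracking idea behind `NoOverflowInt` can be sketched as an integer that becomes "poisoned" once any operation overflows, so later code needs only a single check at the end. This is an illustrative class with its own hypothetical names, not HotSpot's `NoOverflowInt` implementation; it widens to 64 bits to detect 32-bit overflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: an int32 that remembers whether any operation that
// produced it overflowed (NaN-like poisoning that propagates).
class NoOverflowIntSketch {
  int32_t _value;
  bool _is_nan;

  // Narrow a 64-bit result back to 32 bits, poisoning it if it does not fit.
  static NoOverflowIntSketch from_long(int64_t v) {
    if (v < INT32_MIN || v > INT32_MAX) return make_nan();
    return NoOverflowIntSketch(static_cast<int32_t>(v));
  }

 public:
  explicit NoOverflowIntSketch(int32_t v) : _value(v), _is_nan(false) {}

  static NoOverflowIntSketch make_nan() {
    NoOverflowIntSketch r(0);
    r._is_nan = true;
    return r;
  }

  bool is_nan() const { return _is_nan; }

  int32_t value() const {
    assert(!_is_nan && "value of an overflowed computation");
    return _value;
  }

  // Each operation propagates poison and computes exactly in 64 bits.
  friend NoOverflowIntSketch operator+(NoOverflowIntSketch a, NoOverflowIntSketch b) {
    if (a._is_nan || b._is_nan) return make_nan();
    return from_long(int64_t{a._value} + int64_t{b._value});
  }
  friend NoOverflowIntSketch operator-(NoOverflowIntSketch a, NoOverflowIntSketch b) {
    if (a._is_nan || b._is_nan) return make_nan();
    return from_long(int64_t{a._value} - int64_t{b._value});
  }
  friend NoOverflowIntSketch operator*(NoOverflowIntSketch a, NoOverflowIntSketch b) {
    if (a._is_nan || b._is_nan) return make_nan();
    return from_long(int64_t{a._value} * int64_t{b._value});  // fits in 64 bits
  }
};
```

Note that poison is sticky: multiplying an overflowed value by zero still yields an overflowed result, which is the conservative behaviour a pointer analysis wants.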
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > more review applications You spent enough time on this already ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2454341438 From epeter at openjdk.org Mon Nov 4 10:35:16 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 10:35:16 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v16] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 4 Nov 2024 07:20:48 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more review applications > > src/hotspot/share/opto/memnode.cpp line 2943: > >> 2941: return false; >> 2942: } >> 2943: return true; > > You could directly return (I cannot create a code suggestion as it says "Applying suggestions on deleted lines is not supported"): > > return pointer_def.is_adjacent_to_and_before(pointer_use); Ah good idea ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827512031 From fyang at openjdk.org Mon Nov 4 10:56:34 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 4 Nov 2024 10:56:34 GMT Subject: RFR: 8343415: RISC-V: Increase maximum size of C2EntryBarrierStub by four [v2] In-Reply-To: References: Message-ID: <-y_-jeLxQY8hfcK95gjgIWKgra1f0vbgJ1QE2mz5UDs=.092f442d-e324-4e11-9f30-a296f3ede949@github.com> On Fri, 1 Nov 2024 02:43:06 GMT, Fei Yang wrote: >> Hi, please consider this small change. >> >> There is one jump to continuation (after nmethod entry barriers) in C2EntryBarrierStub [1]. >> The current max_size setting assumes the distance is within 1MB, which means a simple `jal` instruction [2]. >> So I just count one for this jump in [JDK-8343121](https://bugs.openjdk.org/browse/JDK-8343121). 
This doesn't seem to break for various tests. But I don't think there is a good reason for that assumption to stand. Instead, we should remove this constraint assuming a `auipc+jr` pair for this jump. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_CodeStubs_riscv.cpp#L66 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L965 >> >> Testing on linux-riscv64: >> - [x] tier1 (fastdebug build) > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Comment typo Thanks all for the review! Moving on ... ------------- PR Comment: https://git.openjdk.org/jdk/pull/21818#issuecomment-2454396776 From fyang at openjdk.org Mon Nov 4 10:56:34 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 4 Nov 2024 10:56:34 GMT Subject: Integrated: 8343415: RISC-V: Increase maximum size of C2EntryBarrierStub by four In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 02:13:16 GMT, Fei Yang wrote: > Hi, please consider this small change. > > There is one jump to continuation (after nmethod entry barriers) in C2EntryBarrierStub [1]. > The current max_size setting assumes the distance is within 1MB, which means a simple `jal` instruction [2]. > So I just count one for this jump in [JDK-8343121](https://bugs.openjdk.org/browse/JDK-8343121). This doesn't seem to break for various tests. But I don't think there is a good reason for that assumption to stand. Instead, we should remove this constraint assuming a `auipc+jr` pair for this jump. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_CodeStubs_riscv.cpp#L66 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L965 > > Testing on linux-riscv64: > - [x] tier1 (fastdebug build) This pull request has now been integrated. 
Changeset: 7f131a9e Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/7f131a9e1eb96d905a57f6e1e6fec2b7c7f725a4 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8343415: RISC-V: Increase maximum size of C2EntryBarrierStub by four Reviewed-by: rehn, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/21818 From chagedorn at openjdk.org Mon Nov 4 11:31:37 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 4 Nov 2024 11:31:37 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v17] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 4 Nov 2024 10:35:15 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`.
>> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RCs smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check...
> > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - Update test/hotspot/jtreg/compiler/c2/TestMergeStoresMemorySegment.java > > Co-authored-by: Christian Hagedorn > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > /contributor add chhagedorn > > You spent enough time on this already ;) Thanks Emanuel, I highly appreciate that :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2454464312 From epeter at openjdk.org Mon Nov 4 11:31:39 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 11:31:39 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v16] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 4 Nov 2024 09:41:40 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more review applications > > src/hotspot/share/opto/mempointer.cpp line 45: > >> 43: while (_worklist.is_nonempty()) { >> 44: // Bail out if the graph is too complex. >> 45: if (traversal_count++ > 1000) { return MemPointerDecomposedForm(pointer); } > > Might be easier to read/understand when we also have `MemPointerDecomposedForm::make_trivial(pointer)` method. What do you think? Then `MemPointerDecomposedForm(pointer)` can also be made private. Good idea! 
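The `make_trivial` suggestion is the named-constructor idiom: the constructor becomes private, and the trivial case gets a self-describing factory. Below is a minimal sketch. `DecomposedForm` is a hypothetical, heavily simplified stand-in for `MemPointerDecomposedForm`, and the worklist is reduced to a count of pending items, so nothing is actually parsed.

```cpp
// Stand-in for a C2 IR node; purely illustrative.
struct Node { int idx; };

// Hypothetical simplified analogue of MemPointerDecomposedForm.
class DecomposedForm {
 public:
  // Named constructors make each case explicit at the call site.
  static constexpr DecomposedForm make_trivial(const Node* pointer) {
    return DecomposedForm(pointer, true);
  }
  static constexpr DecomposedForm make_decomposed(const Node* pointer) {
    return DecomposedForm(pointer, false);
  }

  constexpr bool is_trivial() const { return _is_trivial; }
  constexpr const Node* pointer() const { return _pointer; }

 private:
  // Private: callers have to go through one of the factories above.
  constexpr DecomposedForm(const Node* pointer, bool trivial)
      : _pointer(pointer), _is_trivial(trivial) {}

  const Node* _pointer;
  bool _is_trivial;
};

// Same bail-out shape as the quoted loop, with the worklist reduced to a count:
// once more than 1000 items have been visited, give up with the trivial form.
constexpr DecomposedForm decompose(const Node* pointer, int worklist_size) {
  int traversal_count = 0;
  while (worklist_size > 0) {
    if (traversal_count++ > 1000) { return DecomposedForm::make_trivial(pointer); }
    worklist_size--;  // a real implementation would parse one node here
  }
  return DecomposedForm::make_decomposed(pointer);
}
```

With the factories in place, the bail-out line states its intent directly: `return DecomposedForm::make_trivial(pointer);`.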
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827584414 From epeter at openjdk.org Mon Nov 4 11:35:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 11:35:36 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v16] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 4 Nov 2024 10:12:25 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more review applications > > src/hotspot/share/opto/mempointer.cpp line 352: > >> 350: // bounds of that same memory object. >> 351: >> 352: // Hence, all 3 conditions of the "MemoryPointer Lemma" are established, and hence > > Since we also have added `(S0)` recently, we might need to add a word here about it and then update this to "all 4 conditions". Good idea ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827588447 From epeter at openjdk.org Mon Nov 4 11:48:50 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 11:48:50 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v17] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: <_bo_FK7zqp8oBdlZdDWdKHvU-rwhCbeqK9ga7qs9Fas=.6ce66b49-42a6-4d8d-9b56-e616899afc48@github.com> On Mon, 4 Nov 2024 11:27:17 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update test/hotspot/jtreg/compiler/c2/TestMergeStoresMemorySegment.java >> >> Co-authored-by: Christian Hagedorn >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > >> /contributor add chhagedorn >> >> You spent enough time on this already ;) > > Thanks Emanuel, I highly appreciate that :-) 
@chhagedorn I filed https://bugs.openjdk.org/browse/JDK-8343536 to track the cases in `TestMergeStoresMemorySegment.java` that do not optimize as hoped for. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2454498317 From epeter at openjdk.org Mon Nov 4 11:48:49 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 11:48:49 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v18] In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. > > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. > > More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! > > `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`.
> > **What this change enables** > > Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). > > Now we can do: > - Merging `Unsafe` stores to arrays, including "mismatched size" stores, e.g. `putChar` to `byte[]`. > - Merging `Unsafe` stores to native memory. > - Merging `MemorySegment` stores, with array, native and ByteBuffer backing types. > - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without smearing the RCs, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. > > **Dealing with Overflows** > > We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. > > **Bench...
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more changes for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19970/files - new: https://git.openjdk.org/jdk/pull/19970/files/823bed75..c1f274f2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=16-17 Stats: 19 lines in 3 files changed: 8 ins; 4 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/19970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19970/head:pull/19970 PR: https://git.openjdk.org/jdk/pull/19970 From epeter at openjdk.org Mon Nov 4 11:48:50 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 11:48:50 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v16] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 4 Nov 2024 07:41:49 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more review applications > > test/hotspot/jtreg/compiler/c2/TestMergeStoresMemorySegment.java line 114: > >> 112: */ >> 113: >> 114: // FAILS: mixed providers currently do not merge stores. Maybe there is some inlining issue. > > Is there a tracking bug/RFE to make this work? https://bugs.openjdk.org/browse/JDK-8343536 > test/hotspot/jtreg/compiler/c2/TestMergeStoresMemorySegment.java line 375: > >> 373: applyIf = {"UseUnalignedAccesses", "true"}) >> 374: static Object[] test_xxx(MemorySegment a, int xI, int yI, int zI) { >> 375: // All RangeChecks remain -> RC smearing not good enough? > > Is there a tracking bug to further investigate at some point? 
https://bugs.openjdk.org/browse/JDK-8343536 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827601925 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827601955 From epeter at openjdk.org Mon Nov 4 11:51:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 11:51:36 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v17] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: <-qjDdk1O1LApPy16cdRihBCrNUFM-K0URHazl1pZuac=.f8bc6861-31c8-4cc2-96cb-3896222030df@github.com> On Mon, 4 Nov 2024 11:27:17 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update test/hotspot/jtreg/compiler/c2/TestMergeStoresMemorySegment.java >> >> Co-authored-by: Christian Hagedorn >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > >> /contributor add chhagedorn >> >> You spent enough time on this already ;) > > Thanks Emanuel, I highly appreciate that :-) @chhagedorn I addressed all your review suggestions. Thank you very much for the in-depth review :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2454505103 From chagedorn at openjdk.org Mon Nov 4 12:12:35 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 4 Nov 2024 12:12:35 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v18] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: <53qOJR2DhdN02s1Y64fiuPD7ckg_Hr9mhpQWCCENQvk=.241e763a-268d-49f3-ba4a-d568ac95827b@github.com> On Mon, 4 Nov 2024 11:48:49 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. 
For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. >> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to arrays, including "mismatched size" stores, e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment` stores, with array, native and ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without smearing the RCs, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck.
Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > more changes for Christian That looks good, thanks for the patience to work through all the suggestions and also for the offline discussions! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19970#pullrequestreview-2412899131 From duke at openjdk.org Mon Nov 4 12:23:00 2024 From: duke at openjdk.org (Sorna Sarathi) Date: Mon, 4 Nov 2024 12:23:00 GMT Subject: RFR: JDK-8251926: [PPC] Removed an unused variable in assembler_ppc.cpp Message-ID: This PR removes an unused variable from load_const_optimized function in assembler_ppc.cpp file. 
JBS Issue: [JDK-8251926](https://bugs.openjdk.org/browse/JDK-8251926) ------------- Commit messages: - Removed an unused variable in assembler_ppc.cpp file Changes: https://git.openjdk.org/jdk/pull/21874/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21874&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8251926 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21874.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21874/head:pull/21874 PR: https://git.openjdk.org/jdk/pull/21874 From epeter at openjdk.org Mon Nov 4 12:23:38 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 12:23:38 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Thu, 17 Oct 2024 21:42:33 GMT, Vladimir Kozlov wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. 
Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. >> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to arrays, including "mismatched size" stores, e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment` stores, with array, native and ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without smearing the RCs, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... > > For me it is confusing to call `pointer = con + sum_i(scale_i * variable_i)` a "pointer" unless it is an Unsafe address whose base address is a constant. It is missing the base address. All our pointer types correspond to an address of some object in the Java heap, off-heap, a VM object, or some native (C heap) VM object. > This looks like `address_offset`, `displacement`, ... @vnkozlov Would you like to re-review? If I don't hear anything then I'll integrate tomorrow.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2454574537 From roland at openjdk.org Mon Nov 4 12:27:33 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 4 Nov 2024 12:27:33 GMT Subject: RFR: 8343068: C2: CastX2P Ideal transformation not always applied [v2] In-Reply-To: References: <5k9Zq9E-2OJg0raPGHS1t_45N7QguwAToBfZiCfPjlY=.f41fd65a-7ac1-4886-a801-16753b864a2e@github.com> Message-ID: <04t-3boaqTZQNmXF3jXEvPgY-zS8oerxa99wbhvCdBg=.0ce7b0a6-d1d4-4cc0-948d-216bdc9ffbfa@github.com> On Mon, 28 Oct 2024 06:20:59 GMT, Tobias Hartmann wrote: > Shouldn't this be caught by `VerifyIterativeGVN` after [JDK-8298952 ](https://bugs.openjdk.org/browse/JDK-8298952)? That one only checks `Value`. In this case, the transformation that's not applied is performed by `Ideal`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21714#issuecomment-2454582028 From epeter at openjdk.org Mon Nov 4 13:00:43 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 13:00:43 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v25] In-Reply-To: References: Message-ID: On Sun, 20 Oct 2024 16:41:19 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. 
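The set of values described by these constraints can be written as a membership predicate. Below is a minimal sketch. It assumes a plain struct whose field names follow the description (`lo`, `hi`, `ulo`, `uhi`, `zeros`, `ones`). It is not HotSpot's `TypeInt` class and performs no canonicalization.

```cpp
#include <cstdint>

// Sketch of the constraints listed above (not HotSpot's TypeInt):
// signed bounds, unsigned bounds and known bits of a 32-bit value.
struct IntConstraints {
  int32_t lo, hi;     // signed bounds:   lo <= x <= hi
  uint32_t ulo, uhi;  // unsigned bounds: ulo <= juint(x) <= uhi
  uint32_t zeros;     // bits known to be 0: (x & zeros) == 0
  uint32_t ones;      // bits known to be 1: (x & ones) == ones

  constexpr bool contains(int32_t x) const {
    const uint32_t ux = static_cast<uint32_t>(x);  // same bits, viewed as unsigned
    return lo <= x && x <= hi &&
           ulo <= ux && ux <= uhi &&
           (ux & zeros) == 0 && (ux & ones) == ones;
  }
};

// The [0, 3] example from the description: the tightened form also records
// that juint(x) <= 3 and that every bit except the lowest two is zero.
constexpr IntConstraints canonical_0_3{0, 3, 0u, 3u, ~3u, 0u};
// A looser description of the same set, before tightening.
constexpr IntConstraints loose_0_3{0, 3, 0u, UINT32_MAX, 0u, 0u};
```

Both `canonical_0_3` and `loose_0_3` contain exactly the values 0 to 3. Canonicalization tightens the second form into the first, so that every field carries as much information as the set allows.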
>> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 32 commits: > > - Merge branch 'master' into unsignedbounds > - address reviews > - comment adjust_lo empty case > - formality > - address reviews > - add comments, refactor functions to helper class > - refine comments > - remove leftover code > - add doc to TypeInt, rename parameters, remove unused methods > - change (~v & ones) == 0 to (v & ones) == ones > - ... and 22 more: https://git.openjdk.org/jdk/compare/309b9291...7f3316fa src/hotspot/share/opto/rangeinference.cpp line 30: > 28: #include "utilities/tuple.hpp" > 29: > 30: constexpr juint SMALLINT = 3; // a value too insignificant to consider widening If you are already refactoring this code, I'd suggest giving it a better name. Seems to have to do with cardinality...? src/hotspot/share/opto/type.cpp line 4690: > 4688: const Type* tm = _ary->meet_speculative(tap->_ary); > 4689: const TypeAry* tary = tm->isa_ary(); > 4690: if (tary == nullptr) { Can you add a comment why this might happen? src/hotspot/share/opto/type.hpp line 630: > 628: * For a TypeInt t, there are 3 possible cases: > 629: * > 630: * a. t._lo >= 0. Since 0 <= t._lo <= jint(t._ulo), we have: I think you should say why `t._lo <= jint(t._ulo)` ... it seems intuitively true... hmm src/hotspot/share/opto/type.hpp line 632: > 630: * a. t._lo >= 0. Since 0 <= t._lo <= jint(t._ulo), we have: > 631: * > 632: * juint(t._lo) <= juint(jint(t._ulo)) == t._ulo <= juint(t._lo) You should say what steps you are applying here... otherwise the reader has a lot to do. Lemma, return-cast, `t._lo <= jint(t._ulo)` (maybe its own Lemma2?) 
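The "Lemma" referred to in this part of the review states the following: for two `jint` values x and y that are both >= 0 or both < 0, x <= y holds exactly when juint(x) <= juint(y). A small exhaustive check illustrates both the lemma and why the same-sign precondition is needed. It uses 8-bit integers as a stand-in for 32-bit `jint`/`juint`, chosen only to keep the search space tiny.

```cpp
#include <cstdint>

// Checks, for every pair x, y in [from, to], that comparing x and y as signed
// values gives the same answer as comparing their 8-bit unsigned views.
constexpr bool order_preserved_on(int from, int to) {
  for (int x = from; x <= to; x++) {
    for (int y = from; y <= to; y++) {
      const bool signed_le = x <= y;
      const bool unsigned_le = static_cast<uint8_t>(x) <= static_cast<uint8_t>(y);
      if (signed_le != unsigned_le) { return false; }
    }
  }
  return true;
}

// The lemma, with int8_t/uint8_t standing in for jint/juint:
// it holds on the non-negative half and on the negative half separately.
constexpr bool lemma_holds_int8() {
  return order_preserved_on(0, 127) && order_preserved_on(-128, -1);
}
```

Within each half, the unsigned view is either x itself or x + 256 (x + 2^32 for `jint`), a monotone shift, so the order is preserved. Across the halves it is not: -1 <= 0 holds, but 255 <= 0 does not.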
src/hotspot/share/opto/type.hpp line 634: > 632: * juint(t._lo) <= juint(jint(t._ulo)) == t._ulo <= juint(t._lo) > 633: * > 634: * Which means that t._lo == jint(t._ulo). Similarly, t._hi == jint(t._uhi). Hmm. I feel like I don't immediately see the "Similarly" here.... too many hidden steps. test/hotspot/gtest/opto/test_rangeinference.cpp line 33: > 31: #include > 32: > 33: #ifdef ASSERT Why do you have this here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1827691915 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1827665859 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1827673619 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1827676703 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1827680285 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1827684121 From epeter at openjdk.org Mon Nov 4 13:00:44 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 13:00:44 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v25] In-Reply-To: References: Message-ID: <-d-R7jGoZ1OUrfIP23mumrC1L-WDQd3ylYoTf7TX6vs=.d83a9c38-c11b-4173-a1a9-bba2d691207a@github.com> On Mon, 4 Nov 2024 08:58:47 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 32 commits: >> >> - Merge branch 'master' into unsignedbounds >> - address reviews >> - comment adjust_lo empty case >> - formality >> - address reviews >> - add comments, refactor functions to helper class >> - refine comments >> - remove leftover code >> - add doc to TypeInt, rename parameters, remove unused methods >> - change (~v & ones) == 0 to (v & ones) == ones >> - ... 
and 22 more: https://git.openjdk.org/jdk/compare/309b9291...7f3316fa > > src/hotspot/share/opto/type.hpp line 622: > >> 620: * TypeInt lie in the intervals [_lo, jint(_uhi)] or [jint(_ulo), _hi] >> 621: * >> 622: * Proof: For 2 jint value x, y such that they are both >= 0 or < 0. Then: > > Suggestion: > > * Proof: For 2 jint value x, y such that they are both >= 0 or both < 0. Then: > > Or are you allowing one of them to be positive and the other negative? Also: this is more of a "Lemma", and could be stated before the "Proof" of your property 2... it is property 2 that you are trying to prove here, right? The indentation would help for that as well. > src/hotspot/share/opto/type.hpp line 634: > >> 632: * juint(t._lo) <= juint(jint(t._ulo)) == t._ulo <= juint(t._lo) >> 633: * >> 634: * Which means that t._lo == jint(t._ulo). Similarly, t._hi == jint(t._uhi). > > Hmm. I feel like I don't immediately see the "Similarly" here.... too many hidden steps. I'm going to stop reading the proof below. I'll read it again once you respond to the comments above ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1827671624 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1827680976 From thartmann at openjdk.org Mon Nov 4 13:08:29 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 4 Nov 2024 13:08:29 GMT Subject: RFR: 8343068: C2: CastX2P Ideal transformation not always applied [v2] In-Reply-To: References: <5k9Zq9E-2OJg0raPGHS1t_45N7QguwAToBfZiCfPjlY=.f41fd65a-7ac1-4886-a801-16753b864a2e@github.com> Message-ID: On Fri, 25 Oct 2024 15:09:50 GMT, Roland Westrelin wrote: >> The transformation: >> >> >> (CastX2P (AddL base i)) -> (AddP (CastX2P base) i) >> >> >> when i fits in an int is not always applied: when the type of `i` is >> narrowed so it fits in an int, the `CastX2P` is not enqueued for >> igvn. This can get in the way of vectorization as shown by test case >> `test2`.
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > fix test Ah right, I missed that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21714#issuecomment-2454667516 From mdoerr at openjdk.org Mon Nov 4 13:28:28 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 4 Nov 2024 13:28:28 GMT Subject: RFR: 8251926: PPC: Remove an unused variable in assembler_ppc.cpp In-Reply-To: References: Message-ID: <8azWlriUnVJwl6jZPUNBYlXz7GQVoWivFjf57lgDJuA=.0c9ed3af-ee54-4cd4-8740-02700a54737f@github.com> On Mon, 4 Nov 2024 12:17:34 GMT, Sorna Sarathi wrote: > This PR removes an unused variable from the load_const_optimized function in the assembler_ppc.cpp file. > > JBS Issue: [JDK-8251926](https://bugs.openjdk.org/browse/JDK-8251926) Looks good and trivial. Thanks for resolving this old issue. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21874#pullrequestreview-2413061019 From epeter at openjdk.org Mon Nov 4 13:28:41 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 13:28:41 GMT Subject: RFR: 8342498: Add test for Allocation elimination after use as alignment reference by SuperWord Message-ID: <3jLdPTBJ5FF6j7XmU9qicyfHOiPLs8_NGHN-enTh4ak=.32de86ef-a76d-4e07-a8bb-749f6648219f@github.com> There used to be a bug where this happens: - SuperWord vectorizes, and picks a field-store as the alignment reference, using a CastP2X on the object pointer. - Later, all field loads disappear, and the Allocation of the object is eliminated. - The GC code then thinks the CastP2X is part of the GC barrier code... and crashes with wrong assumptions about that part of the IR. We should obviously not use field-accesses as alignment references for SuperWord. A few other changes have fixed this issue: - [JDK-8328544](https://bugs.openjdk.org/browse/JDK-8328544): it disallows any non-array accesses that do not have an int-index.
This code was backported and so should on its own fix the issue everywhere. But maybe someone will later want to be smarter and allow such memory accesses without int-indices. For that case, we should add this regression test. // We did not find the int_index. Just to be safe, reject this VPointer. if (!_has_int_index_after_convI2L) { return false; } - Recently, we changed it so that memory accesses are only used as alignment references if they are actually vectorized... which cannot happen with field stores. - Roberto's change with GC barriers also removed the asserting/crashing code. Though I'm not sure whether that means the IR is then ok. **At any rate**: the bug seems **fixed**, but we should add and possibly backport this **regression test** anyway. ------------- Commit messages: - JDK-8342498 Changes: https://git.openjdk.org/jdk/pull/21875/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21875&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342498 Stats: 182 lines in 1 file changed: 182 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21875.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21875/head:pull/21875 PR: https://git.openjdk.org/jdk/pull/21875 From mdoerr at openjdk.org Mon Nov 4 13:35:30 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 4 Nov 2024 13:35:30 GMT Subject: RFR: 8343506: [s390x] multiple test failures with ubsan In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 05:27:59 GMT, Amit Kumar wrote: > This is a trivial patch which fixes the error I am seeing on s390x, while running tier1 with ubsan enabled. Please see JBS for more details. I think the is_uimm* checks should take a `uint64_t`. See assembler_riscv.inline.hpp.
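Martin's suggestion that the `is_uimm*` checks take a `uint64_t` can be illustrated with a minimal sketch. The function name `fits_uimm` and its signature are hypothetical and are not the s390x or RISC-V assembler code. It is also an assumption on our part which exact expression UBSan flagged, since the report is only referenced via JBS. The point shown is simply that an unsigned 64-bit parameter keeps the range check free of signed arithmetic.

```cpp
#include <cstdint>

// Hypothetical helper: does x fit in an nbits-wide unsigned immediate field?
// Taking uint64_t means a negative argument can only arrive through an explicit,
// well-defined conversion at the call site, and the shift below is unsigned.
constexpr bool fits_uimm(uint64_t x, unsigned nbits) {
  // Shifting a 64-bit value by 64 or more is undefined, so handle that width separately.
  return nbits >= 64 || (x >> nbits) == 0;
}
```

For example, `fits_uimm(255, 8)` holds and `fits_uimm(256, 8)` does not, while `static_cast<uint64_t>(-1)` is rejected for any width below 64.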
------------- PR Comment: https://git.openjdk.org/jdk/pull/21864#issuecomment-2454727814 From chagedorn at openjdk.org Mon Nov 4 13:37:34 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 4 Nov 2024 13:37:34 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v6] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 04:14:47 GMT, Dhamoder Nalla wrote: >> In the debug build, the assert is triggered during the parsing (before Code_Gen). In the Release build, however, the compilation bails out at `Compile::check_node_count()` during the code generation phase and completes execution without any issues. >> >> When I commented out the assert(C->live_nodes() <= C->max_node_limit()), both the debug and release builds exhibited the same behavior: the compilation bails out during code_gen after building the ideal graph with more than 80K nodes. >> >> The proposed fix will check the live node count and bail out during compilation while building the graph for scalarization of the elements in the array when the live node count crosses the limit of 80K, instead of unnecessarily building the entire graph and bailing out in code_gen. > > Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: > > fix trailing whitespace src/hotspot/share/opto/escape.hpp line 680: > 678: bool add_final_edges_unsafe_access(Node* n, uint opcode); > 679: > 680: int invocation() { return _invocation; } Can be made `const`: Suggestion: int invocation() const { return _invocation; } test/hotspot/jtreg/compiler/loopopts/superword/TestScalarize_Bailout.java line 33: > 31: */ > 32: > 33: package compiler.loopopts.superword; I suggest moving this test to `compiler/escapeAnalysis` and updating the package accordingly to `compiler.escapeAnalysis`.
test/hotspot/jtreg/compiler/loopopts/superword/TestScalarize_Bailout.java line 35: > 33: package compiler.loopopts.superword; > 34: > 35: public class TestScalarize_Bailout { You should not use underscores in class names. test/hotspot/jtreg/compiler/loopopts/superword/TestScalarize_Bailout.java line 37: > 35: public class TestScalarize_Bailout { > 36: > 37: static Object var1; Indentation seems to be off. test/hotspot/jtreg/compiler/loopopts/superword/TestScalarize_Bailout.java line 43: > 41: try { > 42: Class Class37 = Class.forName("compiler.loopopts.superword.TestScalarize_Bailout"); > 43: synchronized (compiler.loopopts.superword.TestScalarize_Bailout.class) { I guess you do not need the fully qualified name: Suggestion: synchronized (TestScalarize_Bailout.class) { test/hotspot/jtreg/compiler/loopopts/superword/TestScalarize_Bailout.java line 43: > 41: try { > 42: Class Class37 = Class.forName("compiler.loopopts.superword.TestScalarize_Bailout"); > 43: synchronized (compiler.loopopts.superword.TestScalarize_Bailout.class) { Are `forName()` and `synchronized` really required to trigger this with mainline? If so, you should add a comment to explain why they are required.
test/hotspot/jtreg/compiler/loopopts/superword/TestScalarize_Bailout.java line 48: > 46: } > 47: } > 48: } catch (Exception eeeeeeee){throw new RuntimeException(eeeeeeee);} Suggestion: } catch (Exception e) { throw new RuntimeException(e); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r1827660522 PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r1827664200 PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r1827662890 PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r1827739836 PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r1827665705 PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r1827741280 PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r1827666392 From chagedorn at openjdk.org Mon Nov 4 13:37:35 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 4 Nov 2024 13:37:35 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v4] In-Reply-To: <4oWQ5tScx2i8xp1XO-q7R-SczbUZT_Klq757GyFkmlY=.2907afac-3872-4ff6-88e0-2a05144ff21b@github.com> References: <4oWQ5tScx2i8xp1XO-q7R-SczbUZT_Klq757GyFkmlY=.2907afac-3872-4ff6-88e0-2a05144ff21b@github.com> Message-ID: On Thu, 24 Oct 2024 04:07:33 GMT, Dhamoder Nalla wrote: >> src/hotspot/share/opto/macro.cpp line 821: >> >>> 819: // If scalarize operation is adding too many nodes, bail out >>> 820: if (C->check_node_count(300, "out of nodes while scalarizing object")) { >>> 821: return nullptr; >> >> Would a bailout from this scalarization be enough, or do we really need to record the method as non-compilable (which is what `check_node_count()` does)? In the latter case, we could also try something like "recompilation without EA" as done, for example, here (i.e.
`retry_no_escape_analysis`): >> >> https://github.com/openjdk/jdk/blob/37cfaa8deb4cc15864bb6dc2c8a87fc97cff2f0d/src/hotspot/share/opto/escape.cpp#L3858-L3866 >> >> I also suggest using `NodeLimitFudgeFactor` instead of `300` to make it controllable. > > Thank you for your suggestion @chhagedorn. I agree that 'recompilation without EA' makes more sense, and I have made the necessary changes. Okay, thanks for investigating again. A bailout makes sense for this edge case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r1827742596 From amitkumar at openjdk.org Mon Nov 4 13:44:32 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 4 Nov 2024 13:44:32 GMT Subject: RFR: 8343506: [s390x] multiple test failures with ubsan In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 13:32:30 GMT, Martin Doerr wrote: > I think the is_uimm* checks should take a `uint64_t`. See assembler_riscv.inline.hpp. But isn't `julong` the same as `uint64_t`? I saw this in `globalDefinitions.hpp`: // Additional Java basic types typedef uint8_t jubyte; typedef uint16_t jushort; typedef uint32_t juint; typedef uint64_t julong; ------------- PR Comment: https://git.openjdk.org/jdk/pull/21864#issuecomment-2454749491 From roland at openjdk.org Mon Nov 4 13:44:33 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 4 Nov 2024 13:44:33 GMT Subject: RFR: 8341834: C2 compilation fails with "bad AD file" due to Replicate [v2] In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 06:54:43 GMT, Emanuel Peter wrote: > Do you know what JDK versions are affected? The failure doesn't reproduce with jdk21u. But that seems to be because we need JDK-8326139 (and JDK-8331575) for the bug to show up.
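The s390 `SSlenDW` predicate discussed in this thread encodes a 9-bit length as `len - 1` in an 8-bit field, so `len - 1` has to be range-checked as an unsigned value. A minimal Java sketch, assuming the operand accepts exactly the lengths 1 through 256 implied by `is_uimm8(len - 1)`; `isUimm` and `fitsSSlenDW` are hypothetical helpers that mirror `Immediate::is_uimm8`, not HotSpot code. The explicit `len >= 1` guard mirrors the one Dean Long suggests later in this thread.

```java
// Hypothetical Java model of the SSlenDW operand check; not HotSpot code.
final class SSlenCheck {
    // True if v, read as an unsigned 64-bit value (like julong / uint64_t), fits in nbits bits.
    static boolean isUimm(long v, int nbits) {
        return Long.compareUnsigned(v, (1L << nbits) - 1) <= 0;
    }

    // A length in 1..256 is encoded as len - 1 in an 8-bit field.
    static boolean fitsSSlenDW(long len) {
        // For len == 0, len - 1 is -1, i.e. 0xFFFF_FFFF_FFFF_FFFF when read as unsigned,
        // so isUimm would reject it anyway; the explicit guard states the intent
        // and also keeps len - 1 from overflowing for Long.MIN_VALUE.
        return len >= 1 && isUimm(len - 1, 8);
    }
}
```

For example, `fitsSSlenDW(256)` is true because `255` fits in 8 bits, while `fitsSSlenDW(0)` and `fitsSSlenDW(257)` are false.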
------------- PR Comment: https://git.openjdk.org/jdk/pull/21660#issuecomment-2454748705 From mdoerr at openjdk.org Mon Nov 4 13:50:27 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 4 Nov 2024 13:50:27 GMT Subject: RFR: 8343506: [s390x] multiple test failures with ubsan In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 05:27:59 GMT, Amit Kumar wrote: > This is a trivial patch which fixes the error I am seeing on s390x while running tier1 with ubsan enabled. Please see JBS for more details. My point is that I think the riscv solution is better. See assembler_riscv.inline.hpp. Your cast is correct, though. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21864#issuecomment-2454756323 PR Comment: https://git.openjdk.org/jdk/pull/21864#issuecomment-2454762183 From epeter at openjdk.org Mon Nov 4 14:00:14 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 14:00:14 GMT Subject: RFR: 8342498: Add test for Allocation elimination after use as alignment reference by SuperWord [v2] In-Reply-To: <3jLdPTBJ5FF6j7XmU9qicyfHOiPLs8_NGHN-enTh4ak=.32de86ef-a76d-4e07-a8bb-749f6648219f@github.com> References: <3jLdPTBJ5FF6j7XmU9qicyfHOiPLs8_NGHN-enTh4ak=.32de86ef-a76d-4e07-a8bb-749f6648219f@github.com> Message-ID: > There used to be a bug where this happens: > - SuperWord vectorizes, and picks a field-store as the alignment reference, using a CastP2X on the object pointer. > - Later, all field loads disappear, and the Allocation of the object is eliminated. > - The GC code then thinks the CastP2X is part of the GC barrier code... and crashes with wrong assumptions about that part of the IR. > > We should obviously not use field-accesses as alignment references for SuperWord. A few other changes have fixed this issue: > - [JDK-8328544](https://bugs.openjdk.org/browse/JDK-8328544): it disallows any non-array accesses that do not have an int-index. This code was backported and so should on its own fix the issue everywhere.
But maybe somebody has the idea and wants to be smarter... allowing such memory accesses without int-indices. For that we should add this regression test. > > // We did not find the int_index. Just to be safe, reject this VPointer. > if (!_has_int_index_after_convI2L) { > return false; > } > > - Recently, we started allowing memory accesses to be alignment references only if they are actually vectorized... which cannot happen with field stores. > - Roberto's change with GC barriers also removed the asserting/crashing code. Though I'm not sure if that means the IR is then ok. > > **At any rate**: the bug seems **fixed**, but we should add and possibly backport this **regression test** anyway. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: unlock diagnostics ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21875/files - new: https://git.openjdk.org/jdk/pull/21875/files/4ddc14cc..4ff0aa27 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21875&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21875&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21875.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21875/head:pull/21875 PR: https://git.openjdk.org/jdk/pull/21875 From thartmann at openjdk.org Mon Nov 4 14:18:30 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 4 Nov 2024 14:18:30 GMT Subject: RFR: 8342498: Add test for Allocation elimination after use as alignment reference by SuperWord [v2] In-Reply-To: References: <3jLdPTBJ5FF6j7XmU9qicyfHOiPLs8_NGHN-enTh4ak=.32de86ef-a76d-4e07-a8bb-749f6648219f@github.com> Message-ID: <9r7WrmtNlOWqKxm3tPgUVprgW28KAuNx0cBc3mYVspY=.3ad5a8b7-1b44-45ec-b30c-3ce41b8e4d73@github.com> On Mon, 4 Nov 2024 14:00:14 GMT, Emanuel Peter wrote: >> There used to be a bug where this happens: >> - SuperWord vectorizes, and picks a field-store as the alignment reference, using a
CastP2X on the object pointer. >> - Later, all field loads disappear, and the Allocation of the object is eliminated. >> - The GC code then thinks the CastP2X is part of the GC barrier code... and crashes with wrong assumptions about that part of the IR. >> >> We should obviously not use field-accesses as alignment references for SuperWord. A few other changes have fixed this issue: >> - [JDK-8328544](https://bugs.openjdk.org/browse/JDK-8328544): it disallows any non-array accesses that do not have an int-index. This code was backported and so should on its own fix the issue everywhere. But maybe somebody has the idea and wants to be smarter... allowing such memory accesses without int-indices. For that we should add this regression test. >> >> // We did not find the int_index. Just to be safe, reject this VPointer. >> if (!_has_int_index_after_convI2L) { >> return false; >> } >> >> - Recently, we started allowing memory accesses to be alignment references only if they are actually vectorized... which cannot happen with field stores. >> - Roberto's change with GC barriers also removed the asserting/crashing code. Though I'm not sure if that means the IR is then ok. >> >> **At any rate**: the bug seems **fixed**, but we should add and possibly backport this **regression test** anyway. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > unlock diagnostics Great job extracting this test, Emanuel. Looks good to me! ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21875#pullrequestreview-2413188915 From amitkumar at openjdk.org Mon Nov 4 14:34:28 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 4 Nov 2024 14:34:28 GMT Subject: RFR: 8343506: [s390x] multiple test failures with ubsan In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 05:27:59 GMT, Amit Kumar wrote: > This is a trivial patch which fixes the error I am seeing on s390x while running tier1 with ubsan enabled. Please see JBS for more details. Oh, got it. I will add that change to the PR and run the tests again. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21864#issuecomment-2454870939 From kvn at openjdk.org Mon Nov 4 16:12:30 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 4 Nov 2024 16:12:30 GMT Subject: RFR: 8342498: Add test for Allocation elimination after use as alignment reference by SuperWord [v2] In-Reply-To: References: <3jLdPTBJ5FF6j7XmU9qicyfHOiPLs8_NGHN-enTh4ak=.32de86ef-a76d-4e07-a8bb-749f6648219f@github.com> Message-ID: On Mon, 4 Nov 2024 14:00:14 GMT, Emanuel Peter wrote: >> There used to be a bug where this happens: >> - SuperWord vectorizes, and picks a field-store as the alignment reference, using a CastP2X on the object pointer. >> - Later, all field loads disappear, and the Allocation of the object is eliminated. >> - The GC code then thinks the CastP2X is part of the GC barrier code... and crashes with wrong assumptions about that part of the IR. >> >> We should obviously not use field-accesses as alignment references for SuperWord. A few other changes have fixed this issue: >> - [JDK-8328544](https://bugs.openjdk.org/browse/JDK-8328544): it disallows any non-array accesses that do not have an int-index. This code was backported and so should on its own fix the issue everywhere. But maybe somebody has the idea and wants to be smarter... allowing such memory accesses without int-indices. For that we should add this regression test.
>> >> // We did not find the int_index. Just to be safe, reject this VPointer. >> if (!_has_int_index_after_convI2L) { >> return false; >> } >> >> - Recently, we started allowing memory accesses to be alignment references only if they are actually vectorized... which cannot happen with field stores. >> - Roberto's change with GC barriers also removed the asserting/crashing code. Though I'm not sure if that means the IR is then ok. >> >> **At any rate**: the bug seems **fixed**, but we should add and possibly backport this **regression test** anyway. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > unlock diagnostics Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21875#pullrequestreview-2413492594 From kvn at openjdk.org Mon Nov 4 16:42:28 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 4 Nov 2024 16:42:28 GMT Subject: RFR: 8339303: C2: dead node after failing to match cloned address expression In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 13:53:33 GMT, Roberto Castañeda Lozano wrote: > This changeset prevents the x86 platform-specific logic from cloning address expressions consisting of two chained `AddP` nodes with a small constant offset each, such as in the following example: > > ![example](https://github.com/user-attachments/assets/86c143a1-3895-4e0c-936b-0d22b7c80e73) > > Such patterns cannot be fully subsumed into x86 complex addressing modes, and cloning them can cause the matcher to introduce dead nodes that trigger a segmentation fault in the subsequent global code motion phase. See a detailed analysis of the failure in the [JBS issue description](https://bugs.openjdk.org/browse/JDK-8339303). > > The changeset additionally extends the post-matching verification logic to check that no old node is reachable by traversing both node inputs and outputs.
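The extended verification described here checks reachability along both def-use directions. A minimal Java sketch of that idea on a hypothetical graph model (`GraphNode` and `findReachableOld` are illustrative names, not HotSpot's `Node` or `Matcher` API): a walk over inputs alone can miss an old node that merely uses a new node, while also following outputs finds it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical graph model for illustration; not HotSpot's Node API.
final class GraphNode {
    final String name;
    final boolean old;                              // true for nodes of the pre-matching graph
    final List<GraphNode> inputs = new ArrayList<>();
    final List<GraphNode> outputs = new ArrayList<>();

    GraphNode(String name, boolean old) {
        this.name = name;
        this.old = old;
    }

    // Records a def-use edge: 'def' becomes an input of this node and this node a use of 'def'.
    void addInput(GraphNode def) {
        inputs.add(def);
        def.outputs.add(this);
    }

    // Returns an old node reachable from 'root' via inputs and outputs, or null if none.
    static GraphNode findReachableOld(GraphNode root) {
        Set<GraphNode> visited = new HashSet<>();
        Deque<GraphNode> work = new ArrayDeque<>();
        work.push(root);
        while (!work.isEmpty()) {
            GraphNode n = work.pop();
            if (!visited.add(n)) {
                continue;                           // already explored
            }
            if (n.old) {
                return n;
            }
            n.inputs.forEach(work::push);           // defs
            n.outputs.forEach(work::push);          // uses: an old node hanging off a new one is found here
        }
        return null;
    }
}
```

The HotSpot check itself is C++ code in `matcher.cpp` (a later review comment suggests iterating outputs with `DUIterator_Fast`); the sketch only models the traversal.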
This extension would have caused the original test case to fail directly after matching with an informative assertion message rather than an opaque segmentation fault in an unrelated code generation phase. > > Note that the pattern causing the failure should be in general optimized by `AddPNode::Ideal` into a single `AddP` node with the constant sum of the offsets. While [JDK-8343067](https://bugs.openjdk.org/browse/JDK-8343067) should address the missing optimization, this changeset proposes a complementary solution that is easily backportable and avoids relying on specific optimizations for correctness. > > #### Testing > > ##### Functionality > > - tier1-5 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > ##### Performance > > - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. Looks good. Yes, it looks like the code expects an LShift here instead of a constant. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21829#pullrequestreview-2413569797 From kvn at openjdk.org Mon Nov 4 16:20:31 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 4 Nov 2024 16:20:31 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v4] In-Reply-To: References: <4oWQ5tScx2i8xp1XO-q7R-SczbUZT_Klq757GyFkmlY=.2907afac-3872-4ff6-88e0-2a05144ff21b@github.com> Message-ID: <4t14KRimdrYG3dPJ4FgeeX0oz1xwGDNfuParbwVIL68=.ea5c0137-ffad-41f7-9ac0-e95daecf09ea@github.com> On Mon, 4 Nov 2024 13:34:56 GMT, Christian Hagedorn wrote: >> Thank you for your suggestion @chhagedorn. I agree that 'recompilation without EA' makes more sense, and I have made the necessary changes. > > Okay, thanks for investigating again. A bailout makes sense for this edge case. Yes, a bailout with recompilation is preferable.
The graph could already be partially modified with field access nodes for the scalarized object. If the bailout check and code are the same as in `escape.cpp`, consider factoring them into one function to use in both places. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r1828003943 From kvn at openjdk.org Mon Nov 4 15:54:35 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 4 Nov 2024 15:54:35 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v18] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 4 Nov 2024 11:48:49 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`.
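To make the linear form `pointer = con + sum_i(scale_i * variable_i)` concrete, here is a minimal Java sketch, assuming a pointer has already been decomposed into that form. `LinearPointer` and its methods are hypothetical names, not the `MemPointer` API in `mempointer.hpp`. Two pointers with identical summands differ only by a constant, so adjacency reduces to comparing constants; checked arithmetic stands in for the overflow tracking that the patch does with `NoOverflowInt`.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the linear form pointer = con + sum_i(scale_i * variable_i).
// Not HotSpot's MemPointer; long arithmetic is used for simplicity.
final class LinearPointer {
    final long con;                        // constant offset
    final Map<String, Long> summands;      // variable name -> scale coefficient

    LinearPointer(long con, Map<String, Long> summands) {
        this.con = con;
        this.summands = new TreeMap<>(summands);   // defensive, order-independent copy
    }

    // Adds a constant offset; Math.addExact throws ArithmeticException on overflow
    // instead of silently wrapping (the concern NoOverflowInt addresses in the patch).
    LinearPointer plus(long offset) {
        return new LinearPointer(Math.addExact(con, offset), summands);
    }

    // Identical summands mean the two pointers always differ by a constant.
    boolean sameSummands(LinearPointer other) {
        return summands.equals(other.summands);
    }

    // True if an access of 'size' bytes at this pointer ends exactly where 'other' begins.
    boolean isAdjacentTo(LinearPointer other, long size) {
        return sameSummands(other) && Math.subtractExact(other.con, con) == size;
    }
}
```

For example, stores at `16 + 4*i` and `20 + 4*i` share the summand `4*i` and have constants 4 apart, so two 4-byte stores there are adjacent and could be merged.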
>> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > more changes for Christian Update is good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19970#pullrequestreview-2413445139 From dfenacci at openjdk.org Mon Nov 4 15:08:47 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 4 Nov 2024 15:08:47 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure Message-ID: # Issue The `compiler/vectorapi/VectorLogicalOpIdentityTest.java` has been failing because C2, when compiling the test `testAndMaskSameValue1`, expects to find one `AndV` node but finds none.
# Cause The issue has to do with the criteria that trigger a cleanup when performing late inlining. In the failing test, when the compiler tries to inline a `jdk.internal.vm.vector.VectorSupport::binaryOp` call, it fails because its argument is of the wrong type, mainly because some cast nodes "hide" the more "precise" type. The graph that leads to the issue looks like this: ![1BCE8148-1E44-4CA1-AF8F-EFC6210FA740](https://github.com/user-attachments/assets/62dd917f-2dac-42a9-90cf-73eedcd3cf8a) The compiler tries to inline `jdk.internal.vm.vector.VectorSupport::load` and it succeeds: ![752E81C9-A37D-4626-81A9-E4A839FADD3D](https://github.com/user-attachments/assets/e61057b2-3093-4992-ba5a-b80e4000c0ec) The node `3027 VectorBox` has type `IntMaxVector`. `912 CastPP` and `934 CheckCastPP` have type `IntVector` instead. The compiler then tries to inline one of the two `binaryOp` calls, but it fails because it needs an argument of type `IntMaxVector` and the argument it is given, node `934 CheckCastPP`, has type `IntVector`. This would not happen if a _cleanup_ were triggered between the two inlining attempts. IGVN would run and the two nodes `912 CastPP` and `934 CheckCastPP` would be folded away. `binaryOp` could then be inlined since the types would match. # Solution To fix this, an extra cleanup has to be performed when we encounter a situation like the one above, i.e. when late inlining creates a `VectorBox`.
Additional test runs with `-XX:-TieredCompilation` are added to `VectorLogicalOpIdentityTest.java` and `VectorGatherMaskFoldingTest.java` as regression tests and `-XX:+IncrementalInlineForceCleanup` is removed from `VectorGatherMaskFoldingTest.java` (previously added as workaround for this issue) ------------- Commit messages: - Merge branch 'master' into JDK-8302459-new - JDK-8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure - Revert "JDK-8302459: compiler/vectorapi/VectorLogicalOpIdentityTest.java failed with "IRViolationException: There were one or multiple IR rule failures"" - Revert "JDK-8302459: remove unused vector inline queue" - Revert "JDK-8302459: remove unneeded changes" - Revert "JDK-8302459: remove unneeded function declaration" - Revert "JDK-8302459: add explicit -TieredCompilation to tests" - Revert "JDK-8302459: add bug numbers to tests" - Revert "JDK-8302459: update copyright year" - JDK-8302459: update copyright year - ... 
and 6 more: https://git.openjdk.org/jdk/compare/388d44fb...bd488a96 Changes: https://git.openjdk.org/jdk/pull/21682/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21682&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8302459 Stats: 13 lines in 4 files changed: 6 ins; 1 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/21682.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21682/head:pull/21682 PR: https://git.openjdk.org/jdk/pull/21682 From sparasa at openjdk.org Mon Nov 4 18:10:47 2024 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 4 Nov 2024 18:10:47 GMT Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support [v5] In-Reply-To: References: Message-ID: > The goal of this PR is to fix instruction encoding errors for some of the instructions which support the APX New Data Destination (NDD) and No Flags (NF) features (added in [JDK-8329035](https://bugs.openjdk.org/browse/JDK-8329035)) > > The correctness of the encoding is verified by comparing to the reference encoding by GCC using an automated tool (https://github.com/openjdk/jdk/pull/21795) Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: remove map4 enum; replace with comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21770/files - new: https://git.openjdk.org/jdk/pull/21770/files/0f404dbd..1563aa2c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21770&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21770&range=03-04 Stats: 22 lines in 2 files changed: 0 ins; 1 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/21770.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21770/head:pull/21770 PR: https://git.openjdk.org/jdk/pull/21770 From sparasa at openjdk.org Mon Nov 4 18:14:29 2024 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 4 Nov 2024 18:14:29 GMT Subject: RFR: 8343214: Fix encoding errors in APX New 
Data Destination Instructions Support [v5] In-Reply-To: <9lYvrMFQKW4Cq9zKFDjkNg9zKgLvdaCtVQTZgUGgpNs=.78ee059a-5f85-4d6e-828a-a71351d44763@github.com> References: <9lYvrMFQKW4Cq9zKFDjkNg9zKgLvdaCtVQTZgUGgpNs=.78ee059a-5f85-4d6e-828a-a71351d44763@github.com> Message-ID: On Wed, 30 Oct 2024 09:47:05 GMT, Jatin Bhateja wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> remove map4 enum; replace with comment > > I think we should first check in the extended gtest asm validation script that detects these issues, either before or along with this patch. Hi @jatin-bhateja, please see the updated code indicating the MAP4 comment next to the VEX_OPCODE. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21770#issuecomment-2455389911 From mli at openjdk.org Mon Nov 4 18:37:37 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 4 Nov 2024 18:37:37 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option Message-ID: Hi, Can you help review this simple patch? Previously, UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang, it should be a diagnostic option, as we don't want to expose too many options to users.
Thanks ------------- Commit messages: - Initial commit - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Revert "initial commit" - initial commit Changes: https://git.openjdk.org/jdk/pull/21885/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21885&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343555 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21885.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21885/head:pull/21885 PR: https://git.openjdk.org/jdk/pull/21885 From dlong at openjdk.org Mon Nov 4 20:52:28 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 4 Nov 2024 20:52:28 GMT Subject: RFR: 8343506: [s390x] multiple test failures with ubsan In-Reply-To: References: Message-ID: <-_6fxDHjNO3eG0JX_Nqscc0bVUQgLcmK5OjoVME7xNk=.6ae3a749-36a4-47d2-9ff1-a29c28f97dd8@github.com> On Mon, 4 Nov 2024 05:27:59 GMT, Amit Kumar wrote: > This is a trivial patch which fixes the error I am seeing on s390x while running tier1 with ubsan enabled. Please see JBS for more details.
src/hotspot/cpu/s390/s390.ad line 2550: > 2548: // Unsigned Integer Immediate: 9-bit > 2549: operand SSlenDW() %{ > 2550: predicate(Immediate::is_uimm8((julong)n->get_long()-1)); Suggestion: predicate(n->get_long() >= 1 && Immediate::is_uimm8((julong)n->get_long()-1)); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21864#discussion_r1828368759 From dlong at openjdk.org Mon Nov 4 21:06:27 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 4 Nov 2024 21:06:27 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 13:31:37 GMT, Damon Fenacci wrote: > # Issue > > The `compiler/vectorapi/VectorLogicalOpIdentityTest.java` has been failing because C2, when compiling the test `testAndMaskSameValue1`, expects to find one `AndV` node but finds none. > > # Cause > > The issue has to do with the criteria that trigger a cleanup when performing late inlining. In the failing test, when the compiler tries to inline a `jdk.internal.vm.vector.VectorSupport::binaryOp` call, it fails because its argument is of the wrong type, mainly because some cast nodes "hide" the more "precise" type. > The graph that leads to the issue looks like this: > ![1BCE8148-1E44-4CA1-AF8F-EFC6210FA740](https://github.com/user-attachments/assets/62dd917f-2dac-42a9-90cf-73eedcd3cf8a) > The compiler tries to inline `jdk.internal.vm.vector.VectorSupport::load` and it succeeds: > ![752E81C9-A37D-4626-81A9-E4A839FADD3D](https://github.com/user-attachments/assets/e61057b2-3093-4992-ba5a-b80e4000c0ec) > The node `3027 VectorBox` has type `IntMaxVector`. `912 CastPP` and `934 CheckCastPP` have type `IntVector` instead. > The compiler then tries to inline one of the two `binaryOp` calls, but it fails because it needs an argument of type `IntMaxVector` and the argument it is given, node `934 CheckCastPP`, has type `IntVector`.
> > This would not happen if between the 2 inlining attempts a _cleanup_ was triggered. IGVN would run and the 2 nodes `912 CastPP` and `934 CheckCastPP` would be folded away. `binaryOp` could then be inlined since the types would match. > > # Solution > > In order to fix this an extra cleanup has to be performed when we encounter a situation like the one above, i.e. when late inlining creates a `VectorBox`. > > Additional test runs with `-XX:-TieredCompilation` are added to `VectorLogicalOpIdentityTest.java` and `VectorGatherMaskFoldingTest.java` as regression tests and `-XX:+IncrementalInlineForceCleanup` is removed from `VectorGatherMaskFoldingTest.java` (previously added as workaround for this issue) Would it be better to trigger cleanup based on the presence of nodes like CastPP/CheckCastPP instead? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21682#issuecomment-2455697290 From sviswanathan at openjdk.org Mon Nov 4 21:33:38 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 4 Nov 2024 21:33:38 GMT Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support [v5] In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 18:10:47 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to fix instruction encoding errors for some of the instructions which support the APX New Data Destination (NDD) and No Flags (NF) features (added in [JDK-8329035](https://bugs.openjdk.org/browse/JDK-8329035)) >> >> The correctness of the encoding is verified by comparing to the reference encoding by GCC using an automated tool (https://github.com/openjdk/jdk/pull/21795) > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > remove map4 enum; replace with comment src/hotspot/cpu/x86/assembler_x86.cpp line 2637: > 2635: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_32bit); > 2636: // NDD shares its 
encoding bits with NDS bits for regular EVEX instruction. > 2637: // Therefore, DST is passed as the second argument to minimize changes in the leaf level routine. dst is not the second argument here so the comment can be removed. src/hotspot/cpu/x86/assembler_x86.cpp line 14858: > 14856: InstructionAttr attributes(AVX_128bit, /* vex_w */ true, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 14857: // NDD shares its encoding bits with NDS bits for regular EVEX instruction. > 14858: // Therefore, DST is passed as the second argument to minimize changes in the leaf level routine. dst is not the second argument here so the comment can be removed. src/hotspot/cpu/x86/assembler_x86.cpp line 14880: > 14878: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); > 14879: // NDD shares its encoding bits with NDS bits for regular EVEX instruction. > 14880: // Therefore, DST is passed as the second argument to minimize changes in the leaf level routine. dst is not the second argument here so the comment can be removed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21770#discussion_r1828415495 PR Review Comment: https://git.openjdk.org/jdk/pull/21770#discussion_r1828414929 PR Review Comment: https://git.openjdk.org/jdk/pull/21770#discussion_r1828414685 From sparasa at openjdk.org Mon Nov 4 21:59:05 2024 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 4 Nov 2024 21:59:05 GMT Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support [v6] In-Reply-To: References: Message-ID: > The goal of this PR is to fix instruction encoding errors for some of the instructions which support the APX New Data Destination (NDD) and No Flags (NF) features (added in [JDK-8329035](https://bugs.openjdk.org/browse/JDK-8329035)) > > The correctness of the encoding is verified by comparing to the reference encoding by GCC using an automated tool (https://github.com/openjdk/jdk/pull/21795) Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: remove comment where not required ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21770/files - new: https://git.openjdk.org/jdk/pull/21770/files/1563aa2c..fcc782b2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21770&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21770&range=04-05 Stats: 6 lines in 1 file changed: 0 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21770.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21770/head:pull/21770 PR: https://git.openjdk.org/jdk/pull/21770 From dlong at openjdk.org Mon Nov 4 22:36:34 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 4 Nov 2024 22:36:34 GMT Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v3] In-Reply-To: References: <4_84pZqk5-pV1iTUdpf5wmVczTdHq-9-Re1qjbGU7Eo=.0fb46e18-883f-45f8-827d-567602373431@github.com> Message-ID: On Mon, 4 Nov 2024 06:26:04 GMT, Tobias 
Hartmann wrote: >> Do we actually generate an nmethod for the above example? It seems like it could never execute the getClass() because the line above setting `obj` would have to throw an exception if there can be no concrete instances. > > Right, this was an oversimplified example. I used this code: > > Class test(MyAbstract obj, boolean b) { > if (b) { > return obj.getClass(); > } > return null; > } > > > We pass `null` for `obj` and `false` for `b`. Usually, the branch is then only compiled with Xcomp. I think there is still hope for moving the assert into `TypeNarrowKlass::make` in a future RFE. In the example above, if we are generating code for obj.getClass() based on the assumption that the type is a leaf, we could also notice that the type is abstract and deduce that obj must be null. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21784#discussion_r1828483501 From vlivanov at openjdk.org Mon Nov 4 23:01:28 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 4 Nov 2024 23:01:28 GMT Subject: RFR: 8339303: C2: dead node after failing to match cloned address expression In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 13:53:33 GMT, Roberto Castañeda Lozano wrote: > This changeset prevents the x86 platform-specific logic from cloning address expressions consisting of two chained `AddP` nodes with a small constant offset each, such as in the following example: > > ![example](https://github.com/user-attachments/assets/86c143a1-3895-4e0c-936b-0d22b7c80e73) > > Such patterns cannot be fully subsumed into x86 complex addressing modes, and cloning them can cause the matcher to introduce dead nodes that trigger a segmentation fault in the subsequent global code motion phase. See a detailed analysis of the failure in the [JBS issue description](https://bugs.openjdk.org/browse/JDK-8339303).
> > The changeset additionally extends the post-matching verification logic to check that no old node is reachable by traversing both node inputs and outputs. This extension would have caused the original test case to fail directly after matching with an informative assertion message rather than an opaque segmentation fault in an unrelated code generation phase. > > Note that the pattern causing the failure should in general be optimized by `AddPNode::Ideal` into a single `AddP` node with the constant sum of the offsets. While [JDK-8343067](https://bugs.openjdk.org/browse/JDK-8343067) should address the missing optimization, this changeset proposes a complementary solution that is easily backportable and avoids relying on specific optimizations for correctness. > > #### Testing > > ##### Functionality > > - tier1-5 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > ##### Performance > > - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. Looks good. src/hotspot/share/opto/matcher.cpp line 183: > 181: } > 182: } > 183: for (uint j = 0; j < n->outcnt(); j++) { Why don't you use a DU iterator instead (e.g., `DUIterator_Fast`)? ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21829#pullrequestreview-2414301481 PR Review Comment: https://git.openjdk.org/jdk/pull/21829#discussion_r1828505551 From fyang at openjdk.org Tue Nov 5 00:45:27 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 5 Nov 2024 00:45:27 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 18:33:00 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch?
> Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang, this one should be a diagnostic option, as we don't want to expose too many options to users. > Thanks Thanks. That makes sense to me. Since we are getting more and more RISC-V extensions, we should rely on the Linux hwprobe syscall to auto-detect and enable them in the long run. It seems we should handle other ones like `UseRVC`, `UseRVV`, etc. similarly. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21885#pullrequestreview-2414414743 From amitkumar at openjdk.org Tue Nov 5 06:08:30 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 5 Nov 2024 06:08:30 GMT Subject: RFR: 8251926: PPC: Remove an unused variable in assembler_ppc.cpp In-Reply-To: References: Message-ID: <9xZUzHxNV1awugjBCXBaH0NZUXC37yJhHDt6yNohaBM=.dfec0e3e-6d08-4dca-a529-9939c2d5aaf2@github.com> On Mon, 4 Nov 2024 12:17:34 GMT, Sorna Sarathi wrote: > This PR removes an unused variable from load_const_optimized function in assembler_ppc.cpp file. > > JBS Issue: [JDK-8251926](https://bugs.openjdk.org/browse/JDK-8251926) I think commands in the `edited` section do not work with bots. You can pass the integrate command again. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21874#issuecomment-2456308779 From duke at openjdk.org Tue Nov 5 06:08:31 2024 From: duke at openjdk.org (duke) Date: Tue, 5 Nov 2024 06:08:31 GMT Subject: RFR: 8251926: PPC: Remove an unused variable in assembler_ppc.cpp In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 12:17:34 GMT, Sorna Sarathi wrote: > This PR removes an unused variable from load_const_optimized function in assembler_ppc.cpp file. > > JBS Issue: [JDK-8251926](https://bugs.openjdk.org/browse/JDK-8251926) @Sorna-Sarathi Your change (at version 8e16c9eeae76e306490dbbe389e0c6ccba64f5b3) is now ready to be sponsored by a Committer.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21874#issuecomment-2456310012 From duke at openjdk.org Tue Nov 5 06:11:34 2024 From: duke at openjdk.org (Sorna Sarathi) Date: Tue, 5 Nov 2024 06:11:34 GMT Subject: Integrated: 8251926: PPC: Remove an unused variable in assembler_ppc.cpp In-Reply-To: References: Message-ID: <3xeUYA5NCN38addWkH65IEGodG5CXl9lNyiRyvJ2Mt4=.6ca7e78a-8303-4a87-8c7f-f08142ccbe8d@github.com> On Mon, 4 Nov 2024 12:17:34 GMT, Sorna Sarathi wrote: > This PR removes an unused variable from load_const_optimized function in assembler_ppc.cpp file. > > JBS Issue: [JDK-8251926](https://bugs.openjdk.org/browse/JDK-8251926) This pull request has now been integrated. Changeset: 0f7dd98d Author: Sorna Sarathi Committer: Amit Kumar URL: https://git.openjdk.org/jdk/commit/0f7dd98d9d546e0fc2c7b1df779cef35e5b5852c Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod 8251926: PPC: Remove an unused variable in assembler_ppc.cpp Reviewed-by: mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/21874 From jbhateja at openjdk.org Tue Nov 5 07:05:28 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 5 Nov 2024 07:05:28 GMT Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support [v6] In-Reply-To: <9lYvrMFQKW4Cq9zKFDjkNg9zKgLvdaCtVQTZgUGgpNs=.78ee059a-5f85-4d6e-828a-a71351d44763@github.com> References: <9lYvrMFQKW4Cq9zKFDjkNg9zKgLvdaCtVQTZgUGgpNs=.78ee059a-5f85-4d6e-828a-a71351d44763@github.com> Message-ID: On Wed, 30 Oct 2024 09:47:05 GMT, Jatin Bhateja wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> remove comment where not required > > I think we should first check-in extended gtest asm validation script detecting these issues either before or along with this patch. > Hi @jatin-bhateja, please see the updated code indicating the MAP4 comment next to the VEX_OPCODE. 
The specification does not mention using 0F_3C for MAP4. I guess we are trying to be compatible with the GCC encoding scheme here. Adding MAP4 in the comments is still better. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21770#issuecomment-2456386026 From rehn at openjdk.org Tue Nov 5 08:19:31 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 5 Nov 2024 08:19:31 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 18:33:00 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang, this one should be a diagnostic option, as we don't want to expose too many options to users. > Thanks Does it really make sense to make instruction set selection diagnostic: https://github.com/openjdk/jdk/blob/dafa2e55adb6b054c342d5e723e51087d771e6d6/src/hotspot/share/runtime/globals.hpp#L59 ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2456509266 From thartmann at openjdk.org Tue Nov 5 08:59:40 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 5 Nov 2024 08:59:40 GMT Subject: RFR: 8340454: C2 EA asserts with "previous reducible Phi is no longer reducible before SUT" [v3] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 21:53:45 GMT, Cesar Soares Lucas wrote: >> Please consider this patch, which fixes an issue that occurs when a Phi previously considered reducible later becomes irreducible.
The overall situation that causes the problem is as follows: >> >> - Consider that there are at least 2 scalar replaceable objects (Obj1 and Obj2; Obj2 is stored in a field of Obj1) when we start iterating the loop at escape.cpp:301 >> >> - In the first iteration of the loop, the call chain starting with `adjust_scalar_replaceable_state` ends up calling `can_reduce_phi` and considering Phi1 as reducible. This Phi has only Obj1 as *SR* input. >> >> - In another iteration of the loop, Obj2 is flagged as NSR, for instance because we are storing Obj2 in an unknown position of an array. This will cause `found_nsr_alloc` to be set to `true`. >> >> After the loop finishes, the execution will go to `find_scalar_replaceable_allocs`. The code will process Obj1, because it's still scalar replaceable, but will find that this object is stored in a field of a - **now** - NSR object. Therefore, correctly, Obj1 will also be marked as NSR. When Obj1 is marked as NSR, Phi1 becomes irreducible because it no longer has any scalar replaceable input. >> >> The solution I'm proposing is simply to revisit the "reducibility" of the Phis when an object is marked as NSR. >> >> --------- >> >> ### Tests >> >> Win, Mac & Linux tier1-4 on x64 & Aarch64. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Address PR feedback: include test execution options. All green.
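The key point of the fix is ordering: a Phi's reducibility must be re-checked after NSR marking has been propagated, not trusted from an earlier iteration. Below is a minimal Java sketch of that ordering on a hypothetical model. `Alloc`, `Phi`, and the two rules (a Phi is reducible only while some input is still SR; an object stored into a field of an NSR object becomes NSR itself) are simplifications assumed for illustration, and none of these names are HotSpot's `ConnectionGraph` API. The sketch models the propagation step in which Obj1 becomes NSR because it is stored in a field of an object that is now NSR.

```java
import java.util.List;

// Hypothetical model of the escape-analysis scenario above; not HotSpot code.
final class PhiReducibility {
    static final class Alloc {
        final String name;
        boolean scalarReplaceable = true;
        Alloc storedIntoFieldOf; // object whose field holds this allocation, or null

        Alloc(String name) { this.name = name; }
    }

    static final class Phi {
        final List<Alloc> inputs;
        boolean reducible;

        Phi(List<Alloc> inputs) { this.inputs = inputs; }
    }

    /** Assumed criterion: a Phi is reducible only while at least one input is still SR. */
    static boolean canReducePhi(Phi phi) {
        return phi.inputs.stream().anyMatch(a -> a.scalarReplaceable);
    }

    /** Propagates NSR through field stores to a fixed point, then revisits every Phi. */
    static void propagateAndRecheck(List<Alloc> allocs, List<Phi> phis) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Alloc a : allocs) {
                Alloc holder = a.storedIntoFieldOf;
                if (a.scalarReplaceable && holder != null && !holder.scalarReplaceable) {
                    a.scalarReplaceable = false; // stored into an NSR object, so it is NSR too
                    changed = true;
                }
            }
        }
        // The step the fix adds: a Phi judged reducible earlier may no longer be.
        for (Phi phi : phis) {
            phi.reducible = phi.reducible && canReducePhi(phi);
        }
    }
}
```

Without the final loop, `Phi1` would keep the stale `reducible = true` computed in the first iteration, which corresponds to the assertion failure named in the PR title.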
------------- PR Comment: https://git.openjdk.org/jdk/pull/21778#issuecomment-2456591564 From roland at openjdk.org Tue Nov 5 09:07:30 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 5 Nov 2024 09:07:30 GMT Subject: RFR: 8342943: Replace predicate walking and cloning code for main/post loops with a predicate visitor In-Reply-To: <5krLcDigRN94zhl5T8zX33E_7qyajZH96Xv-1biYGOc=.860f4223-44fa-4904-ae70-dafe74b2989c@github.com> References: <5krLcDigRN94zhl5T8zX33E_7qyajZH96Xv-1biYGOc=.860f4223-44fa-4904-ae70-dafe74b2989c@github.com> Message-ID: On Wed, 30 Oct 2024 15:18:56 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and post loop (this PR) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): Loop Unswitching and removing useless Assertion Predicates (upcoming) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (upcoming) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047), which is a missing piece for fixing the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)).
> > #### Common Refactorings for all the Patches in this Series > In each of the patches, I will apply similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with a call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics, which include only applying the Template Assertion Predicate processing if there are Parse Predicates. This limitation should eventually be removed. But I want to do that separately at a later point. > --- > > #### Refactorings of this Patch > This patch replaces the predicate walking and cloning code for **main and post loops**. The code can reuse the code established w... Looks good to me. ------------- Marked as reviewed by roland (Reviewer).
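The visitor/iterator split described in the list above can be illustrated with a small, self-contained Java sketch. Everything here is hypothetical: the record types, `PredicateVisitor`, and `PredicateIterator` only mirror the roles named in the PR description, and the "cloning" step merely records a string where HotSpot would clone and initialize IR nodes for the target loop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of visitor-based predicate processing; not HotSpot's classes.
final class PredicateVisitorSketch {
    interface Predicate { void accept(PredicateVisitor v); }

    record ParsePredicate(String kind) implements Predicate {
        public void accept(PredicateVisitor v) { v.visit(this); }
    }

    record TemplateAssertionPredicate(String check) implements Predicate {
        public void accept(PredicateVisitor v) { v.visit(this); }
    }

    record RuntimePredicate(String check) implements Predicate {
        public void accept(PredicateVisitor v) { v.visit(this); }
    }

    /** No-op defaults: a visitor overrides only the predicate kinds it cares about. */
    interface PredicateVisitor {
        default void visit(ParsePredicate p) {}
        default void visit(TemplateAssertionPredicate p) {}
        default void visit(RuntimePredicate p) {}
    }

    /** Walks all predicates found above a loop and applies the visitor to each one. */
    record PredicateIterator(List<Predicate> predicates) {
        void forEach(PredicateVisitor visitor) {
            for (Predicate p : predicates) {
                p.accept(visitor);
            }
        }
    }

    /** Collects one new predicate per Template Assertion Predicate for a target loop. */
    static final class CloneTemplatesVisitor implements PredicateVisitor {
        private final String targetLoop;
        private boolean sawParsePredicate;
        final List<String> created = new ArrayList<>();

        CloneTemplatesVisitor(String targetLoop) { this.targetLoop = targetLoop; }

        @Override public void visit(ParsePredicate p) { sawParsePredicate = true; }

        @Override public void visit(TemplateAssertionPredicate p) {
            // Stand-in for cloning the template and initializing it for targetLoop.
            created.add(p.check() + " @ " + targetLoop);
        }

        /** Mirrors the kept semantics: only process templates if Parse Predicates exist. */
        List<String> result() { return sawParsePredicate ? created : List.of(); }
    }
}
```

The benefit of this shape is that the walking logic lives in one place (the iterator), while each loop transformation (peeling, main/post loops, unswitching, unrolling) only supplies its own visitor.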
PR Review: https://git.openjdk.org/jdk/pull/21790#pullrequestreview-2415019248 From chagedorn at openjdk.org Tue Nov 5 09:19:34 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 5 Nov 2024 09:19:34 GMT Subject: RFR: 8342943: Replace predicate walking and cloning code for main/post loops with a predicate visitor In-Reply-To: <5krLcDigRN94zhl5T8zX33E_7qyajZH96Xv-1biYGOc=.860f4223-44fa-4904-ae70-dafe74b2989c@github.com> References: <5krLcDigRN94zhl5T8zX33E_7qyajZH96Xv-1biYGOc=.860f4223-44fa-4904-ae70-dafe74b2989c@github.com> Message-ID: On Wed, 30 Oct 2024 15:18:56 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and post loop (this PR) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): Loop Unswitching and removing useless Assertion Predicates (upcoming) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (upcoming) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047), which is a missing piece for fixing the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)).
> > #### Common Refactorings for all the Patches in this Series > In each of the patches, I will apply similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with a call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics, which include only applying the Template Assertion Predicate processing if there are Parse Predicates. This limitation should eventually be removed. But I want to do that separately at a later point. > --- > > #### Refactorings of this Patch > This patch replaces the predicate walking and cloning code for **main and post loops**. The code can reuse the code established w... Thanks Roland for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21790#issuecomment-2456636768 From mli at openjdk.org Tue Nov 5 09:57:27 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 5 Nov 2024 09:57:27 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option In-Reply-To: References: Message-ID: <3XREX0qVwN4xX_REogr_hGZNjlp_VVeov9uAhrf_9Bg=.f5ab6614-16b4-4271-9f15-19a731f0385e@github.com> On Tue, 5 Nov 2024 08:16:59 GMT, Robbin Ehn wrote: > Does it really make sense to make instruction set selection diagnostic: Are you suggesting we keep it as Product or Experimental?
The full comment reads: // DIAGNOSTIC options are not meant for VM tuning or for product modes. // They are to be used for VM quality assurance or field diagnosis // of VM bugs. They are hidden so that users will not be encouraged to // try them as if they were VM ordinary execution options. However, they // are available in the product version of the VM. Under instruction // from support engineers, VM customers can turn them on to collect // diagnostic information about VM problems. I think it should no longer be Experimental, and Diagnostic seems better than Product because it can still be used in product builds (`However, they are available in the product version of the VM`). But I'm not entirely sure. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2456723371 From mdoerr at openjdk.org Tue Nov 5 10:07:35 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 5 Nov 2024 10:07:35 GMT Subject: RFR: 8343506: [s390x] multiple test failures with ubsan In-Reply-To: <-_6fxDHjNO3eG0JX_Nqscc0bVUQgLcmK5OjoVME7xNk=.6ae3a749-36a4-47d2-9ff1-a29c28f97dd8@github.com> References: <-_6fxDHjNO3eG0JX_Nqscc0bVUQgLcmK5OjoVME7xNk=.6ae3a749-36a4-47d2-9ff1-a29c28f97dd8@github.com> Message-ID: On Mon, 4 Nov 2024 20:49:39 GMT, Dean Long wrote: >> This is a trivial patch that fixes the error I am seeing on s390x while running tier1 with ubsan enabled. Please see JBS for more details. > > src/hotspot/cpu/s390/s390.ad line 2550: > >> 2548: // Unsigned Integer Immediate: 9-bit >> 2549: operand SSlenDW() %{ >> 2550: predicate(Immediate::is_uimm8((julong)n->get_long()-1)); > > Suggestion: > > predicate(n->get_long() >= 1 && Immediate::is_uimm8((julong)n->get_long()-1)); I don't think this is necessary. Unsigned subtraction with wrap-around is not undefined behavior.
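Martin's point can be checked with a small Java model of the `SSlenDW` predicate. In `(julong)n->get_long()-1` the cast applies before the subtraction, so the subtraction is unsigned and wraps. Java's `long` subtraction always wraps in two's complement, producing the same bit pattern, and `Long.compareUnsigned` then reads the result as unsigned. The sketch assumes `Immediate::is_uimm8` accepts exactly the values 0 to 255; the class and method names are hypothetical.

```java
// Hypothetical Java model of the s390 SSlenDW operand predicate; not HotSpot code.
final class SSlenDWCheck {
    /** Assumption: Immediate::is_uimm8 accepts exactly the unsigned values 0..255. */
    static boolean isUimm8(long unsignedBits) {
        return Long.compareUnsigned(unsignedBits, 0xFFL) <= 0;
    }

    /** Models is_uimm8((julong)len - 1): the subtraction wraps, it never overflows. */
    static boolean acceptsLength(long len) {
        // For len == 0 the result wraps to 2^64 - 1, which fails the 8-bit check anyway.
        return isUimm8(len - 1);
    }
}
```

Under that assumption, only lengths 1 through 256 are accepted (presumably why the operand is described as a 9-bit unsigned immediate encoded as length minus one), and zero or negative values are rejected by the wrap-around alone, so an extra `n->get_long() >= 1` guard would not change the result.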
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21864#discussion_r1829071884 From fyang at openjdk.org Tue Nov 5 10:08:27 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 5 Nov 2024 10:08:27 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option In-Reply-To: <3XREX0qVwN4xX_REogr_hGZNjlp_VVeov9uAhrf_9Bg=.f5ab6614-16b4-4271-9f15-19a731f0385e@github.com> References: <3XREX0qVwN4xX_REogr_hGZNjlp_VVeov9uAhrf_9Bg=.f5ab6614-16b4-4271-9f15-19a731f0385e@github.com> Message-ID: On Tue, 5 Nov 2024 09:54:42 GMT, Hamlin Li wrote: > Does it really make sense to make instruction set selection diagnostic: This was discussed elsewhere before. Again, here is what I am thinking. First of all, we might not want to expose these options to our end users. Each newly added product option needs a release note; there are already quite a few, and I suppose more and more will come. So it seems more reasonable to me to delegate to hwprobe. But if we do that, we still need a way to diagnose or disable them when issues arise (whether performance- or functionality-related). I don't see a better solution than making them DIAGNOSTIC options. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2456745978 From thartmann at openjdk.org Tue Nov 5 10:44:33 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 5 Nov 2024 10:44:33 GMT Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v3] In-Reply-To: References: <4_84pZqk5-pV1iTUdpf5wmVczTdHq-9-Re1qjbGU7Eo=.0fb46e18-883f-45f8-827d-567602373431@github.com> Message-ID: On Mon, 4 Nov 2024 22:34:04 GMT, Dean Long wrote: >> Right, this was an oversimplified example. I used this code: >> >> Class test(MyAbstract obj, boolean b) { >> if (b) { >> return obj.getClass(); >> } >> return null; >> } >> >> >> We pass `null` for `obj` and `false` for `b`. Usually, the branch is then only compiled with Xcomp.
> > I think there is still hope for moving the assert into `TypeNarrowKlass::make` in a future RFE. In the example above, if we are generating code for obj.getClass() based on the assumption that the type is a leaf, we could also notice that the type is abstract and deduce that obj must be null. Right, we could do that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21784#discussion_r1829127985 From rehn at openjdk.org Tue Nov 5 10:51:28 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 5 Nov 2024 10:51:28 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 18:33:00 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang, this one should be a diagnostic option, as we don't want to expose too many options to users. > Thanks Now we have normal and experimental. Changing this one, we would also have diagnostic. And some we get from hwprobe and some not, so it's not easy to know which ones to turn on manually, etc... Wouldn't it make more sense to turn all options which may be enabled by hwprobe into diagnostic instead?
------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2456842731 From epeter at openjdk.org Tue Nov 5 11:49:45 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 5 Nov 2024 11:49:45 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v17] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 4 Nov 2024 11:27:17 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update test/hotspot/jtreg/compiler/c2/TestMergeStoresMemorySegment.java >> >> Co-authored-by: Christian Hagedorn >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > >> /contributor add chhagedorn >> >> You spent enough time on this already ;) > > Thanks Emanuel, I highly appreciate that :-) Thanks @chhagedorn for the extensive reviews and collaboration on improving the proofs. Thanks @vnkozlov for the approval. I did an offline merge and testing (to avoid requiring a re-approval) - all looks good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2456955481 From epeter at openjdk.org Tue Nov 5 11:49:47 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 5 Nov 2024 11:49:47 GMT Subject: Integrated: 8335392: C2 MergeStores: enhanced pointer parsing In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 1 Jul 2024 13:32:01 GMT, Emanuel Peter wrote: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing.
For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. > > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. > > More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! > > `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. > > **What this change enables** > > Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). > > Now we can do: > - Merging `Unsafe` stores to arrays, including "mismatched size": e.g. `putChar` to `byte[]`. > - Merging `Unsafe` stores to native memory. > - Merging `MemorySegment`: with array, native, ByteBuffer backing types. > - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RCs smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck.
Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. > > **Dealing with Overflows** > > We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. > > **Bench... This pull request has now been integrated. Changeset: f3671bee Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/f3671beefb3ff07441a905e25619f0d1a0a2fe15 Stats: 2687 lines in 16 files changed: 2417 ins; 212 del; 58 mod 8335392: C2 MergeStores: enhanced pointer parsing Co-authored-by: Christian Hagedorn Reviewed-by: kvn, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/19970 From epeter at openjdk.org Tue Nov 5 11:50:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 5 Nov 2024 11:50:36 GMT Subject: RFR: 8342498: Add test for Allocation elimination after use as alignment reference by SuperWord [v2] In-Reply-To: References: <3jLdPTBJ5FF6j7XmU9qicyfHOiPLs8_NGHN-enTh4ak=.32de86ef-a76d-4e07-a8bb-749f6648219f@github.com> Message-ID: On Mon, 4 Nov 2024 16:10:00 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> unlock diagnostics > > Good. Thanks @vnkozlov for the approval. Thanks @TobiHartmann for the review and all the helpful suggestions along the way! 
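The two ideas described above, the linear form `pointer = con + sum_i(scale_i * variable_i)` and overflow tracking in the spirit of `NoOverflowInt`, can be combined in a short Java sketch. It is an illustration under stated assumptions only: constants and scales are 32-bit, summands are keyed by a variable name, and any overflow makes the distance unknown. The class and method names are hypothetical and do not mirror the code in `mempointer.hpp`.

```java
import java.util.Map;
import java.util.OptionalInt;
import java.util.TreeMap;

// Hypothetical sketch of linear-form pointers with overflow tracking; not HotSpot code.
final class MemPointerSketch {
    /** An int that remembers whether any operation producing it overflowed. */
    record NoOverflowInt(int value, boolean overflowed) {
        static final NoOverflowInt OVERFLOWED = new NoOverflowInt(0, true);

        static NoOverflowInt of(int v) { return new NoOverflowInt(v, false); }

        private static NoOverflowInt fromLong(long r) {
            return r == (int) r ? of((int) r) : OVERFLOWED;
        }

        NoOverflowInt add(NoOverflowInt o) {
            return overflowed || o.overflowed ? OVERFLOWED : fromLong((long) value + o.value);
        }

        NoOverflowInt sub(NoOverflowInt o) {
            return overflowed || o.overflowed ? OVERFLOWED : fromLong((long) value - o.value);
        }

        NoOverflowInt mul(NoOverflowInt o) {
            return overflowed || o.overflowed ? OVERFLOWED : fromLong((long) value * o.value);
        }
    }

    /** pointer = con + sum_i(scale_i * variable_i), with summands keyed by variable name. */
    record MemPointer(NoOverflowInt con, Map<String, NoOverflowInt> summands) {
        static MemPointer of(int con, Map<String, Integer> scales) {
            Map<String, NoOverflowInt> s = new TreeMap<>();
            scales.forEach((name, scale) -> s.put(name, NoOverflowInt.of(scale)));
            return new MemPointer(NoOverflowInt.of(con), s);
        }

        /** Constant distance (other - this) if both share identical summands and nothing overflowed. */
        OptionalInt distanceTo(MemPointer other) {
            if (!summands.equals(other.summands)) return OptionalInt.empty();
            if (summands.values().stream().anyMatch(NoOverflowInt::overflowed)) return OptionalInt.empty();
            NoOverflowInt d = other.con.sub(con);
            return d.overflowed() ? OptionalInt.empty() : OptionalInt.of(d.value());
        }

        /** Two stores are adjacent if the second starts exactly where the first one ends. */
        boolean isAdjacentTo(MemPointer next, int sizeInBytes) {
            OptionalInt d = distanceTo(next);
            return d.isPresent() && d.getAsInt() == sizeInBytes;
        }
    }
}
```

For example, two `int` stores at `base + 16 + 4*i` and `base + 20 + 4*i` share the same summands and are 4 bytes apart, so the sketch reports them as adjacent. A difference in any scale, or an overflowing constant difference, yields "unknown" instead of a wrong answer.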
------------- PR Comment: https://git.openjdk.org/jdk/pull/21875#issuecomment-2456957356 From epeter at openjdk.org Tue Nov 5 11:50:37 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 5 Nov 2024 11:50:37 GMT Subject: Integrated: 8342498: Add test for Allocation elimination after use as alignment reference by SuperWord In-Reply-To: <3jLdPTBJ5FF6j7XmU9qicyfHOiPLs8_NGHN-enTh4ak=.32de86ef-a76d-4e07-a8bb-749f6648219f@github.com> References: <3jLdPTBJ5FF6j7XmU9qicyfHOiPLs8_NGHN-enTh4ak=.32de86ef-a76d-4e07-a8bb-749f6648219f@github.com> Message-ID: On Mon, 4 Nov 2024 13:13:58 GMT, Emanuel Peter wrote: > There used to be a bug where this happens: > - SuperWord vectorizes, and picks a field-store as the alignment reference, using a CastP2X on the object pointer. > - Later, all field loads disappear, and the Allocation of the object is eliminated. > - The GC code then thinks the CastP2X is part of the GC barrier code... and crashes with wrong assumptions about that part of the IR. > > We should obviously not use field-accesses as alignment references for SuperWord. A few other changes have fixed this issue: > - [JDK-8328544](https://bugs.openjdk.org/browse/JDK-8328544): it disallows any non-array accesses that do not have an int-index. This code was backported and so should on its own fix the issue everywhere. But maybe somebody gets the idea to be smarter... allowing such memory accesses without int-indices. For that we should add this regression test. > > // We did not find the int_index. Just to be safe, reject this VPointer. > if (!_has_int_index_after_convI2L) { > return false; > } > > - We now only allow memory accesses to be alignment references if they are actually vectorized... which cannot happen with field stores. > - Roberto's change with GC barriers also removed the asserting/crashing code. Though I'm not sure if that means the IR is then ok.
> > **At any rate**: the bug seems **fixed**, but we should add and possibly backport this **regression test** anyway. This pull request has now been integrated. Changeset: f62fc484 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/f62fc4844125cc20a91dc2be39ba05a2d3aca8cf Stats: 183 lines in 1 file changed: 183 ins; 0 del; 0 mod 8342498: Add test for Allocation elimination after use as alignment reference by SuperWord Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21875 From mli at openjdk.org Tue Nov 5 11:57:33 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 5 Nov 2024 11:57:33 GMT Subject: RFR: 8343070: Enable is_trace_align_vector when TraceSuperWord is set In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 07:08:39 GMT, Emanuel Peter wrote: >> Hi, >> Can you help to review this simple patch? >> Currently, in SuperWord::filter_packs_for_alignment(), there is some log not turned on when TraceSuperWord is set, but I think it should, as it's more convenient for users to debug. >> Thanks! > > Ah, one more thing. I try to keep `SW_INFO` and `TraceSuperWord` in sync. So if we do decide to add `ALIGN_VECTOR` to `TraceSuperWord`, we should also add it to `SW_INFO`. @eme64 Thanks for the information. At first my intention was to make the SLP process easier to debug, as I observed some test failures on RISC-V, but the existing output does not print the detailed failure reason. This PR made that easy, but at the same time it also introduces some verbose output unconditionally, which is not useful when there is no failure/rejection or when the user doesn't care about it. I can't find a more reasonable way to modify the current SLP logging, so I'll use `-XX:CompileCommand=TraceAutoVectorization,*::*,ALIGN_VECTOR` instead and close this PR.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21715#issuecomment-2456973340 From mli at openjdk.org Tue Nov 5 11:57:34 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 5 Nov 2024 11:57:34 GMT Subject: Withdrawn: 8343070: Enable is_trace_align_vector when TraceSuperWord is set In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 14:45:33 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Currently, in SuperWord::filter_packs_for_alignment(), there is some log not turned on when TraceSuperWord is set, but I think it should, as it's more convenient for users to debug. > Thanks! This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/21715 From duke at openjdk.org Tue Nov 5 12:08:36 2024 From: duke at openjdk.org (Benoit Daloze) Date: Tue, 5 Nov 2024 12:08:36 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v3] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 06:05:15 GMT, Doug Simon wrote: >> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in `-Xcomp` or `-Xbatch` mode. >> >> However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). >> >> This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. 
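The "allow Java upcall" escape hatch described above is essentially a scoped flag on the compiler thread. The Java sketch below shows that general pattern with a thread-local flag and try-with-resources, so that the previous value is restored even if the scoped code throws and nested scopes compose. It is hypothetical throughout: the real change lives in HotSpot and JVMCI, and the method named in the commit message, `updateCompilerThreadCanCallJava`, may have a different shape.

```java
// Hypothetical sketch of a scoped "can call Java" flag; names are illustrative only.
final class CanCallJavaScope implements AutoCloseable {
    private static final ThreadLocal<Boolean> CAN_CALL_JAVA =
            ThreadLocal.withInitial(() -> false);

    private final boolean previous;

    private CanCallJavaScope(boolean enable) {
        previous = CAN_CALL_JAVA.get();
        CAN_CALL_JAVA.set(enable);
    }

    /** Enters a scope in which upcalls are allowed on the current thread. */
    static CanCallJavaScope allowUpcalls() {
        return new CanCallJavaScope(true);
    }

    /** What a guard in the upcall path would consult before calling into Java. */
    static boolean canCallJava() {
        return CAN_CALL_JAVA.get();
    }

    /** Restores the value that was in effect when this scope was entered. */
    @Override
    public void close() {
        CAN_CALL_JAVA.set(previous);
    }
}
```

Restoring the saved value, rather than unconditionally resetting to `false`, is what keeps nested scopes correct.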
> > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > rename changeCompilerThreadCanCallJava to updateCompilerThreadCanCallJava Link: https://github.com/openjdk/jdk/pull/21285 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21171#issuecomment-2456996412 From mli at openjdk.org Tue Nov 5 12:09:28 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 5 Nov 2024 12:09:28 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option In-Reply-To: References: Message-ID: <-wQ1pBffv50RX52DhWCnrFt4eSUrd1biyyCt2LpbUg4=.ce604df2-3371-4dbf-ad7c-a2752c5c047c@github.com> On Tue, 5 Nov 2024 10:48:25 GMT, Robbin Ehn wrote: >> Hi, >> Can you help to review this simple patch? >> Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang , this one should ber a diagnostic option, as we don't want to expose too many options to users. >> Thanks > > Now we have normal and exprimental. Changing this one we would have also diagnostic. > And some we get from hwprobe and some not, it's not easy to know which ones to manually turn on, etc... > Wouldn't it make more sense to turn all options which may be enabled hwprobe to diagnostic instead? @robehn @RealFYang I guess you two have similar opinion now? but I could be wrong. Are you suggesting to turn all the Product options (retrieved by hwprobe) to DIAGNOSTIC? I won't suggest to turn any EXPERIMENTAL to DIAGNOSTIC in this pr, as we need to test it on real hardware first, but if you've tested some of them on real hardware please let me know, I'll do it in this pr. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2456997208 From roland at openjdk.org Tue Nov 5 12:21:52 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 5 Nov 2024 12:21:52 GMT Subject: RFR: 8343068: C2: CastX2P Ideal transformation not always applied [v3] In-Reply-To: <5k9Zq9E-2OJg0raPGHS1t_45N7QguwAToBfZiCfPjlY=.f41fd65a-7ac1-4886-a801-16753b864a2e@github.com> References: <5k9Zq9E-2OJg0raPGHS1t_45N7QguwAToBfZiCfPjlY=.f41fd65a-7ac1-4886-a801-16753b864a2e@github.com> Message-ID: > The transformation: > > > (CastX2P (AddL base i)) -> (AddP (CastX2P base) i) > > > when i fits in an int is not always applied: when the type of `i` is > narrowed so it fits in an int, the `CastX2P` is not enqueued for > igvn. This can get in the way of vectorization as shown by test case > `test2`. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'master' into JDK-8343068 - fix test - test & fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21714/files - new: https://git.openjdk.org/jdk/pull/21714/files/12a471f0..31b4cdde Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21714&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21714&range=01-02 Stats: 174646 lines in 1487 files changed: 23955 ins; 144716 del; 5975 mod Patch: https://git.openjdk.org/jdk/pull/21714.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21714/head:pull/21714 PR: https://git.openjdk.org/jdk/pull/21714 From roland at openjdk.org Tue Nov 5 12:22:15 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 5 Nov 2024 12:22:15 GMT Subject: RFR: 8341834: C2 compilation fails with "bad AD file" due to Replicate [v3] In-Reply-To: References: Message-ID: > Superword creates a `Replicate` node at a `ConvL2I` node and uses the > type of the result of the `ConvL2I` to pick the type of the > `Replicate` instead of the type of the input to the `ConvL2I`. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - Merge branch 'master' into JDK-8341834 - review - test - fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21660/files - new: https://git.openjdk.org/jdk/pull/21660/files/1070696f..9219a292 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21660&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21660&range=01-02 Stats: 174646 lines in 1487 files changed: 23955 ins; 144716 del; 5975 mod Patch: https://git.openjdk.org/jdk/pull/21660.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21660/head:pull/21660 PR: https://git.openjdk.org/jdk/pull/21660 From fyang at openjdk.org Tue Nov 5 12:30:29 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 5 Nov 2024 12:30:29 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option In-Reply-To: References: Message-ID: <5xhRjghPegFr5EkU8s63LH3uvkOkYvqczBi1the4k8U=.3ffe78a3-476e-429b-9ccc-d8b959296e43@github.com> On Tue, 5 Nov 2024 10:48:25 GMT, Robbin Ehn wrote: >> Hi, >> Can you help to review this simple patch? >> Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang, this one should be a diagnostic option, as we don't want to expose too many options to users. >> Thanks > > Now we have normal and experimental. Changing this one we would have also diagnostic. > And some we get from hwprobe and some not, it's not easy to know which ones to manually turn on, etc... > Wouldn't it make more sense to turn all options which may be enabled by hwprobe to diagnostic instead? > @robehn @RealFYang I guess you two have similar opinion now? but I could be wrong. Are you suggesting to turn all the Product options (retrieved by hwprobe) to DIAGNOSTIC?
> > I won't suggest to turn any EXPERIMENTAL to DIAGNOSTIC in this pr, as we need to test it on real hardware first, but if you've tested some of them on real hardware please let me know, I'll do it in this pr. My personal opinion is that we can make following ones DIAGNOSTIC as well as they have been tested on real hardware. I agree with you to leave the other EXPERIMENTAL ones as they are. We can still turn them DIAGNOSTIC in the future when the hardware is available for testing.

    product(bool, UseRVC, false, "Use RVC instructions") \
    product(bool, UseRVV, false, "Use RVV instructions") \
    product(bool, UseZba, false, "Use Zba instructions") \
    product(bool, UseZbb, false, "Use Zbb instructions") \
    product(bool, UseZbs, false, "Use Zbs instructions") \
    product(bool, UseZfh, false, "Use Zfh instructions")

------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2457040315 From rehn at openjdk.org Tue Nov 5 12:51:28 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 5 Nov 2024 12:51:28 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 18:33:00 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang, this one should be a diagnostic option, as we don't want to expose too many options to users. > Thanks Ok, so the path is: 1: Experimental (should they be turned on by hwprobe?) 2a: If hwprobe => Diagnostic 2b: No hwprobe => Normal Arguably hwprobe should only turn on diagnostic options then?
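To make the three categories in this thread concrete: in HotSpot, a plain product flag can be set freely, a `DIAGNOSTIC` flag only after `-XX:+UnlockDiagnosticVMOptions`, and an `EXPERIMENTAL` flag only after `-XX:+UnlockExperimentalVMOptions`. The C++ below is a hypothetical model of that command-line gating, not HotSpot code; `FlagKind`, `UnlockOptions` and `user_may_set` are illustrative names, and the hwprobe auto-enable path discussed above is deliberately not modeled.

```cpp
// Hypothetical model of HotSpot flag categories; none of these names exist in HotSpot.
// The reclassification discussed here would presumably change the declaration in
// globals_riscv.hpp to something like:
//   product(bool, UseZvfh, false, DIAGNOSTIC, "Use Zvfh instructions")
enum class FlagKind { Product, Diagnostic, Experimental };

struct UnlockOptions {
  bool diagnostic = false;    // models -XX:+UnlockDiagnosticVMOptions
  bool experimental = false;  // models -XX:+UnlockExperimentalVMOptions
};

// Returns whether a user may set a flag of the given kind on the command line.
bool user_may_set(FlagKind kind, const UnlockOptions& unlock) {
  switch (kind) {
    case FlagKind::Product:      return true;
    case FlagKind::Diagnostic:   return unlock.diagnostic;
    case FlagKind::Experimental: return unlock.experimental;
  }
  return false;  // not reached for valid enum values
}
```

Under this model, a user who wants the feature after such a change would pass something like `-XX:+UnlockDiagnosticVMOptions -XX:+UseZvfh`.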
------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2457083353 From mli at openjdk.org Tue Nov 5 13:03:29 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 5 Nov 2024 13:03:29 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 12:49:20 GMT, Robbin Ehn wrote: > Ok, so the path is: > 1: Experimental (should they be turned on by hwprobe?) > 2a: If hwprobe => Diagnostic > 2b: No hwprobe => Normal > I agree. > should they be turned on by hwprobe? I suggest we keep it simple, i.e. keep it as it is now. > Arguably hwprobe should only turn on diagnostic options then? Still think we'd better keep it simple. And users can still turn on/off by themselves if they want. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2457108591 From duke at openjdk.org Tue Nov 5 13:17:08 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Tue, 5 Nov 2024 13:17:08 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines Message-ID: In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. Concerns were raised by @rwestrel in the previous PR: > When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens?
As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. 
------------- Commit messages: - Fix asset failures if printing is disabled - 8319850: PrintInlining should report late inlines - Revert "8319850: PrintInlining should report late inlines" - 8319850: PrintInlining should report late inlines Changes: https://git.openjdk.org/jdk/pull/21899/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319850 Stats: 22 lines in 2 files changed: 22 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21899/head:pull/21899 PR: https://git.openjdk.org/jdk/pull/21899 From duke at openjdk.org Tue Nov 5 13:18:42 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Tue, 5 Nov 2024 13:18:42 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method Message-ID: This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: ConINode* node = _igvn.intcon(i); set_ctrl(node, C->root()); and ConLNode* node = _igvn.longcon(i); set_ctrl(node, C->root()); Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. 
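The refactoring bundles two steps that previously appeared together at every call site: creating the constant through IGVN and pinning its control to the root node. The C++ below is a self-contained sketch of that pattern using hypothetical mock classes (`MockIterGVN`, `MockIdealLoop`); only the names `intcon`, `longcon` and `set_ctrl` echo the PR, and the real C2 signatures and node types differ.

```cpp
#include <cstdint>
#include <map>
#include <memory>

// Hypothetical stand-in for a C2 constant node.
struct Node {
  bool is_long;   // true for a long constant, false for an int constant
  int64_t value;
};

// Mock of PhaseIterGVN: hash-conses constants so equal values share one node,
// keeping int and long constants in separate tables.
class MockIterGVN {
 public:
  Node* intcon(int32_t v)  { return lookup(ints_, v, false); }
  Node* longcon(int64_t v) { return lookup(longs_, v, true); }

 private:
  template <typename K>
  static Node* lookup(std::map<K, std::unique_ptr<Node>>& table, K v, bool is_long) {
    std::unique_ptr<Node>& slot = table[v];
    if (!slot) slot.reset(new Node{is_long, static_cast<int64_t>(v)});
    return slot.get();
  }
  std::map<int32_t, std::unique_ptr<Node>> ints_;
  std::map<int64_t, std::unique_ptr<Node>> longs_;
};

// Mock of PhaseIdealLoop: the helpers combine constant creation with
// set_ctrl(node, root), so callers can no longer forget the second step.
class MockIdealLoop {
 public:
  explicit MockIdealLoop(Node* root) : root_(root) {}

  Node* intcon(int32_t v) {
    Node* n = igvn_.intcon(v);
    set_ctrl(n, root_);
    return n;
  }
  Node* longcon(int64_t v) {
    Node* n = igvn_.longcon(v);
    set_ctrl(n, root_);
    return n;
  }
  Node* get_ctrl(Node* n) const {
    auto it = ctrl_.find(n);
    return it == ctrl_.end() ? nullptr : it->second;
  }

 private:
  void set_ctrl(Node* n, Node* ctrl) { ctrl_[n] = ctrl; }
  MockIterGVN igvn_;
  Node* root_;
  std::map<Node*, Node*> ctrl_;
};
```

A call site such as `Node* one = phase.intcon(1);` then replaces the former two-line pattern; extending the same idea to `zerocon`, `makecon` and `integercon` would follow the same shape.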
------------- Commit messages: - Add helper methods for zerocon, makecon, and integercon too - 8343148: C2: Refactor uses of "PhaseValues::intcon() + PhaseIdealLoop::set_ctrl()" into separate method Changes: https://git.openjdk.org/jdk/pull/21836/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343148 Stats: 112 lines in 5 files changed: 40 ins; 36 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/21836.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21836/head:pull/21836 PR: https://git.openjdk.org/jdk/pull/21836 From duke at openjdk.org Tue Nov 5 13:19:03 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Tue, 5 Nov 2024 13:19:03 GMT Subject: RFR: 8323803: ConstantOopReadValue::print_on should print 'null' instead of 'nullptr' Message-ID: Printing incorrectly printed `nullptr` instead of `null` Buggy: ScopeDesc(pc=0x0000000104c05468 offset=2e8): java.lang.Class::desiredAssertionStatus at 20 (line 3984) Locals - l0: reg rfp [58],oop - l1: stack[0],oop - l2: nullptr - l3: empty Expression stack - @0: nullptr Fixed: ScopeDesc(pc=0x0000000106fdd468 offset=2e8): java.lang.Class::desiredAssertionStatus at 20 (line 3984) Locals - l0: reg rfp [58],oop - l1: stack[0],oop - l2: null - l3: empty Expression stack - @0: null ------------- Commit messages: - 8323803: ConstantOopReadValue::print_on should print 'null' instead of 'nullptr' Changes: https://git.openjdk.org/jdk/pull/21869/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21869&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323803 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21869.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21869/head:pull/21869 PR: https://git.openjdk.org/jdk/pull/21869 From chagedorn at openjdk.org Tue Nov 5 13:18:42 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 5 Nov 2024 13:18:42 GMT Subject: RFR: 8343148: C2: 
Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 16:04:38 GMT, theoweidmannoracle wrote: > This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: > > > ConINode* node = _igvn.intcon(i); > set_ctrl(node, C->root()); > > > and > > > ConLNode* node = _igvn.longcon(i); > set_ctrl(node, C->root()); > > > Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. While at it, we could extend this to the other constant creation methods `zerocon`, `makecon`, and `integercon` as well (`uncached_makecon` is only called by the other `*con*` methods - could be made `private` at some point). I suggest to update the RFE title accordingly since it only mentions `intcon` now. Maybe something like `PhaseValue::*con*() + ...`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21836#issuecomment-2453927465 From epeter at openjdk.org Tue Nov 5 13:41:51 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 5 Nov 2024 13:41:51 GMT Subject: RFR: 8333893: Optimization for StringBuilder append boolean & null [v20] In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 21:56:53 GMT, Shaojin Wen wrote: >> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into primitive arrays by combining values into larger stores. >> >> This PR rewrites the code of appendNull and append(boolean) methods so that these two methods can be optimized by C2. > > Shaojin Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 26 additional commits since the last revision: > > - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406 > - fix build error > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - revert test > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - ... and 16 more: https://git.openjdk.org/jdk/compare/918f9b3e...457735c9 FYI https://github.com/openjdk/jdk/pull/19970 is now integrated - thanks for the patience :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2457208051 From swen at openjdk.org Tue Nov 5 15:08:45 2024 From: swen at openjdk.org (Shaojin Wen) Date: Tue, 5 Nov 2024 15:08:45 GMT Subject: RFR: 8333893: Optimization for StringBuilder append boolean & null [v20] In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 21:56:53 GMT, Shaojin Wen wrote: >> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into primitive arrays by combining values into larger stores. >> >> This PR rewrites the code of appendNull and append(boolean) methods so that these two methods can be optimized by C2. > > Shaojin Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 26 additional commits since the last revision: > > - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406 > - fix build error > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - revert test > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - ... and 16 more: https://git.openjdk.org/jdk/compare/88a99fce...457735c9 It has been tested that mergeStore can work after the master branch is merged ------------- PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2457414686 From swen at openjdk.org Tue Nov 5 15:08:45 2024 From: swen at openjdk.org (Shaojin Wen) Date: Tue, 5 Nov 2024 15:08:45 GMT Subject: Integrated: 8333893: Optimization for StringBuilder append boolean & null In-Reply-To: References: Message-ID: On Mon, 10 Jun 2024 12:12:58 GMT, Shaojin Wen wrote: > After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into primitive arrays by combining values into larger stores. > > This PR rewrites the code of appendNull and append(boolean) methods so that these two methods can be optimized by C2. This pull request has now been integrated.
Changeset: 5890d943 Author: Shaojin Wen URL: https://git.openjdk.org/jdk/commit/5890d9438bbde88b89070052926a2eafe13d7b42 Stats: 133 lines in 5 files changed: 79 ins; 18 del; 36 mod 8333893: Optimization for StringBuilder append boolean & null Reviewed-by: liach ------------- PR: https://git.openjdk.org/jdk/pull/19626 From chagedorn at openjdk.org Tue Nov 5 15:19:29 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 5 Nov 2024 15:19:29 GMT Subject: RFR: 8323803: ConstantOopReadValue::print_on should print 'null' instead of 'nullptr' In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 09:23:16 GMT, theoweidmannoracle wrote: > Printing incorrectly printed `nullptr` instead of `null` > > Buggy: > > > ScopeDesc(pc=0x0000000104c05468 offset=2e8): > java.lang.Class::desiredAssertionStatus at 20 (line 3984) > Locals > - l0: reg rfp [58],oop > - l1: stack[0],oop > - l2: nullptr > - l3: empty > Expression stack > - @0: nullptr > > > Fixed: > > > ScopeDesc(pc=0x0000000106fdd468 offset=2e8): > java.lang.Class::desiredAssertionStatus at 20 (line 3984) > Locals > - l0: reg rfp [58],oop > - l1: stack[0],oop > - l2: null > - l3: empty > Expression stack > - @0: null Looks good and trivial. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21869#pullrequestreview-2415938204 From kvn at openjdk.org Tue Nov 5 15:44:30 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 5 Nov 2024 15:44:30 GMT Subject: RFR: 8342943: Replace predicate walking and cloning code for main/post loops with a predicate visitor In-Reply-To: <5krLcDigRN94zhl5T8zX33E_7qyajZH96Xv-1biYGOc=.860f4223-44fa-4904-ae70-dafe74b2989c@github.com> References: <5krLcDigRN94zhl5T8zX33E_7qyajZH96Xv-1biYGOc=.860f4223-44fa-4904-ae70-dafe74b2989c@github.com> Message-ID: On Wed, 30 Oct 2024 15:18:56 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and post loop (this PR) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): Loop Unswitching and removing useless Assertion Predicates (upcoming) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (upcoming) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). 
> > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates. This limitation should eventually be removed. But I want to do that separately at a later point. > --- > > #### Refactorings of this Patch > This patch replaces the predicate walking and cloning code for **main and post loops**. The code can reuse the code established w... Nice refactoring ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21790#pullrequestreview-2416007518 From swen at openjdk.org Tue Nov 5 15:45:04 2024 From: swen at openjdk.org (Shaojin Wen) Date: Tue, 5 Nov 2024 15:45:04 GMT Subject: RFR: 8343629: More MergeStore benchmark Message-ID: 1. Added the putBytes4 benchmark, which corresponds to StringBuilder appendNull 2. Optimized the putChars4/setInt/setLong series of benchmarks to reduce extra overhead and more accurately reflect performance differences. 
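The `putBytes4` benchmark targets the shape that `appendNull` produces in a Latin-1 builder: four adjacent constant byte stores of `'n'`, `'u'`, `'l'`, `'l'`, which C2's MergeStores optimization may combine into a single 4-byte store. The C++ below only illustrates that source-level equivalence and is not C2 code. The function names are hypothetical, and the merged variant assumes a little-endian target.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Four separate byte stores, the shape of a Latin-1 appendNull.
void put_null_bytewise(uint8_t* buf, size_t i) {
  buf[i]     = 'n';
  buf[i + 1] = 'u';
  buf[i + 2] = 'l';
  buf[i + 3] = 'l';
}

// The merged equivalent: one 4-byte store of a constant assembled so that the
// lowest-addressed byte is 'n'. Assumes a little-endian target.
void put_null_merged_le(uint8_t* buf, size_t i) {
  const uint32_t v = uint32_t('n') | uint32_t('u') << 8 |
                     uint32_t('l') << 16 | uint32_t('l') << 24;
  std::memcpy(buf + i, &v, sizeof v);  // memcpy sidesteps unaligned-access UB
}
```

On a big-endian target the constant would have to be assembled in the opposite byte order, which is one of the details a compiler-side merge has to account for.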
------------- Commit messages: - Merge remote-tracking branch 'upstream/master' into merge_store_bench_202410 - Merge branch 'master' into merge_store_bench_202410 - add putBytes4 and improved put Changes: https://git.openjdk.org/jdk/pull/21659/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21659&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343629 Stats: 315 lines in 1 file changed: 71 ins; 51 del; 193 mod Patch: https://git.openjdk.org/jdk/pull/21659.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21659/head:pull/21659 PR: https://git.openjdk.org/jdk/pull/21659 From chagedorn at openjdk.org Tue Nov 5 15:47:41 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 5 Nov 2024 15:47:41 GMT Subject: RFR: 8342943: Replace predicate walking and cloning code for main/post loops with a predicate visitor In-Reply-To: <5krLcDigRN94zhl5T8zX33E_7qyajZH96Xv-1biYGOc=.860f4223-44fa-4904-ae70-dafe74b2989c@github.com> References: <5krLcDigRN94zhl5T8zX33E_7qyajZH96Xv-1biYGOc=.860f4223-44fa-4904-ae70-dafe74b2989c@github.com> Message-ID: On Wed, 30 Oct 2024 15:18:56 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and post loop (this PR) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): Loop Unswitching and removing useless Assertion Predicates (upcoming) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (upcoming) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check 
that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates. This limitation should eventually be removed. But I want to do that separately at a later point. > --- > > #### Refactorings of this Patch > This patch replaces the predicate walking and cloning code for **main and post loops**. The code can reuse the code established w... Thanks Vladimir! 
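The visitor-plus-iterator structure described in the PR can be sketched in isolation. The C++ below is a simplified, hypothetical model: the class names echo C2's (`PredicateVisitor`, `PredicateIterator`, `TemplateAssertionPredicate`), but the real classes walk IR nodes above the loop head and create new nodes at the target loop entry, whereas here a `std::vector` stands in for the predicate chain and "cloning" just records ids.

```cpp
#include <memory>
#include <vector>

struct ParsePredicate;
struct TemplateAssertionPredicate;

// Visitor interface with no-op defaults; a concrete visitor overrides only
// the predicate kinds it cares about.
struct PredicateVisitor {
  virtual ~PredicateVisitor() = default;
  virtual void visit(const ParsePredicate&) {}
  virtual void visit(const TemplateAssertionPredicate&) {}
};

struct Predicate {
  virtual ~Predicate() = default;
  virtual void accept(PredicateVisitor& v) const = 0;
};

struct ParsePredicate : Predicate {
  void accept(PredicateVisitor& v) const override;
};

struct TemplateAssertionPredicate : Predicate {
  explicit TemplateAssertionPredicate(int id) : id(id) {}
  void accept(PredicateVisitor& v) const override;
  int id;  // stands in for the template's identity
};

// Double dispatch: each predicate calls the matching visit() overload.
void ParsePredicate::accept(PredicateVisitor& v) const { v.visit(*this); }
void TemplateAssertionPredicate::accept(PredicateVisitor& v) const { v.visit(*this); }

// Walks the predicate chain in order and applies the visitor to each predicate.
class PredicateIterator {
 public:
  explicit PredicateIterator(const std::vector<std::unique_ptr<Predicate>>& chain)
      : chain_(chain) {}
  void for_each(PredicateVisitor& visitor) const {
    for (const auto& p : chain_) p->accept(visitor);
  }

 private:
  const std::vector<std::unique_ptr<Predicate>>& chain_;
};

// Example visitor: "clones" only Template Assertion Predicates for a target
// loop by recording their ids in walk order; Parse Predicates are skipped.
class CloneTemplateAssertionPredicates : public PredicateVisitor {
 public:
  using PredicateVisitor::visit;
  void visit(const TemplateAssertionPredicate& p) override { cloned_ids.push_back(p.id); }
  std::vector<int> cloned_ids;
};
```

The design point this illustrates is that the walking logic lives in one place (the iterator), so each loop transformation only supplies a visitor describing what to do with each predicate kind.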
------------- PR Comment: https://git.openjdk.org/jdk/pull/21790#issuecomment-2457518979 From qamai at openjdk.org Tue Nov 5 15:52:38 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 5 Nov 2024 15:52:38 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v4] In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 03:36:12 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch adds a new pass to consolidate lowering of complex backend-specific code patterns, such as `MacroLogicV` and the optimization proposed by #21244. Moving these optimizations to backend code can simplify shared code, while also making it easier to develop more in-depth optimizations. The linked bug has an example of a new optimization this could enable. The new phase does GVN to de-duplicate nodes and calls nodes' `Value()` method, but it does not call `Identity()` or `Ideal()` to avoid undoing any changes done during lowering. It also reuses the IGVN worklist to avoid needing to re-create the notification mechanism. >> >> In this PR only the skeleton code for the pass is added, moving `MacroLogicV` to this system will be done separately in a future patch. Tier 1 tests pass on my linux x64 machine. Feedback on this patch would be greatly appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Re-use optimize() and add backend-specific should_lower() Thanks a lot, the patch looks good to me. ------------- Marked as reviewed by qamai (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/21599#pullrequestreview-2416032305 From chagedorn at openjdk.org Tue Nov 5 15:55:36 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 5 Nov 2024 15:55:36 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 10:19:51 GMT, theoweidmannoracle wrote: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. 
As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. Looks good to me. Since you could take over the patch from @caojoshua, you should add him as a contributor with `/contributor add @caojoshua`. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21899#pullrequestreview-2416042408 From swen at openjdk.org Tue Nov 5 16:25:46 2024 From: swen at openjdk.org (Shaojin Wen) Date: Tue, 5 Nov 2024 16:25:46 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v17] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Tue, 5 Nov 2024 11:45:37 GMT, Emanuel Peter wrote: >>> /contributor add chhagedorn >>> >>> You spent enough time on this already ;) >> >> Thanks Emanuel, I highly appreciate that :-) > > Thanks @chhagedorn for the extensive reviews and collaboration on improving the proofs > Thanks @vnkozlov for the approval. > > I did an offline merge and testing (to avoid requiring a re-approval) - all looks good. @eme64 How do I use the TraceMergeStores option? It worked before, but now it gives an error. build/macosx-aarch64-server-fastdebug/jdk/bin/java -Dtest=appendNullLatin1 -XX:+TraceMergeStores output Unrecognized VM option 'TraceMergeStores' Did you mean '(+/-)MergeStores'? Error: Could not create the Java Virtual Machine. Error: A fatal exception has occurred. Program will exit.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2457624721 From epeter at openjdk.org Tue Nov 5 16:47:43 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 5 Nov 2024 16:47:43 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v17] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Tue, 5 Nov 2024 16:22:35 GMT, Shaojin Wen wrote: >> Thanks @chhagedorn for the extensive reviews and collaboration on improving the proofs >> Thanks @vnkozlov for the approval. >> >> I did an offline merge and testing (to avoid requiring a re-approval) - all looks good. > > @eme64 How do I use the TraceMergeStores option? It worked before, but now it gives an error. > > > build/macosx-aarch64-server-fastdebug/jdk/bin/java -Dtest=appendNullLatin1 -XX:+TraceMergeStores > > > output > > Unrecognized VM option 'TraceMergeStores' > Did you mean '(+/-)MergeStores'? > Error: Could not create the Java Virtual Machine. > Error: A fatal exception has occurred. Program will exit. @wenshao Ah, good question! I changed it from a "global" flag to a "compile option". You can now filter the methods! And you can enable different tags - so you can regulate how verbose it is. Example: `-XX:CompileCommand=TraceMergeStores,Test::test*,SUCCESS,ADJACENCY,ALIASING,BASIC` And to see all available tags: `-XX:CompileCommand=TraceMergeStores,Test::test*,help`

    Usage for CompileCommand TraceMergeStores:
      -XX:CompileCommand=TraceMergeStores,,
      tags       descriptions
      BASIC      Trace basic analysis steps
      POINTER    Trace pointer IR
      ALIASING   Trace MemPointerSimpleForm::get_aliasing_with
      ADJACENCY  Trace adjacency
      SUCCESS    Trace successful merges

You might have to play around a little to see what is helpful to you.
And I'm always open to feedback :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2457676944 From rcastanedalo at openjdk.org Tue Nov 5 17:07:13 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 5 Nov 2024 17:07:13 GMT Subject: RFR: 8339303: C2: dead node after failing to match cloned address expression [v2] In-Reply-To: References: Message-ID: > This changeset prevents the x86 platform-specific logic from cloning address expressions consisting of two chained `AddP` nodes with a small constant offset each, such as in the following example: > > ![example](https://github.com/user-attachments/assets/86c143a1-3895-4e0c-936b-0d22b7c80e73) > > Such patterns cannot be fully subsumed into x86 complex addressing modes, and cloning them can cause the matcher to introduce dead nodes that trigger a segmentation fault in the subsequent global code motion phase. See a detailed analysis of the failure in the [JBS issue description](https://bugs.openjdk.org/browse/JDK-8339303). > > The changeset additionally extends the post-matching verification logic to check that no old node is reachable by traversing both node inputs and outputs. This extension would have caused the original test case to fail directly after matching with an informative assertion message rather than an opaque segmentation fault in an unrelated code generation phase. > > Note that the pattern causing the failure should be in general optimized by `AddPNode::Ideal` into a single `AddP` node with the constant sum of the offsets. While [JDK-8343067](https://bugs.openjdk.org/browse/JDK-8343067) should address the missing optimization, this changeset proposes a complementary solution that is easily backportable and avoids relying on specific optimizations for correctness. > > #### Testing > > ##### Functionality > > - tier1-5 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64; release and debug mode).
> > ##### Performance > > - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Use DUIterator_Fast to traverse node outputs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21829/files - new: https://git.openjdk.org/jdk/pull/21829/files/e85ba7cb..b0aa39fc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21829&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21829&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21829.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21829/head:pull/21829 PR: https://git.openjdk.org/jdk/pull/21829 From rcastanedalo at openjdk.org Tue Nov 5 17:07:13 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 5 Nov 2024 17:07:13 GMT Subject: RFR: 8339303: C2: dead node after failing to match cloned address expression [v2] In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 22:58:47 GMT, Vladimir Ivanov wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Use DUIterator_Fast to traverse node outputs > > src/hotspot/share/opto/matcher.cpp line 183: > >> 181: } >> 182: } >> 183: for (uint j = 0; j < n->outcnt(); j++) { > > Why don't you use DU iterator instead (e.g., `DUIterator_Fast`)? Right, done in commit b0aa39fc, thanks. Please re-review.
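The extended post-matching check described in the 8339303 RFR (no old node may be reachable when following both node inputs and node outputs) can be pictured with the minimal sketch below. It is a toy model under stated assumptions: `ToyNode`, its `is_old` flag, and `find_reachable_old_node` are hypothetical names invented for illustration, not HotSpot's `Node`/`Matcher` API. Per the review thread, the actual change in `matcher.cpp` iterates outputs with `DUIterator_Fast` rather than over a plain vector.

```cpp
#include <cassert>
#include <initializer_list>
#include <unordered_set>
#include <vector>

// Hypothetical toy node, not HotSpot's Node class.
struct ToyNode {
  int id;
  bool is_old;                 // true if the node belongs to the pre-matching graph
  std::vector<ToyNode*> ins;   // input edges
  std::vector<ToyNode*> outs;  // output (use) edges
};

// Returns the first old node reachable from `root` by following both input
// and output edges, or nullptr if no old node is reachable. Following outputs
// as well as inputs is what catches an old node that only hangs off a use edge.
ToyNode* find_reachable_old_node(ToyNode* root) {
  assert(root != nullptr);
  std::vector<ToyNode*> worklist{root};
  std::unordered_set<ToyNode*> visited{root};
  while (!worklist.empty()) {
    ToyNode* n = worklist.back();
    worklist.pop_back();
    if (n->is_old) return n;
    for (std::vector<ToyNode*>* edges : {&n->ins, &n->outs}) {
      for (ToyNode* m : *edges) {
        // insert(...).second is true only the first time m is seen
        if (m != nullptr && visited.insert(m).second) worklist.push_back(m);
      }
    }
  }
  return nullptr;
}
```

A traversal that followed inputs only would miss an old node that is attached solely through an output edge, which is the case the extended check is meant to report right after matching.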
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21829#discussion_r1829715261 From roland at openjdk.org Tue Nov 5 17:12:33 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 5 Nov 2024 17:12:33 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 10:19:51 GMT, theoweidmannoracle wrote: > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. What about regular calls that fail to inline initially because the compiler ran out of inlining budget but are inlined later on? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2457735470 From rcastanedalo at openjdk.org Tue Nov 5 17:21:44 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 5 Nov 2024 17:21:44 GMT Subject: RFR: 8343067: C2: revisit constant-offset AddP chains after successful input idealizations Message-ID: This changeset re-adds a constant-offset AddP node (`u`) to C2's IGVN worklist when its address is given by another AddP node (`use`) whose offset has changed. This makes it possible for `AddPNode::Ideal` to flatten the address computation in cases where the offset of the latter (`use->in(AddPNode::Offset)`) is found to be constant during IGVN: ![idealization](https://github.com/user-attachments/assets/6b632642-c037-457f-bd19-6b30f24e6ac6) The end result is the generation of fewer explicit address computation instructions.
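The flattening that `AddPNode::Ideal` can perform once both offsets are constant, `AddP(AddP(x, c1), c2) => AddP(x, c1 + c2)`, can be illustrated with a toy IR. This is a minimal sketch, not HotSpot code: `Node`, `Graph`, and `flatten_constant_addp` are hypothetical, and the real `AddP` node also carries a separate base input that is omitted here.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical toy IR node (not HotSpot's AddPNode): a base pointer,
// an integer constant, or AddP(address, offset).
struct Node {
  enum class Kind { Base, Con, AddP };
  Kind kind;
  int64_t con = 0;          // valid when kind == Kind::Con
  Node* address = nullptr;  // valid when kind == Kind::AddP
  Node* offset = nullptr;   // valid when kind == Kind::AddP
};

struct Graph {
  std::deque<Node> pool;  // push_back on a deque keeps existing node addresses stable
  Node* base() { pool.push_back({Node::Kind::Base}); return &pool.back(); }
  Node* con(int64_t c) { pool.push_back({Node::Kind::Con, c}); return &pool.back(); }
  Node* addp(Node* adr, Node* off) {
    pool.push_back({Node::Kind::AddP, 0, adr, off});
    return &pool.back();
  }
};

// One idealization step: AddP(AddP(x, c1), c2) ==> AddP(x, c1 + c2).
// Returns the replacement node, or nullptr if the pattern does not match
// (for example, because the inner offset is not known to be constant yet).
Node* flatten_constant_addp(Graph& g, Node* n) {
  assert(n != nullptr);
  if (n->kind != Node::Kind::AddP || n->offset->kind != Node::Kind::Con) return nullptr;
  Node* inner = n->address;
  if (inner->kind != Node::Kind::AddP || inner->offset->kind != Node::Kind::Con) return nullptr;
  Node* sum = g.con(inner->offset->con + n->offset->con);
  return g.addp(inner->address, sum);
}
```

The rule only fires if the outer node is visited after the inner offset has become constant. In the scenario the RFR describes, the inner offset becomes constant only during IGVN, so the outer node has to be pushed back onto the worklist for the rule to be retried, which is what the changeset does.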
#### Testing ##### Functionality - tier1-5 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64; release and debug mode). ##### Performance - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. ------------- Commit messages: - Re-add to worklist only if it is the offset that changes - Simplify test - Remove test condition - Generalize test for aarch64 - Merge better with surrounding code - Add tentative solution (guarded with UseNewCode) - Add test case Changes: https://git.openjdk.org/jdk/pull/21898/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21898&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343067 Stats: 73 lines in 3 files changed: 70 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21898.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21898/head:pull/21898 PR: https://git.openjdk.org/jdk/pull/21898 From kvn at openjdk.org Tue Nov 5 18:05:29 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 5 Nov 2024 18:05:29 GMT Subject: RFR: 8343067: C2: revisit constant-offset AddP chains after successful input idealizations In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 10:02:16 GMT, Roberto Castañeda Lozano wrote: > This changeset re-adds a constant-offset AddP node (`u`) to C2's IGVN worklist when its address is given by another AddP node (`use`) whose offset has changed. This makes it possible for `AddPNode::Ideal` to flatten the address computation in cases where the offset of the latter (`use->in(AddPNode::Offset)`) is found to be constant during IGVN: > > ![idealization](https://github.com/user-attachments/assets/6b632642-c037-457f-bd19-6b30f24e6ac6) > > The end result is the generation of fewer explicit address computation instructions. > > #### Testing > > ##### Functionality > > - tier1-5 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64; release and debug mode).
> > ##### Performance > > - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. src/hotspot/share/opto/phaseX.cpp line 1647: > 1645: if (u->is_Mem()) { > 1646: worklist.push(u); > 1647: } else if (n == use->in(AddPNode::Offset) && `n == use->in(AddPNode::Offset)` result can be saved outside loop in local var. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21898#discussion_r1829796865 From kvn at openjdk.org Tue Nov 5 18:10:28 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 5 Nov 2024 18:10:28 GMT Subject: RFR: 8323803: ConstantOopReadValue::print_on should print 'null' instead of 'nullptr' In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 09:23:16 GMT, theoweidmannoracle wrote: > Printing incorrectly printed `nullptr` instead of `null` > > Buggy: > > > ScopeDesc(pc=0x0000000104c05468 offset=2e8): > java.lang.Class::desiredAssertionStatus at 20 (line 3984) > Locals > - l0: reg rfp [58],oop > - l1: stack[0],oop > - l2: nullptr > - l3: empty > Expression stack > - @0: nullptr > > > Fixed: > > > ScopeDesc(pc=0x0000000106fdd468 offset=2e8): > java.lang.Class::desiredAssertionStatus at 20 (line 3984) > Locals > - l0: reg rfp [58],oop > - l1: stack[0],oop > - l2: null > - l3: empty > Expression stack > - @0: null Trivial ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21869#pullrequestreview-2416366722 From duke at openjdk.org Tue Nov 5 18:19:28 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Tue, 5 Nov 2024 18:19:28 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines In-Reply-To: References: Message-ID: <1-GKNeVswpUac-XTPAW78e7qQhfW8JRDlaOyY5mLVX4=.36089295-410b-477c-b657-82dd8109cc40@github.com> On Tue, 5 Nov 2024 17:10:08 GMT, Roland Westrelin wrote: >> In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. >> >> Concerns were raised by @rwestrel in the previous PR: >> >>> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? >> >> As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). >> >> The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. 
As can be verified by running the corresponding test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: >> >> >> 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant >> @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call >> >> >> Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. > >> As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > >> The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. > > What about regular calls that fail to inline initially because the compiler ran out of inlining budget but are inlined later on? @rwestrel I think in case the inlining budget has been exceeded (i.e. try_to_inline and subsequently ok_to_inline fail), there's only two code locations where we would create a late inlining code generator: [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L380) and [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L292). Both are calls to CallGenerator::for_late_inline_virtual() that create a LateInlineVirtualCallGenerator, which only performs strength reduction AFAIK. There might be something I'm missing, though, since I've only been working on the C2 compiler for three days. So please feel free to point me to other cases of late inlining and I will investigate.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2457866376 From kvn at openjdk.org Tue Nov 5 18:24:29 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 5 Nov 2024 18:24:29 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 16:04:38 GMT, theoweidmannoracle wrote: > This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: > > > ConINode* node = _igvn.intcon(i); > set_ctrl(node, C->root()); > > > and > > > ConLNode* node = _igvn.longcon(i); > set_ctrl(node, C->root()); > > > Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. Do we have other places (not new constant node) where we set Root as control? Maybe we can add `set_root_as_ctrl(n)` method in `loopnode.hpp` in such case. ------------- PR Review: https://git.openjdk.org/jdk/pull/21836#pullrequestreview-2416390797 From kvn at openjdk.org Tue Nov 5 18:24:29 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 5 Nov 2024 18:24:29 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 18:20:58 GMT, Vladimir Kozlov wrote: > Do we have other places (not new constant node) where we set Root as control? In loop opts.
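The wrapper pattern proposed in 8343148, which creates the constant via IGVN and then pins it to the root with `set_ctrl` in a single call, can be sketched with toy stand-ins. This is a minimal sketch under stated assumptions: `ToyIGVN`, `ToyIdealLoop`, and their members are hypothetical and only mirror the shape of the refactoring quoted in the RFR. They are not HotSpot's actual `PhaseIterGVN`/`PhaseIdealLoop` classes or signatures.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <unordered_map>

// Hypothetical stand-in for a constant node.
struct Node {
  int64_t value;
};

// Toy IGVN: equal constants are hash-consed into a single shared node.
struct ToyIGVN {
  std::map<int64_t, std::unique_ptr<Node>> cons;
  Node* intcon(int64_t v) {
    auto& slot = cons[v];
    if (!slot) slot = std::make_unique<Node>(Node{v});
    return slot.get();
  }
};

// Toy loop phase that tracks a control node for every node it knows about.
struct ToyIdealLoop {
  ToyIGVN igvn;
  Node root{0};
  std::unordered_map<Node*, Node*> ctrl;  // node -> controlling node
  void set_ctrl(Node* n, Node* c) { ctrl[n] = c; }

  // Wrapper in the spirit of the proposed PhaseIdealLoop::intcon: create (or
  // reuse) the constant and pin it to the root in one call, so callers cannot
  // forget the set_ctrl step.
  Node* intcon(int64_t v) {
    Node* n = igvn.intcon(v);
    assert(n != nullptr);
    set_ctrl(n, &root);
    return n;
  }
};
```

Bundling the two steps into one helper removes the repeated two-line pattern at each call site, which is the motivation stated in the RFR. A `longcon` counterpart would follow the same shape.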
------------- PR Comment: https://git.openjdk.org/jdk/pull/21836#issuecomment-2457874998 From duke at openjdk.org Tue Nov 5 18:25:28 2024 From: duke at openjdk.org (Joshua Cao) Date: Tue, 5 Nov 2024 18:25:28 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines In-Reply-To: <1-GKNeVswpUac-XTPAW78e7qQhfW8JRDlaOyY5mLVX4=.36089295-410b-477c-b657-82dd8109cc40@github.com> References: <1-GKNeVswpUac-XTPAW78e7qQhfW8JRDlaOyY5mLVX4=.36089295-410b-477c-b657-82dd8109cc40@github.com> Message-ID: <3GO6-2ZBvqEpDdVBejUIw4d_wIKNqh7tJbNOkt4UBHM=.aa92ed3b-44f6-4767-a90a-d1f472f0a74b@github.com> On Tue, 5 Nov 2024 18:16:41 GMT, theoweidmannoracle wrote: >>> As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). >> >>> The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. >> >> What about regular calls that fail to inline initially because the compiler ran out of inlining budget but are inlined later on? > > @rwestrel I think in case the inlining budget has been exceeded (i.e. try_to_inline and subsequently ok_to_inline fail), there's only two code locations where we would create a late inlining code generator: [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L380) and [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L292). Both are calls to CallGenerator::for_late_inline_virtual() that create a LateInlineVirtualCallGenerator, which only performs strength reduction AFAIK. > > There might be something I'm missing, though, since I've only been working on the C2 compiler for three days.
So please feel free to point me to other cases of late inlining and I will investigate. Thanks @theoweidmannoracle for continuing this work and investigating the `CallGenerator::for_late_inline_virtual()` stuff. Not a reviewer, but LGTM. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2457876403 From kvn at openjdk.org Tue Nov 5 18:28:33 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 5 Nov 2024 18:28:33 GMT Subject: RFR: 8343173: Remove ZGC-specific non-JVMCI test groups [v2] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 16:13:53 GMT, Leonid Mesnik wrote: >> The JVMCI should be supported by all GCs and specific >> hotspot_compiler_all_gcs >> group is not needed anymore. >> >> There are few failures of JVMCI tests with ZGC happened, the bug >> https://bugs.openjdk.org/browse/JDK-8343233 >> is filed and corresponding tests are problemlisted. > > Leonid Mesnik has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - typo fixed > - Merge branch 'master' of https://github.com/openjdk/jdk into 8343173 > - 8343173: Remove ZGC-specific non-JVMCI test groups Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21774#pullrequestreview-2416399409 From rcastanedalo at openjdk.org Tue Nov 5 19:50:12 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 5 Nov 2024 19:50:12 GMT Subject: RFR: 8343067: C2: revisit constant-offset AddP chains after successful input idealizations [v2] In-Reply-To: References: Message-ID: > This changeset re-adds a constant-offset AddP node (`u`) to C2's IGVN worklist when its address is given by another AddP node (`use`) whose offset has changed.
This makes it possible for `AddPNode::Ideal` to flatten the address computation in cases where the offset of the latter (`use->in(AddPNode::Offset)`) is found to be constant during IGVN: > > ![idealization](https://github.com/user-attachments/assets/6b632642-c037-457f-bd19-6b30f24e6ac6) > > The end result is the generation of fewer explicit address computation instructions. > > #### Testing > > ##### Functionality > > - tier1-5 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > ##### Performance > > - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Hoist changed offset input check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21898/files - new: https://git.openjdk.org/jdk/pull/21898/files/6dcbb0c6..deb7c4e1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21898&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21898&range=00-01 Stats: 3 lines in 1 file changed: 1 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21898.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21898/head:pull/21898 PR: https://git.openjdk.org/jdk/pull/21898 From rcastanedalo at openjdk.org Tue Nov 5 19:50:12 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 5 Nov 2024 19:50:12 GMT Subject: RFR: 8343067: C2: revisit constant-offset AddP chains after successful input idealizations [v2] In-Reply-To: References: Message-ID: <8yo89AtWDHWMseurCUy1o_-is_JMbtrogj1ci1-FNbk=.63b0f9e5-2c0c-4e9b-9ecd-fe0944bc160b@github.com> On Tue, 5 Nov 2024 18:03:15 GMT, Vladimir Kozlov wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Hoist changed offset input check > >
src/hotspot/share/opto/phaseX.cpp line 1647: > >> 1645: if (u->is_Mem()) { >> 1646: worklist.push(u); >> 1647: } else if (n == use->in(AddPNode::Offset) && > > `n == use->in(AddPNode::Offset)` result can be saved outside loop in local var. Thanks, done (commit deb7c4e1). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21898#discussion_r1829916712 From sparasa at openjdk.org Tue Nov 5 20:52:41 2024 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 5 Nov 2024 20:52:41 GMT Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support [v7] In-Reply-To: References: Message-ID: > The goal of this PR is to fix instruction encoding errors for some of the instructions which support the APX New Data Destination (NDD) and No Flags (NF) features (added in [JDK-8329035](https://bugs.openjdk.org/browse/JDK-8329035)) > > The correctness of the encoding is verified by comparing to the reference encoding by GCC using an automated tool (https://github.com/openjdk/jdk/pull/21795) Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: update opcodes for load based operations ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21770/files - new: https://git.openjdk.org/jdk/pull/21770/files/fcc782b2..bca87165 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21770&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21770&range=05-06 Stats: 32 lines in 1 file changed: 0 ins; 10 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/21770.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21770/head:pull/21770 PR: https://git.openjdk.org/jdk/pull/21770 From kvn at openjdk.org Tue Nov 5 20:55:28 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 5 Nov 2024 20:55:28 GMT Subject: RFR: 8343067: C2: revisit constant-offset AddP chains after successful input idealizations [v2] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 
19:50:12 GMT, Roberto Casta?eda Lozano wrote: >> This changeset re-adds a constant-offset AddP node (`u`) to C2's IGVN worklist when its address is given by another AddP node (`use`) whose offset has changed. This makes it possible for `AddPNode::Ideal` to flatten the address computation in cases where the offset of the latter (`use->in(AddPNode::Offset)`) is found to be constant during IGVN: >> >> ![idealization](https://github.com/user-attachments/assets/6b632642-c037-457f-bd19-6b30f24e6ac6) >> >> The end result is the generation of fewer explicit address computation instructions. >> >> #### Testing >> >> ##### Functionality >> >> - tier1-5 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64; release and debug mode). >> >> ##### Performance >> >> - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Hoist changed offset input check Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21898#pullrequestreview-2416668042 From lmesnik at openjdk.org Tue Nov 5 20:55:35 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 5 Nov 2024 20:55:35 GMT Subject: Integrated: 8343173: Remove ZGC-specific non-JVMCI test groups In-Reply-To: References: Message-ID: <-1bZpI933zmujmTibsiiOkDdxnlxnKEGVGAPlqfvYik=.a0981eca-c8da-466c-a209-b266afea8513@github.com> On Tue, 29 Oct 2024 22:01:08 GMT, Leonid Mesnik wrote: > The JVMCI should be supported by all GCs and specific > hotspot_compiler_all_gcs > group is not needed anymore. > > There are few failures of JVMCI tests with ZGC happened, the bug > https://bugs.openjdk.org/browse/JDK-8343233 > is filed and corresponding tests are problemlisted. This pull request has now been integrated. 
Changeset: 847cc5eb Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/847cc5ebac43b83746d8f238c5f9ecf2972a2796 Stats: 12 lines in 2 files changed: 8 ins; 4 del; 0 mod 8343173: Remove ZGC-specific non-JVMCI test groups Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/21774 From cslucas at openjdk.org Tue Nov 5 21:02:29 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 5 Nov 2024 21:02:29 GMT Subject: RFR: 8340454: C2 EA asserts with "previous reducible Phi is no longer reducible before SUT" [v3] In-Reply-To: References: Message-ID: <2CcKaBp-JbbQ78T0ruK9IQtGMkexY9eiGF5xIHQh33M=.7dc91766-57f1-47b1-88d7-2c133d80011a@github.com> On Tue, 5 Nov 2024 08:57:14 GMT, Tobias Hartmann wrote: >> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: >> >> Address PR feedback: include test execution options. > > All green. Thank you @TobiHartmann ------------- PR Comment: https://git.openjdk.org/jdk/pull/21778#issuecomment-2458138370 From duke at openjdk.org Tue Nov 5 21:02:30 2024 From: duke at openjdk.org (duke) Date: Tue, 5 Nov 2024 21:02:30 GMT Subject: RFR: 8340454: C2 EA asserts with "previous reducible Phi is no longer reducible before SUT" [v3] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 21:53:45 GMT, Cesar Soares Lucas wrote: >> Please, consider this patch to fix an issue that happens when a Phi previously considered reducible become later irreducible. The overall situation that causes the problem is like so: >> >> - Consider that there are at least 2 scalar replaceable objects (Obj1 and Obj2; Obj2 is stored in a field of Obj1) when we start iterating the loop at escape.cpp:301 >> >> - In the first iteration of the loop the call chain starting with `adjust_scalar_replaceable_state` ends up calling `can_reduce_phi` and considering Phi1 as reducible. This Phi has only Obj1 as *SR* input. >> >> - In another iteration of the loop Obj2 is flagged as NSR. 
For instance, because we are storing Obj2 in an unknown position of an array. This will cause `found_nsr_alloc` to be set to `true`. >> >> After the loop finishes, the execution will go to `find_scalar_replaceable_allocs`. The code will process Obj1, because it's still scalar replaceable, but will find that this object is stored in a field of a - **now** - NSR object. Therefore, correctly, Obj1 will also be marked as NSR. When Obj1 is marked as NSR Phi1 becomes irreducible because it doesn't have any more scalar replaceable input. >> >> The solution I'm proposing is simply revisit the "reducibility" of the Phis when an object is marked as NSR. >> >> --------- >> >> ### Tests >> >> Win, Mac & Linux tier1-4 on x64 & Aarch64. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Address PR feedback: include test execution options. @JohnTortugo Your change (at version 2449e42c8a01f600633d637651e6d53ff69297bc) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21778#issuecomment-2458140525 From cslucas at openjdk.org Tue Nov 5 21:22:41 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 5 Nov 2024 21:22:41 GMT Subject: Integrated: 8340454: C2 EA asserts with "previous reducible Phi is no longer reducible before SUT" In-Reply-To: References: Message-ID: <6PmRW_j30ZJZqXw8w7LkgPrvMA2ID0E0Eyjt5F-H4KU=.2f555ae0-8272-4b7a-87c4-4115e6465f3e@github.com> On Wed, 30 Oct 2024 00:40:22 GMT, Cesar Soares Lucas wrote: > Please, consider this patch to fix an issue that happens when a Phi previously considered reducible become later irreducible. 
The overall situation that causes the problem is like so: > > - Consider that there are at least 2 scalar replaceable objects (Obj1 and Obj2; Obj2 is stored in a field of Obj1) when we start iterating the loop at escape.cpp:301 > > - In the first iteration of the loop the call chain starting with `adjust_scalar_replaceable_state` ends up calling `can_reduce_phi` and considering Phi1 as reducible. This Phi has only Obj1 as *SR* input. > > - In another iteration of the loop Obj2 is flagged as NSR. For instance, because we are storing Obj2 in an unknown position of an array. This will cause `found_nsr_alloc` to be set to `true`. > > After the loop finishes, the execution will go to `find_scalar_replaceable_allocs`. The code will process Obj1, because it's still scalar replaceable, but will find that this object is stored in a field of a - **now** - NSR object. Therefore, correctly, Obj1 will also be marked as NSR. When Obj1 is marked as NSR Phi1 becomes irreducible because it doesn't have any more scalar replaceable input. > > The solution I'm proposing is simply revisit the "reducibility" of the Phis when an object is marked as NSR. > > --------- > > ### Tests > > Win, Mac & Linux tier1-4 on x64 & Aarch64. This pull request has now been integrated. 
Changeset: d4d9831c Author: Cesar Soares Lucas URL: https://git.openjdk.org/jdk/commit/d4d9831c9075c1a157d8375e6902bfc6c731389a Stats: 124 lines in 3 files changed: 121 ins; 0 del; 3 mod 8340454: C2 EA asserts with "previous reducible Phi is no longer reducible before SUT" Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21778 From vlivanov at openjdk.org Tue Nov 5 21:39:29 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 5 Nov 2024 21:39:29 GMT Subject: RFR: 8343067: C2: revisit constant-offset AddP chains after successful input idealizations [v2] In-Reply-To: References: Message-ID: <1XHa78pQg4aMT2bD_vFY9dCP3h4XpfOtw3skiKBjx-g=.f0eb7973-3b49-4667-9b20-45a4ea5b9c2e@github.com> On Tue, 5 Nov 2024 19:50:12 GMT, Roberto Casta?eda Lozano wrote: >> This changeset re-adds a constant-offset AddP node (`u`) to C2's IGVN worklist when its address is given by another AddP node (`use`) whose offset has changed. This makes it possible for `AddPNode::Ideal` to flatten the address computation in cases where the offset of the latter (`use->in(AddPNode::Offset)`) is found to be constant during IGVN: >> >> ![idealization](https://github.com/user-attachments/assets/6b632642-c037-457f-bd19-6b30f24e6ac6) >> >> The end result is the generation of fewer explicit address computation instructions. >> >> #### Testing >> >> ##### Functionality >> >> - tier1-5 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64; release and debug mode). >> >> ##### Performance >> >> - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Hoist changed offset input check Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21898#pullrequestreview-2416744542 From vlivanov at openjdk.org Tue Nov 5 21:43:31 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 5 Nov 2024 21:43:31 GMT Subject: RFR: 8339303: C2: dead node after failing to match cloned address expression [v2] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 17:07:13 GMT, Roberto Casta?eda Lozano wrote: >> This changeset prevents the x86 platform-specific logic from cloning address expressions consisting of two chained `AddP` nodes with a small constant offset each, such as in the following example: >> >> ![example](https://github.com/user-attachments/assets/86c143a1-3895-4e0c-936b-0d22b7c80e73) >> >> Such patterns cannot be fully subsumed into x86 complex addressing modes, and cloning them can cause the matcher to introduce dead nodes that trigger a segmentation fault in the subsequent global code motion phase. See a detailed analysis of the failure in the [JBS issue description](https://bugs.openjdk.org/browse/JDK-8339303). >> >> The changeset additionally extends the post-matching verification logic to check that no old node is reachable by travesing both node inputs and outputs. This extension would have caused the original test case to fail directly after matching with an informative assertion message rather than an opaque segmentation fault in an unrelated code generation phase. >> >> Note that the pattern causing the failure should be in general optimized by `AddPNode::Ideal` into a single `AddP` node with the constant sum of the offsets. While [JDK-8343067](https://bugs.openjdk.org/browse/JDK-8343067) should address the missing optimization, this changeset proposes a complementary solution that is easily backportable and avoids relying on specific optimizations for correctness. >> >> #### Testing >> >> ##### Functionality >> >> - tier1-5 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64; release and debug mode). 
>> >> ##### Performance >> >> - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Use DUIterator_Fast to traverse node outputs Marked as reviewed by vlivanov (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21829#pullrequestreview-2416750596 From sviswanathan at openjdk.org Tue Nov 5 21:56:30 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 5 Nov 2024 21:56:30 GMT Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support [v7] In-Reply-To: References: Message-ID: <_7R5NelxVRX3Cze4kId-NsdQ17qSPoF2cavVJYyF5Qo=.2ddd34f2-4cb0-4c74-8c8b-8596d9c42c1b@github.com> On Tue, 5 Nov 2024 20:52:41 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to fix instruction encoding errors for some of the instructions which support the APX New Data Destination (NDD) and No Flags (NF) features (added in [JDK-8329035](https://bugs.openjdk.org/browse/JDK-8329035)) >> >> The correctness of the encoding is verified by comparing it with the reference encoding generated by GCC, using an automated tool (https://github.com/openjdk/jdk/pull/21795) > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > update opcodes for load based operations Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer).
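For the 8339303 RFR quoted above, the extended post-matching verification (no old node may be reachable when traversing both node inputs and outputs) is essentially a graph search in both directions. The following is a minimal, hypothetical sketch on a toy graph; `ToyNode`, `add_edge`, and `old_node_reachable` are illustrative names, not HotSpot's verification code. Following outputs matters because a dead old node may hang off a new node only as a user, where a walk over inputs alone would never reach it.

```cpp
#include <cassert>
#include <queue>
#include <unordered_set>
#include <vector>

// Hypothetical node of a sea-of-nodes style graph. Edges are stored in both
// directions; `is_old` marks nodes that belong to the pre-matching graph.
struct ToyNode {
  bool is_old = false;
  std::vector<ToyNode*> inputs;
  std::vector<ToyNode*> outputs;
};

// Record a def-use edge in both directions.
void add_edge(ToyNode* def, ToyNode* use) {
  use->inputs.push_back(def);
  def->outputs.push_back(use);
}

// Breadth-first search from `root` that follows inputs AND outputs.
// Returns true if any node marked as old is reachable.
bool old_node_reachable(ToyNode* root) {
  std::unordered_set<ToyNode*> visited{root};
  std::queue<ToyNode*> work;
  work.push(root);
  while (!work.empty()) {
    ToyNode* n = work.front();
    work.pop();
    if (n->is_old) return true;
    auto enqueue = [&](ToyNode* m) {
      if (m != nullptr && visited.insert(m).second) work.push(m);
    };
    for (ToyNode* m : n->inputs) enqueue(m);
    for (ToyNode* m : n->outputs) enqueue(m);
  }
  return false;
}
```

In this model, an old node attached only as a user of some new node `a` is still found when starting from another user of `a`, because the search walks back to `a` through inputs and then forward through its outputs.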
PR Review: https://git.openjdk.org/jdk/pull/21770#pullrequestreview-2416768268 From swen at openjdk.org Tue Nov 5 23:41:44 2024 From: swen at openjdk.org (Shaojin Wen) Date: Tue, 5 Nov 2024 23:41:44 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v18] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 4 Nov 2024 11:48:49 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. >> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). 
>> >> Now we can do: >> - Merging `Unsafe` stores to array.
Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without smeared RCs, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > more changes for Christian Currently, TraceMergeStores can only be used in fastdebug images. Are you planning to support it in release images? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2458416345 From swen at openjdk.org Wed Nov 6 00:31:28 2024 From: swen at openjdk.org (Shaojin Wen) Date: Wed, 6 Nov 2024 00:31:28 GMT Subject: RFR: 8343629: More MergeStore benchmark In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 07:03:33 GMT, Shaojin Wen wrote: > 1. Added the putBytes4 benchmark, which corresponds to StringBuilder appendNull > 2. Optimized the putChars4/setInt/setLong series of benchmarks to reduce extra overhead and more accurately reflect performance differences. @eme64 Below are the performance numbers running under AMD EPYC
Genoa (x64), where the scenario of putBytes4GetBytes is "null".getBytes(0, 4, bytes4, off); Is it possible to do MergeStore in this scenario?

Benchmark                          Mode  Cnt      Score       Error  Units
MergeStoreBench.getCharB           avgt    5   6038.532 ±   533.982  ns/op
MergeStoreBench.getCharBU          avgt    5   4923.182 ±   163.872  ns/op
MergeStoreBench.getCharBV          avgt    5   3111.268 ±    84.077  ns/op
MergeStoreBench.getCharC           avgt    5   2245.270 ±    33.559  ns/op
MergeStoreBench.getCharL           avgt    5   6109.519 ±   249.512  ns/op
MergeStoreBench.getCharLU          avgt    5   4552.425 ±   161.933  ns/op
MergeStoreBench.getCharLV          avgt    5   2239.866 ±    91.853  ns/op
MergeStoreBench.getIntB            avgt    5   8163.035 ±   137.565  ns/op
MergeStoreBench.getIntBU           avgt    5   9136.199 ±   259.491  ns/op
MergeStoreBench.getIntBV           avgt    5    314.123 ±     4.510  ns/op
MergeStoreBench.getIntL            avgt    5   7879.011 ±    10.759  ns/op
MergeStoreBench.getIntLU           avgt    5   8968.715 ±   268.414  ns/op
MergeStoreBench.getIntLV           avgt    5   2228.228 ±     1.510  ns/op
MergeStoreBench.getIntRB           avgt    5   8618.141 ±    22.545  ns/op
MergeStoreBench.getIntRBU          avgt    5  11239.977 ±   447.754  ns/op
MergeStoreBench.getIntRL           avgt    5   9060.754 ±   236.147  ns/op
MergeStoreBench.getIntRLU          avgt    5   9365.050 ±   154.357  ns/op
MergeStoreBench.getIntRU           avgt    5   2540.704 ±    75.198  ns/op
MergeStoreBench.getIntU            avgt    5   2508.954 ±    74.999  ns/op
MergeStoreBench.getLongB           avgt    5  24940.668 ± 16857.311  ns/op
MergeStoreBench.getLongBU          avgt    5  14126.468 ±   329.241  ns/op
MergeStoreBench.getLongBV          avgt    5    607.128 ±    23.775  ns/op
MergeStoreBench.getLongL           avgt    5  25519.679 ± 15393.727  ns/op
MergeStoreBench.getLongLU          avgt    5  14598.271 ±   481.158  ns/op
MergeStoreBench.getLongLV          avgt    5   2227.659 ±    16.334  ns/op
MergeStoreBench.getLongRB          avgt    5  25158.839 ± 18209.451  ns/op
MergeStoreBench.getLongRBU         avgt    5  14005.082 ±   208.154  ns/op
MergeStoreBench.getLongRL          avgt    5  25303.319 ± 14775.524  ns/op
MergeStoreBench.getLongRLU         avgt    5  14481.847 ±   309.623  ns/op
MergeStoreBench.getLongRU          avgt    5   3065.744 ±    15.405  ns/op
MergeStoreBench.getLongU           avgt    5   3048.522 ±     0.704  ns/op
MergeStoreBench.putBytes4          avgt    5    933.283 ±     6.197  ns/op
MergeStoreBench.putBytes4GetBytes  avgt    5   5917.932 ±   199.901  ns/op
MergeStoreBench.putBytes4U         avgt    5    944.097 ±    25.902  ns/op
MergeStoreBench.putBytes4X         avgt    5    944.714 ±    18.924  ns/op
MergeStoreBench.putChars4B         avgt    5   5679.262 ±   154.030  ns/op
MergeStoreBench.putChars4BU        avgt    5   1143.133 ±     4.250  ns/op
MergeStoreBench.putChars4BV        avgt    5   4530.941 ±   124.318  ns/op
MergeStoreBench.putChars4C         avgt    5   1138.541 ±    27.843  ns/op
MergeStoreBench.putChars4L         avgt    5   5647.885 ±   112.363  ns/op
MergeStoreBench.putChars4LU        avgt    5   1142.501 ±     4.400  ns/op
MergeStoreBench.putChars4LV        avgt    5   1143.770 ±     3.435  ns/op
MergeStoreBench.putChars4S         avgt    5   1141.919 ±    36.528  ns/op
MergeStoreBench.setCharBS          avgt    5   6114.143 ±   144.826  ns/op
MergeStoreBench.setCharBV          avgt    5   3607.599 ±    87.720  ns/op
MergeStoreBench.setCharC           avgt    5   4510.196 ±     5.445  ns/op
MergeStoreBench.setCharLS          avgt    5   5641.424 ±   195.167  ns/op
MergeStoreBench.setCharLV          avgt    5   2267.712 ±    40.752  ns/op
MergeStoreBench.setIntB            avgt    5   8049.368 ±   233.618  ns/op
MergeStoreBench.setIntBU           avgt    5  18052.279 ±  2428.567  ns/op
MergeStoreBench.setIntBV           avgt    5   3287.905 ±    63.375  ns/op
MergeStoreBench.setIntL            avgt    5   2135.887 ±    62.601  ns/op
MergeStoreBench.setIntLU           avgt    5   4795.636 ±    74.974  ns/op
MergeStoreBench.setIntLV           avgt    5   2154.363 ±    81.324  ns/op
MergeStoreBench.setIntRB           avgt    5  13895.941 ±  7981.782  ns/op
MergeStoreBench.setIntRBU          avgt    5  14756.267 ±  1585.571  ns/op
MergeStoreBench.setIntRL           avgt    5   3284.792 ±    37.939  ns/op
MergeStoreBench.setIntRLU          avgt    5   5958.555 ±    27.404  ns/op
MergeStoreBench.setIntRU           avgt    5   5983.119 ±    79.627  ns/op
MergeStoreBench.setIntU            avgt    5   4848.655 ±   168.466  ns/op
MergeStoreBench.setLongB           avgt    5  31871.401 ±  1233.822  ns/op
MergeStoreBench.setLongBU          avgt    5  25704.975 ±  5105.792  ns/op
MergeStoreBench.setLongBV          avgt    5   2199.367 ±    69.511  ns/op
MergeStoreBench.setLongL           avgt    5   5486.926 ±    30.874  ns/op
MergeStoreBench.setLongLU          avgt    5   4503.212 ±    81.635  ns/op
MergeStoreBench.setLongLV          avgt    5   2144.943 ±    38.944  ns/op
MergeStoreBench.setLongRB          avgt    5  30338.353 ±  1631.512  ns/op
MergeStoreBench.setLongRBU         avgt    5  25025.442 ±  2690.138  ns/op
MergeStoreBench.setLongRL          avgt    5   4553.245 ±   128.721  ns/op
MergeStoreBench.setLongRLU         avgt    5   4793.427 ±     1.474  ns/op
MergeStoreBench.setLongRU          avgt    5   4803.963 ±    74.017  ns/op
MergeStoreBench.setLongU           avgt    5   4564.326 ±   146.283  ns/op

------------- PR Comment: https://git.openjdk.org/jdk/pull/21659#issuecomment-2458465745 From dlong at openjdk.org Wed Nov 6 00:59:28 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 6 Nov 2024 00:59:28 GMT Subject: RFR: 8343506: [s390x] multiple test failures with ubsan In-Reply-To: References: <-_6fxDHjNO3eG0JX_Nqscc0bVUQgLcmK5OjoVME7xNk=.6ae3a749-36a4-47d2-9ff1-a29c28f97dd8@github.com> Message-ID: On Tue, 5 Nov 2024 10:05:02 GMT, Martin Doerr wrote: >> src/hotspot/cpu/s390/s390.ad line 2550: >> >>> 2548: // Unsigned Integer Immediate: 9-bit >>> 2549: operand SSlenDW() %{ >>> 2550: predicate(Immediate::is_uimm8((julong)n->get_long()-1)); >> >> Suggestion: >> >> predicate(n->get_long() >= 1 && Immediate::is_uimm8((julong)n->get_long()-1)); > > I don't think this is necessary. Unsigned subtraction with wrap-around is not undefined behavior. Right, it's not UB, but sometimes it is a bug, and would be flagged by things like -fsanitize=unsigned-integer-overflow, so my preference would be to avoid it if possible. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21864#discussion_r1830259148 From fyang at openjdk.org Wed Nov 6 03:24:28 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 6 Nov 2024 03:24:28 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 18:33:00 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch?
> Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang, this one should be a diagnostic option, as we don't want to expose too many options to users. > Thanks We don't auto-enable experimental options through hwprobe until they are fully tested on real hardware. That's what we do for now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2458651208 From chagedorn at openjdk.org Wed Nov 6 06:12:35 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 6 Nov 2024 06:12:35 GMT Subject: Integrated: 8342943: Replace predicate walking and cloning code for main/post loops with a predicate visitor In-Reply-To: <5krLcDigRN94zhl5T8zX33E_7qyajZH96Xv-1biYGOc=.860f4223-44fa-4904-ae70-dafe74b2989c@github.com> References: <5krLcDigRN94zhl5T8zX33E_7qyajZH96Xv-1biYGOc=.860f4223-44fa-4904-ae70-dafe74b2989c@github.com> Message-ID: <-4QGg9Ue9Sk2hwx6V07buYycJvFNcRBcY4tU9VI8dYg=.0141577d-1bec-42a6-bae0-1b802927dcb5@github.com> On Wed, 30 Oct 2024 15:18:56 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and post loop (this PR) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): Loop Unswitching and removing useless Assertion Predicates (upcoming) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (upcoming) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called
for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each patch, I will apply similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with a call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the existing semantics, which include only applying the Template Assertion Predicate processing if there are Parse Predicates. This limitation should eventually be removed. But I want to do that separately at a later point. > --- > > #### Refactorings of this Patch > This patch replaces the predicate walking and cloning code for **main and post loops**. The code can reuse the code established w... This pull request has now been integrated.
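The visitor/iterator split described in the 8342943 RFR can be sketched with a few toy types. Everything below is hypothetical and simplified: the `Kind` values, `ToyPredicateVisitor`, `ToyPredicateIterator`, and `TemplateCloner` are illustrative and do not reproduce the signatures of HotSpot's `PredicateVisitor` or `PredicateIterator`. The iterator walks the predicate chain above a loop and dispatches each predicate to the visitor; a cloning visitor acts only on Template Assertion Predicates and leaves it to the caller to hook the new chain up to the target loop.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical predicate kinds that can appear above a loop entry.
enum class Kind { ParsePredicate, RuntimePredicate, TemplateAssertionPredicate };

struct Predicate {
  Kind kind;
  std::string name;
};

// Visitor interface with no-op defaults, so a concrete visitor only
// overrides the predicate kinds it cares about.
struct ToyPredicateVisitor {
  virtual ~ToyPredicateVisitor() = default;
  virtual void visit_parse(const Predicate&) {}
  virtual void visit_runtime(const Predicate&) {}
  virtual void visit_template_assertion(const Predicate&) {}
};

// Iterator: walks the chain (index 0 is closest to the loop) and dispatches
// each predicate to the matching visit method.
struct ToyPredicateIterator {
  const std::vector<Predicate>& chain;
  void for_each(ToyPredicateVisitor& v) const {
    for (const Predicate& p : chain) {
      switch (p.kind) {
        case Kind::ParsePredicate: v.visit_parse(p); break;
        case Kind::RuntimePredicate: v.visit_runtime(p); break;
        case Kind::TemplateAssertionPredicate: v.visit_template_assertion(p); break;
      }
    }
  }
};

// Visitor that "clones" Template Assertion Predicates for a target loop.
// The clones are only collected here; connecting them between the old loop
// entry and the target loop head is left to the caller.
struct TemplateCloner : ToyPredicateVisitor {
  std::vector<Predicate> cloned;
  void visit_template_assertion(const Predicate& p) override {
    cloned.push_back({p.kind, p.name + "_clone"});
  }
};
```

The design point the sketch tries to capture is that the walking logic lives in one place (the iterator), so each loop optimization only supplies a visitor.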
Changeset: 4431852a Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/4431852a880b06241231d346311170331c20ab2d Stats: 275 lines in 5 files changed: 94 ins; 164 del; 17 mod 8342943: Replace predicate walking and cloning code for main/post loops with a predicate visitor Reviewed-by: roland, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21790 From jbhateja at openjdk.org Wed Nov 6 06:36:30 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 6 Nov 2024 06:36:30 GMT Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support [v7] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 20:52:41 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to fix instruction encoding errors for some of the instructions which support the APX New Data Destination (NDD) and No Flags (NF) features (added in [JDK-8329035](https://bugs.openjdk.org/browse/JDK-8329035)) >> >> The correctness of the encoding is verified by comparing to the reference encoding by GCC using an automated tool (https://github.com/openjdk/jdk/pull/21795) > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > update opcodes for load based operations Marked as reviewed by jbhateja (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21770#pullrequestreview-2417359277 From jbhateja at openjdk.org Wed Nov 6 06:36:31 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 6 Nov 2024 06:36:31 GMT Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support [v7] In-Reply-To: <9lYvrMFQKW4Cq9zKFDjkNg9zKgLvdaCtVQTZgUGgpNs=.78ee059a-5f85-4d6e-828a-a71351d44763@github.com> References: <9lYvrMFQKW4Cq9zKFDjkNg9zKgLvdaCtVQTZgUGgpNs=.78ee059a-5f85-4d6e-828a-a71351d44763@github.com> Message-ID: On Wed, 30 Oct 2024 09:05:50 GMT, Jatin Bhateja wrote: > Hi @vamsi-parasa, NDD is very flexible in terms of argument selection, i.e. 
ADDL NDD, SRC1 (ModRM.R/M), SRC2 (ModRM.REG) has opcode 0x01, whereas ADDL NDD, SRC1 (ModRM.REG), SRC2 (ModRM.R/M) has opcode 0x03 > > In this case, we are trying to match the GCC encoding scheme. > > Can you please add the following comment here, since the argument nomenclature does not match the parameter nomenclature? > > NDD shares its encoding bits with NDS bits for regular EVEX instructions. Therefore we are passing DST as the second argument to minimize changes in the leaf-level routine. Thanks for addressing this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21770#discussion_r1830459760 From chagedorn at openjdk.org Wed Nov 6 07:06:01 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 6 Nov 2024 07:06:01 GMT Subject: RFR: 8342945: Replace predicate walking code in get_assertion_predicates() used for Loop Unswitching and cleaning useless Template Assertion Predicates with a predicate visitor Message-ID: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> #### Replacing the Remaining Predicate Walking and Cloning Code The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679) - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (integrated with https://github.com/openjdk/jdk/pull/21790) - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicates (this PR) - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (upcoming) --- (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790) #### Single Template Assertion Predicate Check This replacement allows us to have a single
`TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). #### Common Refactorings for all the Patches in this Series In each patch, I will apply similar refactoring ideas: - Replace the existing code in the corresponding `PhaseIdealLoop` method with a call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. - Keep the existing semantics, which include only applying the Template Assertion Predicate processing if there are Parse Predicates. This limitation should eventually be removed. But I want to do that separately at a later point. --- #### Refactorings of this Patch - This patch replaces the predicate walking in `PhaseIdealLoop::get_assertion_predicates()` which is used for Loop Unswitching and removing useless Template Assertion Predicates (called from `PhaseIdealLoop::collect_useful_template_assertion_predicates_for_loop()`).
- Note that the cloning code in Loop Unswitching is not replaced yet, because we clone the Template Assertion Predicates in the original order as currently found in the graph, which also allowed us to use `PhaseIdealLoop::create_new_if_for_predicate()`. This means that we first walk from the loop entry to the last Template Assertion Predicate and then start cloning them in the reverse order (which ensures that we keep the original order of the Template Assertion Predicates). I don't think that keeping the original order is a strong requirement. Once we replace the UCTs with halt nodes, we no longer need to call `create_new_if_for_predicate()` and could theoretically just clone and initialize the Template Assertion Predicates in the opposite order from the one originally found in the graph, which is easier to implement. This is currently also done for the other loop opts that require Assertion Predicate cloning/initialization. I think it's probably safe to do this for Loop Unswitching as well once we replace UCTs with halt nodes (@rwestrel what do you think?). If at some point we need to keep the Assertion Predicate order, we can just add this functionality to the `PredicateIterator` classes. Anyhow, I'm leaving this code in `clone_assertion_predicates_to_unswitched_loop()` as it is for now and will revisit it later.
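The ordering argument in the 8342945 description can be checked on a small list model. In this hypothetical sketch (not HotSpot code; `Chain`, `insert_at_loop_entry`, and both clone functions are illustrative names), index 0 of a chain is the predicate directly above the loop head and every clone is created at the loop entry, i.e. at index 0. Cloning while walking outward from the loop entry therefore reverses the Template Assertion Predicates, whereas collecting them first and cloning in reverse keeps the original order.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical model: index 0 of a chain is the predicate directly above
// the loop head; higher indices are further away from the loop.
using Chain = std::deque<std::string>;

// New predicates are always created at the loop entry, i.e. at index 0.
void insert_at_loop_entry(Chain& target, const std::string& p) {
  target.push_front(p);
}

// Clone while walking from the loop entry outwards: the order is reversed.
Chain clone_in_walk_order(const std::vector<std::string>& templates) {
  Chain target;
  for (const auto& t : templates) insert_at_loop_entry(target, t + "'");
  return target;
}

// Collect first, then clone in reverse: the original order is preserved.
Chain clone_preserving_order(const std::vector<std::string>& templates) {
  Chain target;
  for (auto it = templates.rbegin(); it != templates.rend(); ++it) {
    insert_at_loop_entry(target, *it + "'");
  }
  return target;
}
```

With templates `T1, T2, T3` (T1 closest to the loop), the first function produces `T3', T2', T1'` and the second `T1', T2', T3'`, matching the reasoning in the quoted description.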
Thanks, Christian ------------- Commit messages: - 8342945: Replace predicate walking code in get_assertion_predicates() used for Loop Unswitching and cleaning useless Template Assertion Predicates with a predicate visitor Changes: https://git.openjdk.org/jdk/pull/21918/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21918&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342945 Stats: 59 lines in 4 files changed: 22 ins; 22 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/21918.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21918/head:pull/21918 PR: https://git.openjdk.org/jdk/pull/21918 From epeter at openjdk.org Wed Nov 6 07:25:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 6 Nov 2024 07:25:29 GMT Subject: RFR: 8343629: More MergeStore benchmark In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 07:03:33 GMT, Shaojin Wen wrote: > 1. Added the putBytes4 benchmark, which corresponds to StringBuilder appendNull > 2. Optimized the putChars4/setInt/setLong series of benchmarks to reduce extra overhead and more accurately reflect performance differences. > ```java > "null".getBytes(0, 4, bytes4, off); > ``` > > Is it possible to do MergeStore in this scenario? I don't know. What do the logs say? And what does it currently compile down to, i.e. what assembly instructions? Otherwise I think this update seems reasonable. It would be nice if you could do some summary / explanation: which cases still do not optimize, and why? For that, it would be helpful if you had one run with and one without `MergeStores` enabled - then we can easily compare the performance!
You can find an example of how to do that easily here: https://github.com/openjdk/jdk/pull/19970/files#diff-9072c369f5b541ef9fca3ad8320aa59e88cc72f203c03da58100b1d111ffc324R746-R749 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21659#issuecomment-2458877595 PR Comment: https://git.openjdk.org/jdk/pull/21659#issuecomment-2458878368 PR Comment: https://git.openjdk.org/jdk/pull/21659#issuecomment-2458880901 PR Comment: https://git.openjdk.org/jdk/pull/21659#issuecomment-2458883551 From epeter at openjdk.org Wed Nov 6 07:32:45 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 6 Nov 2024 07:32:45 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v18] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Tue, 5 Nov 2024 23:38:59 GMT, Shaojin Wen wrote: > Currently, TraceMergeStores can only be used in fastdebug images. Are you planning to support it in release images? I generally don't support it in release builds, only debug - or rather `NOT_PRODUCT`. The issue with supporting it in release is that some other printing methods I use are not available in product (`Node::dump`). And if we support it in release, then we have to create a CSR, clearly specify what it prints, and then we are going to be less flexible in the future with changing the behavior. So I would rather not make it product, at least for now ;) BTW: this is also why you can only disable it with `-XX:-MergeStores` in combination with `-XX:+UnlockDiagnosticVMOptions`: it is not a full product flag, and so does not require a CSR, and we are able to remove it or change its behavior. But if someone really has an issue with the MergeStores optimization, they at least have a workaround until we are able to fix it ;) Does that make sense? -------------
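The linear form from the 8335392 RFR, `pointer = con + sum_i(scale_i * variable_i)`, and the overflow tracking of `NoOverflowInt` can be illustrated with a small model. This is a hypothetical sketch, not the code in `mempointer.hpp`: `ToyNoOverflowInt` computes in 64 bits and records whether a 32-bit result would have overflowed, and `ToyMemPointer` treats two pointers as adjacent when their summands are identical and their constants differ by exactly the access size.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <string>

// Hypothetical int wrapper: remembers whether any operation overflowed the
// 32-bit range. Once overflowed, the value must not be used.
class ToyNoOverflowInt {
 public:
  explicit ToyNoOverflowInt(int32_t v) : _value(v), _overflow(false) {}
  static ToyNoOverflowInt make_overflow() {
    ToyNoOverflowInt r(0);
    r._overflow = true;
    return r;
  }
  bool is_overflow() const { return _overflow; }
  int32_t value() const { assert(!_overflow); return _value; }

  ToyNoOverflowInt operator+(const ToyNoOverflowInt& o) const {
    if (_overflow || o._overflow) return make_overflow();
    return from_wide(int64_t(_value) + int64_t(o._value));
  }
  ToyNoOverflowInt operator*(const ToyNoOverflowInt& o) const {
    if (_overflow || o._overflow) return make_overflow();
    return from_wide(int64_t(_value) * int64_t(o._value));
  }

 private:
  // Narrow a 64-bit intermediate result, flagging values outside int32.
  static ToyNoOverflowInt from_wide(int64_t w) {
    if (w < std::numeric_limits<int32_t>::min() ||
        w > std::numeric_limits<int32_t>::max()) {
      return make_overflow();
    }
    return ToyNoOverflowInt(static_cast<int32_t>(w));
  }
  int32_t _value;
  bool _overflow;
};

// Hypothetical linear form: con + sum(scale * variable), keyed by variable.
struct ToyMemPointer {
  std::map<std::string, int32_t> summands;  // variable name -> scale
  int32_t con = 0;

  // `next` is adjacent to this pointer if all summands are identical and the
  // constants differ by exactly `size` bytes (difference taken in 64 bits).
  bool is_adjacent_to(const ToyMemPointer& next, int32_t size) const {
    return summands == next.summands &&
           int64_t(next.con) - int64_t(con) == size;
  }
};
```

For instance, byte stores at `16 + 1*i` and `17 + 1*i` are adjacent in this model, while changing the scale of `i` in one of them breaks adjacency, since the two pointers are then no longer at a constant distance.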
PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2458892679 From thartmann at openjdk.org Wed Nov 6 08:09:32 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 6 Nov 2024 08:09:32 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 13:31:37 GMT, Damon Fenacci wrote: > # Issue > > The `compiler/vectorapi/VectorLogicalOpIdentityTest.java` has been failing because C2 compiling the test `testAndMaskSameValue1` expects 1 `AndV` node but finds none. > > # Cause > > The issue has to do with the criteria that trigger a cleanup when performing late inlining. In the failing test, when the compiler tries to inline a `jdk.internal.vm.vector.VectorSupport::binaryOp` call, it fails because its argument is of the wrong type, mainly because some cast nodes "hide" the more "precise" type. > The graph that leads to the issue looks like this: > ![1BCE8148-1E44-4CA1-AF8F-EFC6210FA740](https://github.com/user-attachments/assets/62dd917f-2dac-42a9-90cf-73eedcd3cf8a) > The compiler tries to inline `jdk.internal.vm.vector.VectorSupport::load` and it succeeds: > ![752E81C9-A37D-4626-81A9-E4A839FADD3D](https://github.com/user-attachments/assets/e61057b2-3093-4992-ba5a-b80e4000c0ec) > The node `3027 VectorBox` has type `IntMaxVector`. `912 CastPP` and `934 CheckCastPP` have type `IntVector` instead. > The compiler then tries to inline one of the 2 `binaryOp` calls but it fails because it needs an argument of type `IntMaxVector` and the argument it is given, which is node `934 CheckCastPP`, has type `IntVector`. > > This would not happen if between the 2 inlining attempts a _cleanup_ was triggered. IGVN would run and the 2 nodes `912 CastPP` and `934 CheckCastPP` would be folded away. `binaryOp` could then be inlined since the types would match.
> > # Solution > > In order to fix this, an extra cleanup has to be performed when we encounter a situation like the one above, i.e. when late inlining creates a `VectorBox`. > > Additional test runs with `-XX:-TieredCompilation` are added to `VectorLogicalOpIdentityTest.java` and `VectorGatherMaskFoldingTest.java` as regression tests and `-XX:+IncrementalInlineForceCleanup` is removed from `VectorGatherMaskFoldingTest.java` (previously added as a workaround for this issue) src/hotspot/share/opto/callGenerator.cpp line 734: > 732: } > 733: C->set_inlining_progress(true); > 734: C->set_do_cleanup(kit.stopped() || result->Opcode() == Op_VectorBox); // path is dead or vector box; needs cleanup This only triggers if the return value of the incrementally inlined method is a `VectorBox`, right? Is that sufficient? Could the `VectorBox` be hidden by another node? -------------
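The failure mode analysed in the 8302459 RFR (cast nodes typed `IntVector` hide the more precise `IntMaxVector` type of the `VectorBox` until IGVN folds them away) can be modelled with a toy type check. The sketch is hypothetical and far simpler than C2's type system and the cast nodes' identity transformations: `is_subtype`, `ToyTypedNode`, `fold_redundant_casts`, and `can_inline_binary_op` are illustrative names. A cast is treated as redundant when its input's type is already a subtype of the cast's type, and `can_inline_binary_op` stands in for the intrinsic's exact-type requirement.

```cpp
#include <cassert>
#include <string>

// Hypothetical two-level hierarchy: IntMaxVector is a subtype of IntVector.
bool is_subtype(const std::string& sub, const std::string& super) {
  return sub == super || (sub == "IntMaxVector" && super == "IntVector");
}

// Toy node: a declared type plus, for cast nodes, the node being cast.
struct ToyTypedNode {
  std::string type;
  const ToyTypedNode* input = nullptr;  // non-null only for casts
};

// Cleanup step: a cast whose input already has (a subtype of) the cast's
// type adds no information and is replaced by its input, repeatedly.
const ToyTypedNode* fold_redundant_casts(const ToyTypedNode* n) {
  while (n->input != nullptr && is_subtype(n->input->type, n->type)) {
    n = n->input;
  }
  return n;
}

// Stand-in for the intrinsic's argument check, which needs the exact type.
bool can_inline_binary_op(const ToyTypedNode* arg) {
  return arg->type == "IntMaxVector";
}
```

In this model, the argument seen through `CheckCastPP` and `CastPP` fails the check, and only after the casts are folded does the `VectorBox` with type `IntMaxVector` become visible, which is the effect the extra cleanup is meant to achieve.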
>> >> What about regular calls that fail to inline initially because the compiler ran out of inlining budget but are inlined later on? > > @rwestrel I think in case the inlining budget has been exceeded (i.e. try_to_inline and subsequently ok_to_inline fail), there's only two code locations where we would create a late inlining code generator: [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L380) and [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L292). Both are calls to CallGenerator::for_late_inline_virtual() that create a LateInlineVirtualCallGenerator, which only performs strength reduction AFAIK. > > There might be something I'm missing, though, since I've only been working on the C2 compiler for three days ? So please feel free to point me other cases of late inlining and I will investigate. > @theoweidmannoracle @caojoshua was not found in the census. @caojoshua You might want to [associate your GitHub account and your OpenJDK username](https://wiki.openjdk.org/display/SKARA/Skara#Skara-AssociatingyourGitHubaccountandyourOpenJDKusername). @theoweidmannoracle you can add Joshua via `/contributor add jcao` ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2458958121 From thartmann at openjdk.org Wed Nov 6 08:17:31 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 6 Nov 2024 08:17:31 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 10:19:51 GMT, theoweidmannoracle wrote: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. 
> > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other locations where late inline call generators are created are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the corresponding test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining occurred prior to the late inlining. Please update the copyright dates.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2458962968 From thartmann at openjdk.org Wed Nov 6 08:29:28 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 6 Nov 2024 08:29:28 GMT Subject: RFR: 8342945: Replace predicate walking code in get_assertion_predicates() used for Loop Unswitching and cleaning useless Template Assertion Predicates with a predicate visitor In-Reply-To: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> References: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> Message-ID: On Wed, 6 Nov 2024 07:00:57 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (integrated with https://github.com/openjdk/jdk/pull/21790) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicate (this PR) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (upcoming) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790)) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. 
This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates. This limitation should eventually be removed. But I want to do that separately at a later point. > ---... Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21918#pullrequestreview-2417539451 From dfenacci at openjdk.org Wed Nov 6 08:38:29 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 6 Nov 2024 08:38:29 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure In-Reply-To: References: Message-ID: <-Rus6nFTc2zKhUmxgsFbK1H6Q1yr9OKHKTcdej-I0jw=.193754f3-c396-42c0-86b5-73af1e11bd8b@github.com> On Mon, 4 Nov 2024 21:03:37 GMT, Dean Long wrote: > Would it be better to trigger cleanup based on the presence of nodes like CastPP/CheckCastPP instead? Good point. At first I wanted to restrict the extra cleanups as much as possible (checking for a VectorBox seemed more restrictive) but the "origin" of the issue are actually the CastPP/CheckCastPP nodes. I just want to check how "expensive" that is. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21682#issuecomment-2459002194 From roland at openjdk.org Wed Nov 6 08:42:28 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 6 Nov 2024 08:42:28 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 10:19:51 GMT, theoweidmannoracle wrote: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? 
If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other locations where late inline call generators are created are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the corresponding test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the inlining output: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining occurred prior to the late inlining.
With a simple test case:

    public class TestLateInlining {
        public static void main(String[] args) {
            for (int i = 0; i < 20_000; i++) {
                test1();
            }
        }

        private static void test1() {
            inlined1();
        }

        private static void inlined1() {
        }
    }

Without your patch:

    $ java -XX:-TieredCompilation -XX:-UseOnStackReplacement -XX:-BackgroundCompilation -XX:+PrintCompilation -XX:CompileOnly=TestLateInlining::test1 -XX:CompileCommand=quiet -XX:+PrintInlining -XX:+AlwaysIncrementalInline TestLateInlining
    87 1 n jdk.internal.vm.Continuation::enterSpecial (native) (static)
    87 2 n jdk.internal.vm.Continuation::doYield (native) (static)
    92 3 b TestLateInlining::test1 (4 bytes)
        @ 0 TestLateInlining::inlined1 (1 bytes) inline (hot)

With your patch:

    $ java -XX:-TieredCompilation -XX:-UseOnStackReplacement -XX:-BackgroundCompilation -XX:+PrintCompilation -XX:CompileOnly=TestLateInlining::test1 -XX:CompileCommand=quiet -XX:+PrintInlining -XX:+AlwaysIncrementalInline TestLateInlining
    86 1 n jdk.internal.vm.Continuation::enterSpecial (native) (static)
    86 2 n jdk.internal.vm.Continuation::doYield (native) (static)
    92 3 b TestLateInlining::test1 (4 bytes)
        @ 0 TestLateInlining::inlined1 (1 bytes) late inline

I think it would be nice to preserve the "inline (hot)" part of the first output, as it's the reason for inlining. There can be other reasons for inlining (not many from a quick look at the code) but, who knows, there could be more in the future. Also, having a test case would be useful. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2459009598 From shade at openjdk.org Wed Nov 6 09:18:02 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 6 Nov 2024 09:18:02 GMT Subject: RFR: 8321997: Increase upper limit of LoopOptsCount flag In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 09:13:12 GMT, Tobias Holenstein wrote: > Currently `LoopOptsCount` has a range of 5-43 with default value 43. For stress testing we want to set values higher than 43. Set the upper limit to 1000 or even max_jint.
Looks fine. It is fairly cryptic why `43` is the default :) ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21921#pullrequestreview-2417645031 From tholenstein at openjdk.org Wed Nov 6 09:18:02 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 6 Nov 2024 09:18:02 GMT Subject: RFR: 8321997: Increase upper limit of LoopOptsCount flag Message-ID: Currently `LoopOptsCount` has a range of 5-43 with default value 43. For stress testing we want to set values higher than 43. Set the upper limit to 1000 or even max_jint. ------------- Commit messages: - JDK-8331727: Increase upper limit of LoopOptsCount flag Changes: https://git.openjdk.org/jdk/pull/21921/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21921&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8321997 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21921.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21921/head:pull/21921 PR: https://git.openjdk.org/jdk/pull/21921 From mli at openjdk.org Wed Nov 6 09:18:32 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 6 Nov 2024 09:18:32 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 18:33:00 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang , this one should be a diagnostic option, as we don't want to expose too many options to users. > Thanks Hey, so what's the conclusion for now? I'm fine with @RealFYang 's proposal above, what do you think?
@robehn

    product(bool, UseRVC, false, "Use RVC instructions") \
    product(bool, UseRVV, false, "Use RVV instructions") \
    product(bool, UseZba, false, "Use Zba instructions") \
    product(bool, UseZbb, false, "Use Zbb instructions") \
    product(bool, UseZbs, false, "Use Zbs instructions") \
    product(bool, UseZfh, false, "Use Zfh instructions")

------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2459086263 From rcastanedalo at openjdk.org Wed Nov 6 09:20:38 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 6 Nov 2024 09:20:38 GMT Subject: RFR: 8339303: C2: dead node after failing to match cloned address expression [v2] In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 16:39:49 GMT, Vladimir Kozlov wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Use DUIterator_Fast to traverse node outputs > > Looks good. Yes, it looks like the code expects LShift here instead of a constant. Thanks for reviewing, @vnkozlov and @iwanowww!
------------- PR Comment: https://git.openjdk.org/jdk/pull/21829#issuecomment-2459086820 From rcastanedalo at openjdk.org Wed Nov 6 09:20:39 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 6 Nov 2024 09:20:39 GMT Subject: Integrated: 8339303: C2: dead node after failing to match cloned address expression In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 13:53:33 GMT, Roberto Castañeda Lozano wrote: > This changeset prevents the x86 platform-specific logic from cloning address expressions consisting of two chained `AddP` nodes with a small constant offset each, such as in the following example: > > ![example](https://github.com/user-attachments/assets/86c143a1-3895-4e0c-936b-0d22b7c80e73) > > Such patterns cannot be fully subsumed into x86 complex addressing modes, and cloning them can cause the matcher to introduce dead nodes that trigger a segmentation fault in the subsequent global code motion phase. See a detailed analysis of the failure in the [JBS issue description](https://bugs.openjdk.org/browse/JDK-8339303). > > The changeset additionally extends the post-matching verification logic to check that no old node is reachable by traversing both node inputs and outputs. This extension would have caused the original test case to fail directly after matching with an informative assertion message rather than an opaque segmentation fault in an unrelated code generation phase. > > Note that the pattern causing the failure should in general be optimized by `AddPNode::Ideal` into a single `AddP` node with the constant sum of the offsets. While [JDK-8343067](https://bugs.openjdk.org/browse/JDK-8343067) should address the missing optimization, this changeset proposes a complementary solution that is easily backportable and avoids relying on specific optimizations for correctness.
> > #### Testing > > ##### Functionality > > - tier1-5 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > ##### Performance > > - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. This pull request has now been integrated. Changeset: 83f3d42d Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/83f3d42d6bcefac80449987f4d951f8280eeee3a Stats: 73 lines in 3 files changed: 67 ins; 3 del; 3 mod 8339303: C2: dead node after failing to match cloned address expression Reviewed-by: vlivanov, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21829 From chagedorn at openjdk.org Wed Nov 6 09:50:30 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 6 Nov 2024 09:50:30 GMT Subject: RFR: 8321997: Increase upper limit of LoopOptsCount flag In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 09:13:12 GMT, Tobias Holenstein wrote: > Currently `LoopOptsCount` has a range of 5-43 with default value 43. For stress testing we want to set values higher than 43. Set the upper limit to 1000 or even max_jint. Looks good! I'm also curious what the story behind 43 is :-) ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21921#pullrequestreview-2417728600 From chagedorn at openjdk.org Wed Nov 6 09:51:32 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 6 Nov 2024 09:51:32 GMT Subject: RFR: 8342945: Replace predicate walking code in get_assertion_predicates() used for Loop Unswitching and cleaning useless Template Assertion Predicates with a predicate visitor In-Reply-To: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> References: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> Message-ID: On Wed, 6 Nov 2024 07:00:57 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (integrated with https://github.com/openjdk/jdk/pull/21790) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicate (this PR) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (upcoming) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790)) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. 
This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates. This limitation should eventually be removed. But I want to do that separately at a later point. > ---... Thanks Tobias for your review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21918#issuecomment-2459156266 From galder at openjdk.org Wed Nov 6 10:19:03 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 6 Nov 2024 10:19:03 GMT Subject: RFR: 8326369: Bimorphic inlining not applied at a call site that was initially monomorphic In-Reply-To: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> Message-ID: On Wed, 6 Nov 2024 09:06:47 GMT, Galder Zamarreño wrote: > This issue was fixed by JDK-8339299 indirectly (see [PR](https://github.com/openjdk/jdk/pull/20786)), but the PR for 8339299 didn't include any new tests, so I'm sending this PR with the test that Filipp added in 8326369 so that such an issue doesn't come back inadvertently. I've added him as co-author and applied some small formatting changes. Looking into the formatting errors. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21920#issuecomment-2459129498 From galder at openjdk.org Wed Nov 6 10:19:03 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 6 Nov 2024 10:19:03 GMT Subject: RFR: 8326369: Bimorphic inlining not applied at a call site that was initially monomorphic Message-ID: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> This issue was fixed by JDK-8339299 indirectly (see [PR](https://github.com/openjdk/jdk/pull/20786)), but the PR for 8339299 didn't include any new tests, so I'm sending this PR with the test that Filipp added in 8326369 so that such an issue doesn't come back inadvertently. I've added him as co-author and applied some small formatting changes.
------------- Commit messages: - Fix formatting - Fix more formatting issues - Fix formatting - Add test that replicates issue Changes: https://git.openjdk.org/jdk/pull/21920/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21920&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8326369 Stats: 90 lines in 1 file changed: 90 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21920.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21920/head:pull/21920 PR: https://git.openjdk.org/jdk/pull/21920 From thartmann at openjdk.org Wed Nov 6 11:34:33 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 6 Nov 2024 11:34:33 GMT Subject: RFR: 8326369: Bimorphic inlining not applied at a call site that was initially monomorphic In-Reply-To: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> Message-ID: On Wed, 6 Nov 2024 09:06:47 GMT, Galder Zamarreño wrote: > This issue was fixed by JDK-8339299 indirectly (see [PR](https://github.com/openjdk/jdk/pull/20786)), but the PR for 8339299 didn't include any new tests, so I'm sending this PR with the test that Filipp added in 8326369 so that such an issue doesn't come back inadvertently. I've added him as co-author and applied some small formatting changes. Changes requested by thartmann (Reviewer). test/hotspot/jtreg/compiler/inlining/InlineBimorphicVirtualCallAfterMorphismChanged.java line 1: > 1: /** The copyright header is missing. test/hotspot/jtreg/compiler/inlining/InlineBimorphicVirtualCallAfterMorphismChanged.java line 3: > 1: /** > 2: * @test > 3: * bug This should be `@bug 8326369`, right?
------------- PR Review: https://git.openjdk.org/jdk/pull/21920#pullrequestreview-2418023116 PR Review Comment: https://git.openjdk.org/jdk/pull/21920#discussion_r1830863161 PR Review Comment: https://git.openjdk.org/jdk/pull/21920#discussion_r1830862930 From roland at openjdk.org Wed Nov 6 12:39:31 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 6 Nov 2024 12:39:31 GMT Subject: RFR: 8342945: Replace predicate walking code in get_assertion_predicates() used for Loop Unswitching and cleaning useless Template Assertion Predicates with a predicate visitor In-Reply-To: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> References: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> Message-ID: On Wed, 6 Nov 2024 07:00:57 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (integrated with https://github.com/openjdk/jdk/pull/21790) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicate (this PR) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (upcoming) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790)) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. 
This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates. This limitation should eventually be removed. But I want to do that separately at a later point. > ---... Looks good to me. ------------- Marked as reviewed by roland (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21918#pullrequestreview-2418162085 From chagedorn at openjdk.org Wed Nov 6 12:39:32 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 6 Nov 2024 12:39:32 GMT Subject: RFR: 8342945: Replace predicate walking code in get_assertion_predicates() used for Loop Unswitching and cleaning useless Template Assertion Predicates with a predicate visitor In-Reply-To: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> References: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> Message-ID: On Wed, 6 Nov 2024 07:00:57 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (integrated with https://github.com/openjdk/jdk/pull/21790) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicate (this PR) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (upcoming) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790)) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. 
This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates. This limitation should eventually be removed. But I want to do that separately at a later point. > ---... Thanks Roland for your review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21918#issuecomment-2459641384 From roland at openjdk.org Wed Nov 6 14:51:40 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 6 Nov 2024 14:51:40 GMT Subject: Integrated: 8343068: C2: CastX2P Ideal transformation not always applied In-Reply-To: <5k9Zq9E-2OJg0raPGHS1t_45N7QguwAToBfZiCfPjlY=.f41fd65a-7ac1-4886-a801-16753b864a2e@github.com> References: <5k9Zq9E-2OJg0raPGHS1t_45N7QguwAToBfZiCfPjlY=.f41fd65a-7ac1-4886-a801-16753b864a2e@github.com> Message-ID: On Fri, 25 Oct 2024 14:09:48 GMT, Roland Westrelin wrote: > The transformation: > > > (CastX2P (AddL base i)) -> (AddP (CastX2P base) i) > > > when i fits in an int is not always applied: when the type of `i` is > narrowed so it fits in an int, the `CastX2P` is not enqueued for > igvn. This can get in the way of vectorization as shown by test case > `test2`. This pull request has now been integrated. Changeset: 57c3bb60 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/57c3bb6091f8ba0caced6f5ecf21dc998ffeee9f Stats: 93 lines in 3 files changed: 93 ins; 0 del; 0 mod 8343068: C2: CastX2P Ideal transformation not always applied Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/21714 From roland at openjdk.org Wed Nov 6 14:53:38 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 6 Nov 2024 14:53:38 GMT Subject: Integrated: 8341834: C2 compilation fails with "bad AD file" due to Replicate In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 08:30:15 GMT, Roland Westrelin wrote: > Superword creates a `Replicate` node at a `ConvL2I` node and uses the > type of the result of the `ConvL2I` to pick the type of the > `Replicate` instead of the type of the input to the `ConvL2I`. This pull request has now been integrated. 
Changeset: 72a45ddb Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/72a45ddbad9c343200197348ccfcf74105e6fefa Stats: 55 lines in 2 files changed: 54 ins; 0 del; 1 mod 8341834: C2 compilation fails with "bad AD file" due to Replicate Reviewed-by: kvn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/21660 From tholenstein at openjdk.org Wed Nov 6 14:58:22 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 6 Nov 2024 14:58:22 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand Message-ID: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> color pick nodes Adds new option to IGV to color selected nodes: 1) select some nodes 2) `Ctrl + C` or `View` -> `Color action` 3) pick a color and apply ------------- Commit messages: - Update ColorAction.java - JDK-8343535: IGV: Colorize nodes on demand Changes: https://git.openjdk.org/jdk/pull/21925/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343535 Stats: 109 lines in 6 files changed: 105 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21925.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21925/head:pull/21925 PR: https://git.openjdk.org/jdk/pull/21925 From chagedorn at openjdk.org Wed Nov 6 14:58:23 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 6 Nov 2024 14:58:23 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: On Wed, 6 Nov 2024 12:19:47 GMT, Tobias Holenstein wrote: > color > > pick > > nodes > > Adds new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply That's a nice feature! 
Works as expected on Linux with the shortcut. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21925#pullrequestreview-2418178425 From rehn at openjdk.org Wed Nov 6 16:10:31 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 6 Nov 2024 16:10:31 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 18:33:00 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang , this one should be a diagnostic option, as we don't want to expose too many options to users. > Thanks Yes, I'm fine with that. Just so we try to keep some kind of common thread. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2460188120 From mdoerr at openjdk.org Wed Nov 6 16:23:49 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 6 Nov 2024 16:23:49 GMT Subject: RFR: 8343724: [PPC64] Disallow OptoScheduling Message-ID: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> Force off OptoScheduling for PPC64. It should be properly maintained before allowing users to switch it on. It's probably not important for this platform. Also see the JBS issue for motivation.
------------- Commit messages: - 8343724: [PPC64] Disallow OptoScheduling Changes: https://git.openjdk.org/jdk/pull/21935/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21935&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343724 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21935.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21935/head:pull/21935 PR: https://git.openjdk.org/jdk/pull/21935 From duke at openjdk.org Wed Nov 6 16:31:34 2024 From: duke at openjdk.org (duke) Date: Wed, 6 Nov 2024 16:31:34 GMT Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support [v7] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 20:52:41 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to fix instruction encoding errors for some of the instructions which support the APX New Data Destination (NDD) and No Flags (NF) features (added in [JDK-8329035](https://bugs.openjdk.org/browse/JDK-8329035)) >> >> The correctness of the encoding is verified by comparing to the reference encoding by GCC using an automated tool (https://github.com/openjdk/jdk/pull/21795) > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > update opcodes for load based operations @vamsi-parasa Your change (at version bca87165b26116dd832b5e6b700cdaa89fa1f17e) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21770#issuecomment-2460243436 From sparasa at openjdk.org Wed Nov 6 16:44:33 2024 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 6 Nov 2024 16:44:33 GMT Subject: Integrated: 8343214: Fix encoding errors in APX New Data Destination Instructions Support In-Reply-To: References: Message-ID: <5DFO_wKi6Se-c4sbZjl5XG7TdiQcqw9UlE6RQFcgyog=.f04f6ac2-a7a4-4e3a-8190-ab07fa0348cb@github.com> On Tue, 29 Oct 2024 17:19:20 GMT, Srinivas Vamsi Parasa wrote: > The goal of this PR is to fix instruction encoding errors for some of the instructions which support the APX New Data Destination (NDD) and No Flags (NF) features (added in [JDK-8329035](https://bugs.openjdk.org/browse/JDK-8329035)) > > The correctness of the encoding is verified by comparing to the reference encoding by GCC using an automated tool (https://github.com/openjdk/jdk/pull/21795) This pull request has now been integrated. Changeset: c0e6c3b9 Author: Srinivas Vamsi Parasa Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/c0e6c3b93c0d21debc538e0135805c2957053108 Stats: 72 lines in 1 file changed: 28 ins; 1 del; 43 mod 8343214: Fix encoding errors in APX New Data Destination Instructions Support Reviewed-by: jbhateja, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/21770 From kvn at openjdk.org Wed Nov 6 17:32:27 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 6 Nov 2024 17:32:27 GMT Subject: RFR: 8342945: Replace predicate walking code in get_assertion_predicates() used for Loop Unswitching and cleaning useless Template Assertion Predicates with a predicate visitor In-Reply-To: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> References: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> Message-ID: On Wed, 6 Nov 2024 07:00:57 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking 
and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (integrated with https://github.com/openjdk/jdk/pull/21790) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicates (this PR) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (upcoming) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called by all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047), which is a missing piece needed to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each patch, I apply similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with a call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to perform the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator`, which walks through all predicates found at a loop and applies the visitor to each predicate.
> - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the existing semantics, which include applying the Template Assertion Predicate processing only if there are Parse Predicates. This limitation should eventually be removed, but I want to do that separately at a later point. > ---... Looks good to me too. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21918#pullrequestreview-2419001131 From jbhateja at openjdk.org Wed Nov 6 17:39:22 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 6 Nov 2024 17:39:22 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction Message-ID: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> This patch optimizes LongVector multiplication by inferring the VPMUL[U]DQ instruction for the following IR patterns:

    MulVL ( AndV SRC1, 0xFFFFFFFF) ( AndV SRC2, 0xFFFFFFFF)
    MulVL (URShiftVL SRC1 , 32) (URShiftVL SRC2, 32)
    MulVL (URShiftVL SRC1 , 32) ( AndV SRC2, 0xFFFFFFFF)
    MulVL ( AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2 , 32)
    MulVL (VectorCastI2X SRC1) (VectorCastI2X SRC2)
    MulVL (RShiftVL SRC1 , 32) (RShiftVL SRC2, 32)

A 64x64-bit multiplication produces a 128-bit result, and can be performed by individually multiplying the upper and lower doublewords of the multiplier with the multiplicand and assembling the partial products into the full-width result. Targets supporting vector quadword multiplication have separate instructions to compute the upper and lower quadwords of the 128-bit result. Therefore, the existing Vector API multiplication operator expects shape conformance between source and result vectors, i.e. each lane keeps only the lower 64 bits of its product.
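The partial-product argument above can be checked with a small scalar model. The C++ sketch below is only an illustration under stated assumptions (the function names are hypothetical, not taken from the patch): it assembles the low 64 bits of a 64x64-bit product from 32-bit halves, and shows that when both upper halves are zero only the low x low term survives, which is what a single unsigned 32x32->64-bit multiply (one VPMULUDQ lane) computes.

```cpp
#include <cassert>
#include <cstdint>

// Low 64 bits of a 64x64-bit product (Java long '*' semantics), assembled from
// 32-bit partial products. The hi*hi term only affects bits 64..127 and is dropped.
uint64_t mul_lo64_from_partials(uint64_t a, uint64_t b) {
    const uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    const uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;
    const uint64_t lo_lo = a_lo * b_lo;                // exact: fits in 64 bits
    const uint64_t cross = a_lo * b_hi + a_hi * b_lo;  // only its low 32 bits matter
    return lo_lo + (cross << 32);                      // wraps modulo 2^64
}

// With both upper halves known to be zero, the cross terms vanish, so one
// unsigned 32x32->64-bit multiply yields the same result.
uint64_t mul_lo64_zero_upper(uint64_t a, uint64_t b) {
    return (a & 0xFFFFFFFFu) * (b & 0xFFFFFFFFu);
}
```

`mul_lo64_from_partials(a, b)` equals `a * b` for any inputs, whereas `mul_lo64_zero_upper(a, b)` equals `a * b` only under the zero-upper-half precondition that the AndV/URShiftVL patterns guarantee.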
If the upper 32 bits of both the quadword multiplier and the multiplicand are always zero, then the result of the multiplication depends only on the partial product of their lower doublewords, and it can be computed with an unsigned 32-bit multiplication instruction that produces a full quadword (64-bit) result; likewise, if both operands are sign extensions of their lower doublewords (the VectorCastI2X and RShiftVL patterns), a signed 32x32->64-bit multiplication suffices. The patch matches these patterns in a target-dependent manner without introducing a new IR node. The VPMUL[U]DQ instruction performs [unsigned] multiplication of the even-numbered doubleword lanes of two long vectors and produces 64-bit results. It has much lower latency than the full 64-bit multiplication instruction "VPMULLQ"; in addition, non-AVX512DQ targets do not support direct quadword multiplication, so skipping the partial products of the zeroed-out upper 32 bits saves redundant work. This results in throughput improvements on both P-core and E-core Xeons. Please find below the performance of the [XXH3 hashing benchmark](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html) included with the patch:

Sierra Forest:
==============
Baseline:
Benchmark                                  (SIZE)   Mode  Cnt     Score  Error   Units
VectorXXH3HashingBenchmark.hashingKernel    1024  thrpt    2   806.228          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    2048  thrpt    2   403.044          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    4096  thrpt    2   200.641          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    8192  thrpt    2   100.664          ops/ms

With Optimization:
Benchmark                                  (SIZE)   Mode  Cnt     Score  Error   Units
VectorXXH3HashingBenchmark.hashingKernel    1024  thrpt    2  1299.407          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    2048  thrpt    2   504.995          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    4096  thrpt    2   327.544          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    8192  thrpt    2   160.963          ops/ms

Granite Rapids:
===============
Baseline:
Benchmark                                  (SIZE)   Mode  Cnt     Score  Error   Units
VectorXXH3HashingBenchmark.hashingKernel    1024  thrpt    2  2279.099          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    2048  thrpt    2  1148.609          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    4096  thrpt    2   570.848          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    8192  thrpt    2   268.872          ops/ms

With Optimization:
Benchmark                                  (SIZE)   Mode  Cnt     Score  Error   Units
VectorXXH3HashingBenchmark.hashingKernel    1024  thrpt    2  2612.484          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    2048  thrpt    2  1308.187          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    4096  thrpt    2   653.375          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    8192  thrpt    2   316.182          ops/ms

Kindly review and share your feedback. Best Regards, Jatin ------------- Commit messages: - Removing target specific hooks - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 - Review resoultions - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 - Handle new I2L pattern, IR tests, Rewiring pattern inputs to MulVL further optimizes JIT code - Review resolutions - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction Changes: https://git.openjdk.org/jdk/pull/21244/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21244&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341137 Stats: 528 lines in 7 files changed: 527 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21244.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21244/head:pull/21244 PR: https://git.openjdk.org/jdk/pull/21244 From jkarthikeyan at openjdk.org Wed Nov 6 17:39:27 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 6 Nov 2024 17:39:27 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Sun, 29 Sep 2024 04:21:19 GMT, Jatin Bhateja wrote: > This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ instruction for following IR patterns.
> ... > Hi @jaskarth, bigger pattern matching is sensitive to [IR level node sharing](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/matcher.cpp#L1724), thus it may not be foolproof for the above 4 patterns. Current patch takes care of this limitation. I think this is a good point. I've taken a look at the patch and added some comments below. Hmm, do you think this pattern could be matched in the ad-files instead of the middle end? I think that might be a lot cleaner since the backend already has systems for matching node trees, which could avoid a lot of the complexity here. I think it could make the patch a lot smaller and simpler. For the record, I think in this PR we could simply match the IR patterns in the ad file, since (from my understanding) the patterns we are matching could be supported there. We should do platform-specific lowering in a separate patch because it is pretty nuanced, and we could potentially move it to the new system afterwards. I'm pretty ambivalent; I think implementing it either way would be alright. Especially with unit tests, I think the lowering implementation wouldn't be that difficult. Maybe another reviewer has an opinion? About PhaseLowering though, I've found some more interesting things we could do with it, especially with improving vectorization support in the backend. @merykitty have you already started to work on it?
I was thinking about prototyping it soon. Just wanted to make sure we're not doing the same work twice :) BTW, after our last conversation I started working on PhaseLowering myself; you can see my work so far on my branch: https://github.com/jaskarth/jdk/tree/phase-lowering. I think I can publish an RFE in the coming two or three days (there were some optimizations and cleanups I was prototyping; I will remove them before sending a PR). Do you think we should continue with my branch, or do you want to approach the problem in a different way? Just want to check again to make sure we don't end up re-doing the same work :) src/hotspot/cpu/x86/matcher_x86.hpp line 184: > 182: // Does the CPU supports doubleword multiplication with quadword saturation. > 183: static constexpr bool supports_double_word_mult_with_quadword_staturation(void) { > 184: return true; Should this be `UseAVX > 0`? I'm wondering since we have a `MulVL` rule that applies when `UseAVX == 0`. src/hotspot/share/opto/vectornode.cpp line 2089: > 2087: if (Matcher::supports_double_word_mult_with_quadword_staturation() && > 2088: !is_mult_lower_double_word()) { > 2089: auto is_clear_upper_double_word_uright_shift_op = [](const Node *n) { Suggestion: auto is_clear_upper_double_word_uright_shift_op = [](const Node* n) { src/hotspot/share/opto/vectornode.cpp line 2093: > 2091: n->in(2)->Opcode() == Op_RShiftCntV && n->in(2)->in(1)->is_Con() && > 2092: n->in(2)->in(1)->bottom_type()->isa_int() && > 2093: n->in(2)->in(1)->bottom_type()->is_int()->get_con() == 32L; Suggestion: n->in(2)->in(1)->bottom_type()->is_int()->get_con() == 32; Since you are comparing with a `TypeInt`, I think this shouldn't be `32L`. src/hotspot/share/opto/vectornode.cpp line 2098: > 2096: auto is_lower_double_word_and_mask_op = [](const Node *n) { > 2097: if (n->Opcode() == Op_AndV) { > 2098: Node *replicate_operand = n->in(1)->Opcode() == Op_Replicate ?
n->in(1) Suggestion: Node* replicate_operand = n->in(1)->Opcode() == Op_Replicate ? n->in(1) src/hotspot/share/opto/vectornode.cpp line 2124: > 2122: // MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) > 2123: if ((is_lower_double_word_and_mask_op(in(1)) || > 2124: is_lower_double_word_and_mask_op(in(1)) || `is_lower_double_word_and_mask_op(in(1)) || is_lower_double_word_and_mask_op(in(1))` is redundant, right? Shouldn't you only need it once? Same for the other 3 calls, which are similarly repeated. test/hotspot/jtreg/compiler/vectorapi/VectorMultiplyOpt.java line 41: > 39: */ > 40: > 41: public class VectorMultiplyOpt { Could it be possible to also do IR verification in this test? It would be good to check that we don't generate `AndVL` or `URShiftVL` with this transform. test/hotspot/jtreg/compiler/vectorapi/VectorMultiplyOpt.java line 43: > 41: public class VectorMultiplyOpt { > 42: > 43: public static long [] src1; Suggestion: public static long[] src1; And for the rest of the `long []` in this file too. 
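The redundancy pointed out above suggests that the four commuted patterns collapse into one test per input. Below is a hedged C++ sketch of that idea on a toy operand model; `OperandKind`, `clears_upper_double_word` and `can_use_unsigned_dword_mul` are hypothetical names (not HotSpot's `Node` API), and this is only one plausible reading of the check the patch intends.

```cpp
#include <cassert>

// Toy classification of a MulVL input (hypothetical; not HotSpot's Node API).
enum class OperandKind {
    AndWithLowMask,  // AndV x, Replicate(0xFFFFFFFF)
    URShiftBy32,     // URShiftVL x, 32
    Other
};

// True if every 64-bit lane of the operand is known to have its upper 32 bits clear.
bool clears_upper_double_word(OperandKind k) {
    return k == OperandKind::AndWithLowMask || k == OperandKind::URShiftBy32;
}

// The four patterns (And,And), (URShift,URShift), (URShift,And), (And,URShift)
// are exactly the cases where each input independently clears its upper half,
// so each predicate needs to be evaluated only once per input.
bool can_use_unsigned_dword_mul(OperandKind in1, OperandKind in2) {
    return clears_upper_double_word(in1) && clears_upper_double_word(in2);
}
```

Expressing the condition per input also makes it easy to add further "upper half is zero" shapes later without enumerating every pairing.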
test/micro/org/openjdk/bench/jdk/incubator/vector/VectorXXH3HashingBenchmark.java line 39: > 37: @Param({"1024", "2048", "4096", "8192"}) > 38: private int SIZE; > 39: private long [] accumulators; Suggestion: private long[] accumulators; ------------- PR Review: https://git.openjdk.org/jdk/pull/21244#pullrequestreview-2367683334 PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2407658405 PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2411538179 PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2414553899 PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2422700344 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800159123 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800153755 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800153568 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800153842 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800151177 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800167403 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800165261 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800169840 From jbhateja at openjdk.org Wed Nov 6 17:39:27 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 6 Nov 2024 17:39:27 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Sun, 29 Sep 2024 04:21:19 GMT, Jatin Bhateja wrote: > This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ instruction for following IR patterns.
> ... Hi @iwanowww, @sviswa7, @merykitty, can you kindly review this? I re-evaluated the solution and feel that a lowering pass will complement such a transformation, especially in light of the rewiring logic that feeds the pattern inputs directly to the multiplier. While x86 VPMULUDQ operates on the lower doubleword of each quadword lane, AArch64 SVE has instructions that operate on the upper doubleword of each quadword lane of the multiplier and multiplicand, and hence can also optimize the following pattern: `MulVL (URShiftVL SRC1, 32) (URShiftVL SRC2, 32)` https://www.felixcloutier.com/x86/pmuludq https://dougallj.github.io/asil/doc/umullt_z_zz_32.html I am in the process of introducing a PhaseLowering pass, which will hold target-specific IR transformations for nodes of interest; until then, I am moving the PR to draft stage.
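The lane semantics of the instructions discussed in this thread can be modelled in scalar C++ to see which MulVL shape each one covers. This is a sketch under stated assumptions: a 256-bit vector is modelled as four 64-bit lanes, the helper names are hypothetical, and the UMULLT model assumes its 32-bit-source form (top doubleword of each 64-bit lane). It is not code from the patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Models a 256-bit vector as four 64-bit lanes.
using LongVec4 = std::array<uint64_t, 4>;

// Applies a binary operation lane by lane.
template <typename F>
LongVec4 lanewise(const LongVec4& a, const LongVec4& b, F f) {
    LongVec4 r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = f(a[i], b[i]);
    return r;
}

// Reference MulVL: full 64-bit multiply per lane, wrapping like Java long '*'.
LongVec4 mulvl(const LongVec4& a, const LongVec4& b) {
    return lanewise(a, b, [](uint64_t x, uint64_t y) { return x * y; });
}

// x86 VPMULUDQ: unsigned product of the low (even-numbered) doubleword of each lane.
LongVec4 vpmuludq(const LongVec4& a, const LongVec4& b) {
    return lanewise(a, b, [](uint64_t x, uint64_t y) {
        return (x & 0xFFFFFFFFu) * (y & 0xFFFFFFFFu);
    });
}

// x86 VPMULDQ: signed product of the low doubleword of each lane.
// Assumes two's-complement narrowing conversions (guaranteed since C++20).
LongVec4 vpmuldq(const LongVec4& a, const LongVec4& b) {
    return lanewise(a, b, [](uint64_t x, uint64_t y) {
        const int64_t sx = static_cast<int32_t>(static_cast<uint32_t>(x));
        const int64_t sy = static_cast<int32_t>(static_cast<uint32_t>(y));
        return static_cast<uint64_t>(sx * sy);  // |sx * sy| <= 2^62, no overflow
    });
}

// SVE UMULLT with 32-bit source elements: unsigned product of the top
// (odd-numbered) doubleword of each 64-bit lane.
LongVec4 umullt(const LongVec4& a, const LongVec4& b) {
    return lanewise(a, b, [](uint64_t x, uint64_t y) { return (x >> 32) * (y >> 32); });
}

// Stand-ins for the IR operand shapes: AndV 0xFFFFFFFF, URShiftVL 32, RShiftVL 32.
LongVec4 and_low_mask(const LongVec4& v) {
    return lanewise(v, v, [](uint64_t x, uint64_t) { return x & 0xFFFFFFFFu; });
}
LongVec4 urshift32(const LongVec4& v) {
    return lanewise(v, v, [](uint64_t x, uint64_t) { return x >> 32; });
}
LongVec4 rshift32(const LongVec4& v) {  // arithmetic shift (guaranteed since C++20)
    return lanewise(v, v, [](uint64_t x, uint64_t) {
        return static_cast<uint64_t>(static_cast<int64_t>(x) >> 32);
    });
}

// True if each MulVL pattern agrees with the narrower multiply it can map to.
bool patterns_agree(const LongVec4& a, const LongVec4& b) {
    return mulvl(and_low_mask(a), and_low_mask(b)) == vpmuludq(a, b) &&
           mulvl(urshift32(a), urshift32(b)) == umullt(a, b) &&
           mulvl(rshift32(a), rshift32(b)) == vpmuldq(rshift32(a), rshift32(b));
}
```

In this model, the masked form maps to VPMULUDQ and the sign-extended form to VPMULDQ, while the upper-doubleword form maps to UMULLT directly, with no separate shift needed.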
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2401895553 PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2422634178 From qamai at openjdk.org Wed Nov 6 17:39:29 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 6 Nov 2024 17:39:29 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Sun, 29 Sep 2024 04:21:19 GMT, Jatin Bhateja wrote: > This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ instruction for following IR patterns. > ...
Another approach is to do something similar to `MacroLogicVNode`: you can make another node and transform `MulVL` into it before matching, which is more flexible than using match rules. I have a similar idea, which is to group those transformations together into a `Phase` called `PhaseLowering`. It can be used, e.g., to split `ExtractI` into the 128-bit lane extraction and the element extraction from that lane. This allows us to do `GVN` on those, so `v.lane(5) + v.lane(7)` can be compiled nicely as:

    vextracti128 xmm0, ymm1, 1
    pextrd eax, xmm0, 1
    // vextracti128 xmm0, ymm1, 1 here will be gvn-ed
    pextrd ecx, xmm0, 3
    add eax, ecx

Personally, I think this optimization is not essential, so we should proceed with introducing lowering first and then add this transformation to that phase, instead of integrating this transformation now and refactoring it into phase lowering later, which seems like a net extra step.
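The lane/element arithmetic behind that assembly can be modelled as below. This C++ sketch only illustrates the idea described above and is not code from any proposed PhaseLowering; `split_extract` and `lane_extracts_after_gvn` are hypothetical helpers, and the model assumes a 256-bit vector split into two 128-bit lanes.

```cpp
#include <cassert>
#include <initializer_list>
#include <set>
#include <utility>

// Hypothetical model: split an element extract from a 256-bit vector into
// (128-bit lane index, element index within that lane), i.e. vextracti128 + pextrd.
// elem_bytes is the element size in bytes (4 for int).
std::pair<int, int> split_extract(int index, int elem_bytes) {
    const int per_lane = 16 / elem_bytes;  // elements per 128-bit lane
    return {index / per_lane, index % per_lane};
}

// Number of vextracti128 instructions left once identical lane extractions are
// commoned, as GVN would do. Lane 0 is the low xmm half and needs none.
int lane_extracts_after_gvn(std::initializer_list<int> indices, int elem_bytes) {
    std::set<int> lanes;
    for (int idx : indices) {
        const int lane = split_extract(idx, elem_bytes).first;
        if (lane != 0) lanes.insert(lane);
    }
    return static_cast<int>(lanes.size());
}
```

For int elements, indices 5 and 7 map to (lane 1, element 1) and (lane 1, element 3), matching the `pextrd ..., 1` and `pextrd ..., 3` above, and both share a single `vextracti128 ..., 1`.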
The issues I have with this patch are that: - It introduces machine-dependent nodes into the graph early in the compilation process. - It overloads `MulVL` with alternative behaviours; this is fine now, as we do not perform much analysis on this node, but it would be problematic later. I think it is preferable to have a separate IR node for this, like `MulVLowIToLNode`, or to do this transformation only just before matching, or both. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2407793168 PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2414491182 PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421157206 From jkarthikeyan at openjdk.org Wed Nov 6 17:39:29 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 6 Nov 2024 17:39:29 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 11 Oct 2024 16:54:23 GMT, Quan Anh Mai wrote: > I am having a similar idea that is to group those transformations together into a `Phase` called `PhaseLowering` I think such a phase could be quite useful in general. Recently I was trying to implement the BMI1 instruction `bextr` for better performance with bit masks, but ran into a problem: it doesn't have an immediate encoding, so we'd need to materialize a constant into a temporary register every time. With an (x86-specific) ideal node, we could simply let the register allocator handle placing the constant. It would also be nice to avoid needing to put similar backend-specific lowerings (such as `MacroLogicV`) in shared code.
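For context on the `bextr` point: the BMI1 form of BEXTR takes its start position and length from a register control operand (bits 7:0 and 15:8), which is why a constant control word has to be materialized in a register first. A minimal C++ model of its 64-bit semantics follows; `bextr` and `bextr_control` are illustrative names, not HotSpot or compiler-intrinsic APIs.

```cpp
#include <cassert>
#include <cstdint>

// Packs the BEXTR control operand: start position in bits 7:0, length in bits 15:8.
uint64_t bextr_control(unsigned start, unsigned len) {
    return static_cast<uint64_t>(start & 0xFFu) | (static_cast<uint64_t>(len & 0xFFu) << 8);
}

// Models 64-bit BEXTR: extract `len` bits of `src` starting at bit `start`;
// bits beyond the operand width read as zero.
uint64_t bextr(uint64_t src, uint64_t control) {
    const unsigned start = control & 0xFFu;
    const unsigned len = (control >> 8) & 0xFFu;
    if (start >= 64 || len == 0) return 0;
    const uint64_t shifted = src >> start;
    if (len >= 64) return shifted;
    return shifted & ((uint64_t{1} << len) - 1);
}
```

Extracting `(x >> 4) & 0xFF` this way therefore needs the constant 0x0804 in a register before the BEXTR itself, which is the extra register pressure described above.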
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2407821557 From jbhateja at openjdk.org Wed Nov 6 17:39:29 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 6 Nov 2024 17:39:29 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 11 Oct 2024 17:12:49 GMT, Jasmine Karthikeyan wrote: > > I am having a similar idea that is to group those transformations together into a `Phase` called `PhaseLowering` > > I think such a phase could be quite useful in general. Recently I was trying to implement the BMI1 instruction `bextr` for better performance with bit masks, but ran into a problem where it doesn't have an immediate encoding so we'd need to manifest a constant into a temporary register every time. With an (x86-specific) ideal node, we could simply let the register allocator handle placing the constant. It would also be nice to avoid needing to put similar backend-specific lowerings (such as `MacroLogicV`) in shared code. Hey @jaskarth, @merykitty, we already have infrastructure where, during parsing, we create macro nodes that can be lowered/expanded into multiple IR nodes during macro expansion. What we need in this case is a target-specific IR pattern check, since not all targets may support 32x32-bit multiplication with a 64-bit (quadword) result. The idea is to avoid creating a new IR node and to piggyback the needed information on the existing MulVL IR node; we already use such tricks for relaxed unsafe reductions. Going forward, integrating KnownBits into our data-flow analysis infrastructure will streamline such optimizations; this patch performs a point optimization for a specific set of constrained multiplication patterns.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2411053693 From qamai at openjdk.org Wed Nov 6 17:39:30 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 6 Nov 2024 17:39:30 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: <2g_Hm5UuVBqoklekkaxtnYn05JYKmosnzaMefQi_q3s=.916470fa-352d-410c-b187-f6453bb53630@github.com> On Mon, 14 Oct 2024 12:12:58 GMT, Jatin Bhateja wrote: >>> I am having a similar idea that is to group those transformations together into a `Phase` called `PhaseLowering` >> >> I think such a phase could be quite useful in general. Recently I was trying to implement the BMI1 instruction `bextr` for better performance with bit masks, but ran into a problem where it doesn't have an immediate encoding so we'd need to manifest a constant into a temporary register every time. With an (x86-specific) ideal node, we could simply let the register allocator handle placing the constant. It would also be nice to avoid needing to put similar backend-specific lowerings (such as `MacroLogicV`) in shared code.
> > Hey @jaskarth , @merykitty , we already have an infrastructure where during parsing we create Macro Nodes which can be lowered / expanded to multiple IRs nodes during macro expansion, what we need in this case is a target specific IR pattern check since not all targets may support 32x32 multiplication with quadword saturation, idea is to avoid creating a new IR and piggyback needed information on existing MulVL IR, we already use such tricks for relaxed unsafe reductions. Going forward, infusion of KnownBits into our data flow analysis infrastructure will streamline such optimizations, this patch is performing point optimization for specific set of constrained multiplication patterns. @jatin-bhateja That is machine-independent lowering, we are talking about machine-dependent lowering to which `MacroLogicV` transformation belongs. You can have `phaselowering_x86` and not have to add another method to `Matcher` as well as add default implementations to various architecture files. You can reuse `MulVL` node for that but I believe these transformations should be done as late as possible. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2411389030 From jbhateja at openjdk.org Wed Nov 6 17:39:31 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 6 Nov 2024 17:39:31 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Mon, 14 Oct 2024 15:04:54 GMT, Jasmine Karthikeyan wrote: > For the record I think in this PR we could simply match the IR patterns in the ad file, since (from my understanding) the patterns we are matching could be supported there. We should do platform-specific lowering in a separate patch because it is pretty nuanced, and we could potentially move it to the new system afterwards. 
Hi @jaskarth, bigger pattern matching is sensitive to [IR-level node sharing](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/matcher.cpp#L1724), so it may not be foolproof for the above 4 patterns. The current patch takes care of this limitation. > @jatin-bhateja That is machine-independent lowering, we are talking about machine-dependent lowering to which `MacroLogicV` transformation belongs. You can have `phaselowering_x86` and not have to add another method to `Matcher` as well as add default implementations to various architecture files. You can reuse `MulVL` node for that but I believe these transformations should be done as late as possible. Hi @merykitty, I see some scope for refactoring and carving out a separate target-specific lowering pass going forward; I have brought this up in the past too. Existing optimizations are in line with the current infrastructure and guard target-specific optimizations with target-specific match_rule_supported checks, e.g. https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/compile.cpp#L2898. As @jaskarth suggests, we can pick this up going forward. > BTW, from the last conversation I had started working on PhaseLowering myself, you can see my work so far on my branch: https://github.com/jaskarth/jdk/tree/phase-lowering. I think I can publish an RFE in the coming two or three days (there were some optimizations and cleanup I was prototyping, I will remove them before sending a PR.) Do you think we should continue with my branch or do you want to approach the problem from a different way? Just want to check again to make sure we don't end up re-doing the same work :) Hi @jaskarth, please add only the PhaseLowering skeleton code; we can then add the applicable lowering transforms in separate patches. I volunteer to move x86-side lowering transforms, like the MacroLogic optimization, along with this doubleword multiplication pass.
We need to make such decisions carefully, keeping code duplication in view: only very specific IR transforms should be lowered, while common transforms should remain part of the shared code. Let me know if you have any concerns. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2411884206 PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2422981643 From vlivanov at openjdk.org Wed Nov 6 17:39:32 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 6 Nov 2024 17:39:32 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Sun, 29 Sep 2024 04:21:19 GMT, Jatin Bhateja wrote: > This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ instruction for following IR pallets. > > > MulVL ( AndV SRC1, 0xFFFFFFFF) ( AndV SRC2, 0xFFFFFFFF) > MulVL (URShiftVL SRC1 , 32) (URShiftVL SRC2, 32) > MulVL (URShiftVL SRC1 , 32) ( AndV SRC2, 0xFFFFFFFF) > MulVL ( AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2 , 32) > MulVL (VectorCastI2X SRC1) (VectorCastI2X SRC2) > MulVL (RShiftVL SRC1 , 32) (RShiftVL SRC2, 32) > > > > A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors.
> > If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. > > VPMUL[U]DQ instruction performs [unsigned] multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. > > Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- > > > Sierra Forest :- > ============ > Baseline:- > Benchmark (SIZE) Mode Cnt Score Error Units > VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms > > With Optimization:- > Benchmark (SIZE) Mode ... Some time ago, there was a relevant experiment to optimize vectorized Poly1305 implementation by utilizing VPMULDQ instruction on x86 (see [JDK-8219881](https://bugs.openjdk.org/browse/JDK-8219881) for details). The implementation used int-to-long vector casts and produced the following IR shape: `MulVL (VectorCastI2X src1) (VectorCastI2X src2)`. Does it make sense to cover it as part of this particular enhancement? 
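The PR description quoted above says VPMUL[U]DQ multiplies only the even-numbered doubleword lanes of two long vectors. A minimal scalar sketch of that behaviour, assuming x86's little-endian lane layout (doubleword 2*i of the register is the low half of quadword lane i), shows why the four unsigned IR shapes can use VPMULUDQ. The function names are hypothetical and this is not HotSpot code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of VPMULUDQ over N quadword lanes.
// Viewing the register as 2*N doublewords, doubleword 2*i is the low half of
// quadword lane i (x86 is little-endian); the upper half (2*i+1) is ignored.
template <std::size_t N>
std::array<uint64_t, N> vpmuludq_model(const std::array<uint64_t, N>& a,
                                       const std::array<uint64_t, N>& b) {
  std::array<uint64_t, N> r{};
  for (std::size_t i = 0; i < N; i++) {
    uint64_t lo_a = static_cast<uint32_t>(a[i]);  // zero-extended low dword
    uint64_t lo_b = static_cast<uint32_t>(b[i]);
    r[i] = lo_a * lo_b;  // 32x32 -> 64, cannot overflow
  }
  return r;
}

// The four unsigned shapes from the PR, evaluated per lane with ordinary
// wrapping 64-bit multiplication, i.e. what MulVL itself computes.
constexpr uint64_t kLow32 = 0xFFFFFFFFull;
constexpr uint64_t mul_and_and(uint64_t x, uint64_t y) { return (x & kLow32) * (y & kLow32); }
constexpr uint64_t mul_shr_shr(uint64_t x, uint64_t y) { return (x >> 32) * (y >> 32); }
constexpr uint64_t mul_shr_and(uint64_t x, uint64_t y) { return (x >> 32) * (y & kLow32); }
constexpr uint64_t mul_and_shr(uint64_t x, uint64_t y) { return (x & kLow32) * (y >> 32); }

// Each shape leaves the upper 32 bits of every lane zero, so the full 64-bit
// product equals the product of the low doublewords.
static_assert(mul_and_and(0x1234567800000003ull, 0xFFFFFFFF00000005ull) == 15, "");
static_assert(mul_shr_shr(0x0000000700000000ull, 0x0000000600000000ull) == 42, "");
```

The precondition matters: for a lane holding 0x100000002 multiplied by 3, MulVL yields 0x300000006 while the model yields 6, so VPMULUDQ is only a valid replacement when the upper halves are known to be zero.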
IMO until C2 type system starts to track bitwise constant information ([JDK-8001436](https://bugs.openjdk.org/browse/JDK-8001436) et al), there are not enough benefits to rely on IGVN here. So far, all the discussed patterns are simple enough for matcher to handle them without too much tweaking. Also, I briefly looked at #21599 in the context of this particular enhancement, but still don't see how it can improve the situation (except input rewiring part) and not simply duplicate what matcher already does well. src/hotspot/share/opto/vectornode.cpp line 2122: > 2120: // MulL (URShift SRC1 , 32) (URShift SRC2, 32) > 2121: // MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) > 2122: // MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) I don't understand how it works... According to the documentation, `VPMULDQ`/`VPMULUDQ` consume vectors of double words and produce a vector of quadwords. But it looks like `SRC1`/`SRC2` are always vectors of longs (quadwords). And `vmuludq_reg` in `x86.ad` just takes the immediate operands and passes them into `vpmuludq` which doesn't look right... ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2412582542 PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421529658 PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2436531693 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1805886268 From qamai at openjdk.org Wed Nov 6 17:39:33 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 6 Nov 2024 17:39:33 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Tue, 15 Oct 2024 17:00:26 GMT, Jasmine Karthikeyan wrote: >> This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ instruction for following IR pallets.
>> >> >> MulVL ( AndV SRC1, 0xFFFFFFFF) ( AndV SRC2, 0xFFFFFFFF) >> MulVL (URShiftVL SRC1 , 32) (URShiftVL SRC2, 32) >> MulVL (URShiftVL SRC1 , 32) ( AndV SRC2, 0xFFFFFFFF) >> MulVL ( AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2 , 32) >> MulVL (VectorCastI2X SRC1) (VectorCastI2X SRC2) >> MulVL (RShiftVL SRC1 , 32) (RShiftVL SRC2, 32) >> >> >> >> A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. >> >> If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. >> >> VPMUL[U]DQ instruction performs [unsigned] multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. 
>> >> Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimizati... > > I'm pretty ambivalent, I think implementing it either way would be alright. Especially with unit tests, I think the lowering implementation wouldn't be that difficult. Maybe another reviewer has an opinion? > > About PhaseLowering though, I've found some more interesting things we could do with it, especially with improving vectorization support in the backend. @merykitty have you already started to work on it? I was thinking about prototyping it soon. Just wanted to make sure we're not doing the same work twice :) @jaskarth Please proceed with it, I have a really simple prototype for it but I don't have any plan to proceed further soon. Thanks a lot :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2414605470 From jbhateja at openjdk.org Wed Nov 6 17:39:33 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 6 Nov 2024 17:39:33 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Tue, 15 Oct 2024 00:28:25 GMT, Vladimir Ivanov wrote: > MulVL (VectorCastI2X src1) (VectorCastI2X src2 It looks unsafe to me, since VectorCastI2L sign-extends integer lanes, thus we may not be able to neglect partial products of upper doublewords while performing 64x64 bit multiplication. 
The existing patterns guarantee that the upper doublewords are cleared, so the result depends only on the lower-doubleword multiplication. > Personally, I think this optimization is not essential, so we should proceed with introducing lowering first, then add this transformation to that phase, instead of trying to integrate this transformation then refactor it into phase lowering, which seems like a net extra step. I think we should not block in-flight patches in anticipation of a new refactoring. We can always tune it later. > I'm pretty ambivalent, I think implementing it either way would be alright. Especially with unit tests, I think the lowering implementation wouldn't be that difficult. Maybe another reviewer has an opinion? > > About PhaseLowering though, I've found some more interesting things we could do with it, especially with improving vectorization support in the backend. @merykitty have you already started to work on it? I was thinking about prototyping it soon. Just wanted to make sure we're not doing the same work twice :) It would be good to float an RFP with some use cases up front, before development, for example the vectorization improvements @jaskarth pointed out. > IMO until C2 type system starts to track bitwise constant information ([JDK-8001436](https://bugs.openjdk.org/browse/JDK-8001436) et al), there are not enough benefits to rely on IGVN here. So far, all the discussed patterns are simple enough for matcher to handle them without too much tweaking. Hi @iwanowww , I have implemented additional pattern you suggested. In addition re-wiring pattern inputs to MulVL IR to avoid emitting upper doubleword clearing logic in applicable scenarios. Hi @jaskarth , @merykitty , As discussed, waiting on PhaseLowering skeleton to move some part of this patch to x86 specific lowering pass.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2420384086 PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2423716135 From vlivanov at openjdk.org Wed Nov 6 17:39:34 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 6 Nov 2024 17:39:34 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Thu, 17 Oct 2024 19:40:52 GMT, Jatin Bhateja wrote: >> MulVL (VectorCastI2X src1) (VectorCastI2X src2) > It looks unsafe to me, since VectorCastI2L sign-extends integer lanes, ... Hm, I don't see any problems with it if `VPMULDQ` is used. Sign extension becomes redundant when 64-bit multiplication is strength-reduced to 32-bit one (32x32->64). Am I missing something important here? >> IMO until C2 type system starts to track bitwise constant information ([JDK-8001436](https://bugs.openjdk.org/browse/JDK-8001436) et al), there are not enough benefits to rely on IGVN here. So far, all the discussed patterns are simple enough for matcher to handle them without too much tweaking. > > Hi @iwanowww , > I have implemented additional pattern you suggested. > In addition re-wiring pattern inputs to MulVL IR to avoid emitting upper doubleword clearing logic in applicable scenarios. > > Hi @jaskarth , @merykitty , > As discussed, waiting on PhaseLowering skeleton to move some part of this patch to x86 specific lowering pass. Thanks, @jatin-bhateja. I took a look at the latest version and still think that IGVN is not the best place for it. First of all, flags on MulVL feel too ad hoc and irregular. The original IR structure is still there (except the cases when inputs are rewired), so it can be easily recomputed on demand.
I noticed that the patterns can be generalized: what matters is whether upper half is filled with zeros/sign bits or not, so small enough masks (and large enough shifts) are amenable to the same optimization. But, in such case, input rewiring becomes applicable only to particular constant inputs. (BTW signed right shifts can be optimized in a similar way, since they populate upper half with the sign-bit.) So, IMO the best way to move this particular enhancement forward is: * perform the transformation during matching; * match a single MulVL node and shape the checks on argument shape as predicates on AD instructions * setting lower instruction costs should tell the matcher to prefer new specific instructions over generic ones; * avoid input rewiring for now (VPMULDQ/VPMULUDQ give enough performance improvement on its own). ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2420668490 PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2436528498 From vlivanov at openjdk.org Wed Nov 6 17:39:34 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 6 Nov 2024 17:39:34 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Tue, 15 Oct 2024 17:26:49 GMT, Quan Anh Mai wrote: >> I'm pretty ambivalent, I think implementing it either way would be alright. Especially with unit tests, I think the lowering implementation wouldn't be that difficult. Maybe another reviewer has an opinion? >> >> About PhaseLowering though, I've found some more interesting things we could do with it, especially with improving vectorization support in the backend. @merykitty have you already started to work on it? I was thinking about prototyping it soon. 
Just wanted to make sure we're not doing the same work twice :) > > @jaskarth Please proceed with it, I have a really simple prototype for it but I don't have any plan to proceed further soon. Thanks a lot :) @merykitty The approach @jatin-bhateja proposes looks well-justified to me. Matching is essentially a lowering step which transforms platform-independent Ideal IR into platform-specific Mach IR. And collapsing non-trivial IR trees into platform-specific instructions is a well-established pattern in the code. Indeed, there are some constraints matching imposes, so it may not be flexible enough to cover all use cases. In particular, for `VPTERNLOGD`/`VPTERNLOGQ` it was decided it's worth the effort to handle them specially (see `Compile::optimize_logic_cones()`). As it is implemented now, it's part of the shared code, but if there's platform-specific custom lowering phase available one day, it can be moved there, of course. But speaking of `VPMULDQ`/`VPMULUDQ`, what kind of benefits do you see from custom logic to support them? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2420732705 From jbhateja at openjdk.org Wed Nov 6 17:39:34 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 6 Nov 2024 17:39:34 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Thu, 17 Oct 2024 21:53:16 GMT, Vladimir Ivanov wrote: > > > MulVL (VectorCastI2X src1) (VectorCastI2X src2) > > > It looks unsafe to me, since VectorCastI2L sign-extends integer lanes, ... > > Hm, I don't see any problems with it if `VPMULDQ` is used. Sign extension becomes redundant when 64-bit multiplication is strength-reduced to 32-bit one (32x32->64). Am I missing something important here? 
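Vladimir's point about `VectorCastI2X` can be checked with a small scalar sketch. It assumes VPMULDQ sign-extends the low doubleword of each quadword lane and multiplies the two into a 64-bit product; the model and function names are illustrative, not HotSpot code. `MulVL (VectorCastI2X a) (VectorCastI2X b)` then gives the same lanes, because the product of two signed 32-bit values always fits in 64 bits. The same holds for `RShiftVL` by 32, whose result is already a sign-extended doubleword.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of VPMULDQ: each quadword lane contributes only
// its low doubleword, interpreted as a signed 32-bit value.
template <std::size_t N>
std::array<int64_t, N> vpmuldq_model(const std::array<int64_t, N>& a,
                                     const std::array<int64_t, N>& b) {
  std::array<int64_t, N> r{};
  for (std::size_t i = 0; i < N; i++) {
    int64_t lo_a = static_cast<int32_t>(static_cast<uint32_t>(a[i]));  // sign-extend low dword
    int64_t lo_b = static_cast<int32_t>(static_cast<uint32_t>(b[i]));
    r[i] = lo_a * lo_b;  // |product| <= 2^62, so no overflow
  }
  return r;
}

// MulVL (VectorCastI2X x) (VectorCastI2X y): widen each int lane to long,
// then multiply as longs.
template <std::size_t N>
std::array<int64_t, N> mul_cast_i2l(const std::array<int32_t, N>& x,
                                    const std::array<int32_t, N>& y) {
  std::array<int64_t, N> r{};
  for (std::size_t i = 0; i < N; i++) {
    r[i] = static_cast<int64_t>(x[i]) * static_cast<int64_t>(y[i]);
  }
  return r;
}

// MulVL (RShiftVL x 32) (RShiftVL y 32) for one lane. The arithmetic shift
// (guaranteed since C++20, universal in practice) leaves a value that is
// already a sign-extended 32-bit integer.
constexpr int64_t mul_sra_sra(int64_t x, int64_t y) { return (x >> 32) * (y >> 32); }
```

Feeding the same sign-extended lanes to the unsigned form instead would read -7 as 4294967289, which is why the signed instruction is the one that matches this shape.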
@iwanowww, agreed! I missed that you were talking about **VPMULDQ**; it's a signed doubleword multiply that widens each product to a quadword, so it should be OK to include the suggested pattern. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421132055 From jbhateja at openjdk.org Wed Nov 6 17:39:36 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 6 Nov 2024 17:39:36 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 18 Oct 2024 02:41:47 GMT, Quan Anh Mai wrote: > The issues I have with this patch are that: > > * It convolutes the graph with machine-dependent nodes early in the compiling process. MulVL is a machine-independent IR node; machine-dependent IR is only created after matching.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421300738 From vlivanov at openjdk.org Wed Nov 6 17:39:36 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 6 Nov 2024 17:39:36 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: <34KZVRjCMAl5-KAG6hLnJUe2RZF2fThQAWuresTL5Pk=.a797f2d0-2915-4175-8c7c-3381fdc578cb@github.com> On Fri, 18 Oct 2024 04:16:15 GMT, Jatin Bhateja wrote: > It convolutes the graph with machine-dependent nodes early in the compiling process. Ah, I see your point now! I took a closer look at the patch and indeed `MulVLNode::_mult_lower_double_word` with `MulVLNode::Ideal()` don't look pretty. @jatin-bhateja why don't you turn the logic it into match rules instead? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421372120 From qamai at openjdk.org Wed Nov 6 17:39:37 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 6 Nov 2024 17:39:37 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 18 Oct 2024 04:16:15 GMT, Jatin Bhateja wrote: > I see this is as a twostep optimization, in the first step we do analysis and annotate additional information on existing IR, which is later used by instruction selector. I plan to subsume first stage with enhanced dataflow analysis going forward. The issue is that a node is not immutable. This puts a burden on every place to keep the annotation sane when doing transformations, which is easily missed when there are a lot of kinds of `Node`s out there. That's why I think it is most suitable to be done only right before matching. 
`Node::Ideal` is invoked in a really generous manner so I would prefer not to add analysis to it that can be done more efficiently somewhere else. Additionally, if you have a separate IR node for this operation, you can do some more beneficial transformations such as `MulVL (AndV x max_juint) (AndV y max_juint)` into `MulVLowIToL x y`. My suggestions are based on this PR as a standalone, so they may not be optimal when looking at a wider perspective, in case you think this approach would fit more nicely into a larger landscape of your planned enhancements please let us know. Thanks for your patience. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421376285 From vlivanov at openjdk.org Wed Nov 6 17:39:37 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 6 Nov 2024 17:39:37 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 18 Oct 2024 05:05:16 GMT, Quan Anh Mai wrote: > The issue is that a node is not immutable. I don't see any issues with mutability here. `MulVLNode::_mult_lower_double_word` is constant, so you have to allocate new node if you want to change its value. (And that's exactly what `MulVLNode::Ideal()` does.) But I agree with you that a dedicated ideal node type (e.g., `MulVI2L`) is much cleaner than `MulVLNode::_mult_lower_double_word`. Still, I'd prefer to see the logic confined in matcher-related code instead. 
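Vladimir's generalization earlier in the thread (what matters is whether the upper half of each lane is known to be all zeros or all sign bits) can be expressed as two small predicates that a matcher-side check could evaluate. The sketch below is a hypothetical, simplified classifier, not C2's matcher API: `Operand`, `OperandKind`, and `select_mul` are invented for illustration, and shift counts are assumed to be reduced modulo 64, as Java `long` shifts are.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of one MulVL input: the producing node kind
// and its constant operand (AND mask or shift count).
enum class OperandKind { AndConst, URShiftConst, RShiftConst, CastI2L, Other };

struct Operand {
  OperandKind kind;
  uint64_t constant;  // AND mask or shift count; ignored for CastI2L/Other
};

// True if bits 32..63 of every lane are known to be zero.
constexpr bool upper_half_zero(Operand op) {
  switch (op.kind) {
    case OperandKind::AndConst:     return (op.constant >> 32) == 0;
    case OperandKind::URShiftConst: return (op.constant & 63) >= 32;
    default:                        return false;
  }
}

// True if every lane is known to be a sign-extended 32-bit value,
// i.e. bits 32..63 are copies of bit 31.
constexpr bool upper_half_sign(Operand op) {
  switch (op.kind) {
    case OperandKind::AndConst:     return op.constant <= 0x7FFFFFFFull;  // bits 31..63 cleared
    case OperandKind::URShiftConst: return (op.constant & 63) >= 33;      // bit 31 cleared too
    case OperandKind::RShiftConst:  return (op.constant & 63) >= 32;
    case OperandKind::CastI2L:      return true;
    default:                        return false;
  }
}

enum class MulChoice { VPMULUDQ, VPMULDQ, Generic };

// Prefer the unsigned form when both upper halves are zero, the signed form
// when both lanes are sign-extended doublewords, otherwise fall back.
constexpr MulChoice select_mul(Operand a, Operand b) {
  if (upper_half_zero(a) && upper_half_zero(b)) return MulChoice::VPMULUDQ;
  if (upper_half_sign(a) && upper_half_sign(b)) return MulChoice::VPMULDQ;
  return MulChoice::Generic;
}

static_assert(select_mul({OperandKind::AndConst, 0xFFFFull}, {OperandKind::URShiftConst, 40}) ==
                  MulChoice::VPMULUDQ,
              "a small mask and a large shift both qualify");
```

A mask such as 0x7FFFFFFF satisfies both predicates, so either instruction would be correct there; the sketch simply prefers the unsigned form. Mixed shapes, such as a 32-bit mask against a `VectorCastI2X` input, fall back to the generic multiply.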
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421412061 From qamai at openjdk.org Wed Nov 6 17:39:37 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 6 Nov 2024 17:39:37 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 18 Oct 2024 05:16:04 GMT, Vladimir Ivanov wrote: >>> I see this is as a twostep optimization, in the first step we do analysis and annotate additional information on existing IR, which is later used by instruction selector. I plan to subsume first stage with enhanced dataflow analysis going forward. >> >> The issue is that a node is not immutable. This puts a burden on every place to keep the annotation sane when doing transformations, which is easily missed when there are a lot of kinds of `Node`s out there. That's why I think it is most suitable to be done only right before matching. `Node::Ideal` is invoked in a really generous manner so I would prefer not to add analysis to it that can be done more efficiently somewhere else. Additionally, if you have a separate IR node for this operation, you can do some more beneficial transformations such as `MulVL (AndV x max_juint) (AndV y max_juint)` into `MulVLowIToL x y`. >> >> My suggestions are based on this PR as a standalone, so they may not be optimal when looking at a wider perspective, in case you think this approach would fit more nicely into a larger landscape of your planned enhancements please let us know. Thanks for your patience. > >> The issue is that a node is not immutable. > > I don't see any issues with mutability here. `MulVLNode::_mult_lower_double_word` is constant, so you have to allocate new node if you want to change its value. (And that's exactly what `MulVLNode::Ideal()` does.) 
But I agree with you that a dedicated ideal node type (e.g., `MulVI2L`) is much cleaner than `MulVLNode::_mult_lower_double_word`. Still, I'd prefer to see the logic confined in matcher-related code instead. @iwanowww IMO there are 2 ways to view this: - You can see a `MulVL` nodes with `_mult_lower_double_word` being an entirely different kind of nodes which do a different thing (a.k.a throw away the upper bits and only multiply the lower bits), in this case it is a machine-dependent IR node hiding behind the opcode of `MulVL` and changing the inputs of it is not worrying because the node does not care about that anyway, its semantics is predetermined already. - Or you can see `_mult_lower_double_word` being an annotation that adds information to `MulVL`, which means it is still a `MulVL` but annotated with information saying that all upper bits of the operands are 0. I think this is Jatin's point of view right now. The issue here would be to keep the annotation sane when the node inputs may be changed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421441405 From jbhateja at openjdk.org Wed Nov 6 17:39:38 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 6 Nov 2024 17:39:38 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 18 Oct 2024 05:16:04 GMT, Vladimir Ivanov wrote: >>> I see this is as a twostep optimization, in the first step we do analysis and annotate additional information on existing IR, which is later used by instruction selector. I plan to subsume first stage with enhanced dataflow analysis going forward. >> >> The issue is that a node is not immutable. This puts a burden on every place to keep the annotation sane when doing transformations, which is easily missed when there are a lot of kinds of `Node`s out there. 
That's why I think it is most suitable to be done only right before matching. `Node::Ideal` is invoked in a really generous manner so I would prefer not to add analysis to it that can be done more efficiently somewhere else. Additionally, if you have a separate IR node for this operation, you can do some more beneficial transformations such as `MulVL (AndV x max_juint) (AndV y max_juint)` into `MulVLowIToL x y`. >> >> My suggestions are based on this PR as a standalone, so they may not be optimal when looking at a wider perspective, in case you think this approach would fit more nicely into a larger landscape of your planned enhancements please let us know. Thanks for your patience. > >> The issue is that a node is not immutable. > > I don't see any issues with mutability here. `MulVLNode::_mult_lower_double_word` is constant, so you have to allocate new node if you want to change its value. (And that's exactly what `MulVLNode::Ideal()` does.) But I agree with you that a dedicated ideal node type (e.g., `MulVI2L`) is much cleaner than `MulVLNode::_mult_lower_double_word`. Still, I'd prefer to see the logic confined in matcher-related code instead. Hi @iwanowww , @merykitty , I am in process of addressing all your concerns. I still feel idealization is the right place to execute this pattern detection, we just need to re-wire the effective inputs bypassing doubleword clearing logic to newly annotated MulVL node and allow clearing IR to sweepout during successive passes, moving it to final graph reshaping just before instruction selection will prevent dead IR cleanups. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421448784 From vlivanov at openjdk.org Wed Nov 6 17:39:38 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 6 Nov 2024 17:39:38 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 18 Oct 2024 05:35:27 GMT, Quan Anh Mai wrote: >>> The issue is that a node is not immutable. >> >> I don't see any issues with mutability here. `MulVLNode::_mult_lower_double_word` is constant, so you have to allocate new node if you want to change its value. (And that's exactly what `MulVLNode::Ideal()` does.) But I agree with you that a dedicated ideal node type (e.g., `MulVI2L`) is much cleaner than `MulVLNode::_mult_lower_double_word`. Still, I'd prefer to see the logic confined in matcher-related code instead. > > @iwanowww IMO there are 2 ways to view this: > > - You can see a `MulVL` nodes with `_mult_lower_double_word` being an entirely different kind of nodes which do a different thing (a.k.a throw away the upper bits and only multiply the lower bits), in this case it is a machine-dependent IR node hiding behind the opcode of `MulVL` and changing the inputs of it is not worrying because the node does not care about that anyway, its semantics is predetermined already. > - Or you can see `_mult_lower_double_word` being an annotation that adds information to `MulVL`, which means it is still a `MulVL` but annotated with information saying that all upper bits of the operands are 0. I think this is Jatin's point of view right now. The issue here would be to keep the annotation sane when the node inputs may be changed. @merykitty I was under an erroneous impression that `MulVL::Ideal()` folds operands of particular shapes into `MulVL::_mult_lower_double_word == true`. Now I see it's not the case. 
Indeed, what `MulVL::Ideal()` does is cache the info about operand shapes in `MulVL::_mult_lower_double_word`, which introduces unnecessary redundancy. I doubt it is possible for IR to diverge so much (through a sequence of equivalent transformations) that the bit gets out of sync (unless there's a bug in the compiler or a paradoxical situation occurs in effectively dead code). ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421504978 From qamai at openjdk.org Wed Nov 6 17:39:39 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 6 Nov 2024 17:39:39 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: <8p95gYaAnNAIfqVBosZgvMMCVhHn2M0fQx7FLLgCn9U=.852c7aef-327c-4c2f-a591-0efde9ccc2e6@github.com> On Fri, 18 Oct 2024 05:42:21 GMT, Jatin Bhateja wrote: >>> The issue is that a node is not immutable. >> >> I don't see any issues with mutability here. `MulVLNode::_mult_lower_double_word` is constant, so you have to allocate a new node if you want to change its value. (And that's exactly what `MulVLNode::Ideal()` does.) But I agree with you that a dedicated ideal node type (e.g., `MulVI2L`) is much cleaner than `MulVLNode::_mult_lower_double_word`. Still, I'd prefer to see the logic confined in matcher-related code instead. > > Hi @iwanowww, @merykitty, I am in the process of addressing all your concerns. > > I still feel idealization is the right place to execute this pattern detection: we just need to re-wire the effective inputs, bypassing the doubleword-clearing logic, to the newly annotated MulVL node and allow the clearing IR to be swept out during successive passes. Moving it to final graph reshaping, just before instruction selection, would prevent dead IR cleanups. @jatin-bhateja I think you can do it at the same place as `Compile::optimize_logic_cones`; we do perform IGVN there.
Unless you think this information is needed early in the compilation process; currently I see it used only during matching, which makes it unnecessary to check it repeatedly in `Node::Ideal`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421519087 From jkarthikeyan at openjdk.org Wed Nov 6 17:39:39 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 6 Nov 2024 17:39:39 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Sat, 19 Oct 2024 09:25:12 GMT, Jatin Bhateja wrote: >> IMO until the C2 type system starts to track bitwise constant information ([JDK-8001436](https://bugs.openjdk.org/browse/JDK-8001436) et al), there are not enough benefits to rely on IGVN here. So far, all the discussed patterns are simple enough for the matcher to handle without too much tweaking. > > Hi @iwanowww, > I have implemented the additional pattern you suggested. > In addition, I am re-wiring pattern inputs to the MulVL IR to avoid emitting the upper-doubleword clearing logic in applicable scenarios. > > Hi @jaskarth, @merykitty, > As discussed, I am waiting on the PhaseLowering skeleton to move some parts of this patch to an x86-specific lowering pass. Hi @jatin-bhateja, I've opened a PR for the new pass here: #21599. I've added just the skeleton code, like you suggested. Let me know what you think!
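The patterns discussed in this thread (masks, unsigned and signed right shifts, int-to-long casts) all reduce to one question: is the upper half of each lane known to be zero, or known to be copies of the sign bit? The following is a minimal, hypothetical Java sketch of that known-bits reasoning as two predicates over a toy IR. It is not C2 code; the record names only mirror the ideal nodes mentioned above, `upperZero`/`upperSign`/`selectMul` are illustrative helpers, and shift counts are assumed to be in 0..63.

```java
import java.util.Random;

public class OperandShapeSketch {
    // Hypothetical toy IR nodes mirroring the operand shapes discussed in the thread.
    interface Node {}
    record Input(String name) implements Node {}
    record AndV(Node in, long mask) implements Node {}
    record URShiftVL(Node in, int shift) implements Node {}  // shift assumed in 0..63
    record RShiftVL(Node in, int shift) implements Node {}   // shift assumed in 0..63
    record VectorCastI2L(Node in) implements Node {}

    // True if every lane is known to have its upper 32 bits equal to zero.
    static boolean upperZero(Node n) {
        if (n instanceof AndV a) return (a.mask() >>> 32) == 0;
        if (n instanceof URShiftVL s) return s.shift() >= 32;
        return false;
    }

    // True if every lane is known to be the sign extension of its low 32 bits.
    static boolean upperSign(Node n) {
        if (n instanceof AndV a) return (a.mask() >>> 31) == 0;   // non-negative and < 2^31
        if (n instanceof URShiftVL s) return s.shift() >= 33;     // result < 2^31
        if (n instanceof RShiftVL s) return s.shift() >= 32;      // upper half is sign bits
        return n instanceof VectorCastI2L;
    }

    // Picks the narrowest multiply that is still exact for the given operand shapes.
    static String selectMul(Node a, Node b) {
        if (upperZero(a) && upperZero(b)) return "VPMULUDQ";
        if (upperSign(a) && upperSign(b)) return "VPMULDQ";
        return "VPMULLQ";
    }

    // Scalar models of one 64-bit lane of each narrow instruction.
    static long vpmuludq(long a, long b) { return (a & 0xFFFF_FFFFL) * (b & 0xFFFF_FFFFL); }
    static long vpmuldq(long a, long b)  { return (long) (int) a * (long) (int) b; }

    public static void main(String[] args) {
        Node x = new Input("x"), y = new Input("y");
        System.out.println(selectMul(new AndV(x, 0xFFFF_FFFFL), new URShiftVL(y, 32))); // VPMULUDQ
        System.out.println(selectMul(new RShiftVL(x, 40), new VectorCastI2L(y)));      // VPMULDQ
        System.out.println(selectMul(new AndV(x, 0x1_FFFF_FFFFL), y));                 // VPMULLQ

        // Spot-check that the narrow multiplies agree with a full 64-bit multiply on such operands.
        Random r = new Random(42);
        for (int i = 0; i < 1_000; i++) {
            long p = r.nextLong(), q = r.nextLong();
            long ua = p & 0xFFFF_FFFFL, ub = q >>> 32;   // upper halves zero
            long sa = p >> 40, sb = (long) (int) q;      // sign-extended values
            if (vpmuludq(ua, ub) != ua * ub || vpmuldq(sa, sb) != sa * sb) throw new AssertionError();
        }
        System.out.println("ok");
    }
}
```

Keeping the two predicates separate reflects the fact that an `AndV` with mask `0xFFFFFFFF` qualifies for the unsigned form but not the signed one, because values at or above 2^31 are not sign extensions of their low 32 bits.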
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2425542577 From vlivanov at openjdk.org Wed Nov 6 17:39:40 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 6 Nov 2024 17:39:40 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Thu, 24 Oct 2024 23:47:29 GMT, Vladimir Ivanov wrote: > So, IMO the best way to move this particular enhancement forward is: ... @jatin-bhateja here's a sketch (not tested): https://github.com/openjdk/jdk/compare/master...iwanowww:jdk:pr/21244 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2455955390 From jbhateja at openjdk.org Wed Nov 6 17:39:41 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 6 Nov 2024 17:39:41 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: <0fLBeJHlkgf0PnTP6gnbYeZ2P7yEceS1MW5oSf3q43s=.25320887-be06-46b1-919c-f3d25d46c039@github.com> On Tue, 5 Nov 2024 00:07:51 GMT, Vladimir Ivanov wrote: >> Thanks, @jatin-bhateja. I took a look at the latest version and still think that IGVN is not the best place for it. >> >> First of all, flags on MulVL feel too ad hoc and irregular. The original IR structure is still there (except in the cases where inputs are rewired), so it can be easily recomputed on demand. >> >> I noticed that the patterns can be generalized: what matters is whether the upper half is filled with zeros/sign bits or not, so small enough masks (and large enough shifts) are amenable to the same optimization. But, in such a case, input rewiring becomes applicable only to particular constant inputs.
>> >> (BTW signed right shifts can be optimized in a similar way, since they populate the upper half with the sign bit.) >> >> So, IMO the best way to move this particular enhancement forward is: >> * perform the transformation during matching; >> * match a single MulVL node and express the checks on argument shape as predicates on AD instructions; >> * setting lower instruction costs should tell the matcher to prefer new specific instructions over generic ones; >> * avoid input rewiring for now (VPMULDQ/VPMULUDQ give enough performance improvement on their own). > >> So, IMO the best way to move this particular enhancement forward is: ... > > @jatin-bhateja here's a sketch (not tested): https://github.com/openjdk/jdk/compare/master...iwanowww:jdk:pr/21244 Hi @iwanowww, thanks for refactoring! Your suggestions are included. Points in favor of the current approach: - The patch strength-reduces a 15-cycle full quadword multiply to a 5-cycle doubleword multiply with quadword saturation. - IR remains target-independent; we are not directly forwarding the pattern inputs to the multiplier. Such rewiring is only possible when we mask out the upper doubleword of the inputs; for other cases, like logically right-shifting inputs by 32 or upcasting integral to long lanes, we still need to emit the input preparation/formatting instruction sequence. - The patch shows performance improvements on both E-core and P-core Xeons. Following are the performance numbers for the included micro benchmarks.
![image](https://github.com/user-attachments/assets/6a19181a-7f55-4cd8-9dfb-23dd4c786428) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2459806910 From qamai at openjdk.org Wed Nov 6 17:39:41 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 6 Nov 2024 17:39:41 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 18 Oct 2024 05:35:28 GMT, Vladimir Ivanov wrote: >> This patch optimizes LongVector multiplication by inferring the VPMUL[U]DQ instruction for the following IR patterns. >> >> >> MulVL ( AndV SRC1, 0xFFFFFFFF) ( AndV SRC2, 0xFFFFFFFF) >> MulVL (URShiftVL SRC1 , 32) (URShiftVL SRC2, 32) >> MulVL (URShiftVL SRC1 , 32) ( AndV SRC2, 0xFFFFFFFF) >> MulVL ( AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2 , 32) >> MulVL (VectorCastI2X SRC1) (VectorCastI2X SRC2) >> MulVL (RShiftVL SRC1 , 32) (RShiftVL SRC2, 32) >> >> >> >> A 64x64-bit multiplication produces a 128-bit result, and can be performed by individually multiplying the upper and lower doublewords of the multiplier with the multiplicand and assembling the partial products to compute the full-width result. Targets supporting vector quadword multiplication have separate instructions to compute the upper and lower quadwords of the 128-bit result. Therefore, the existing Vector API multiplication operator expects shape conformance between source and result vectors. >> >> If the upper 32 bits of the quadword multiplier and multiplicand are always set to zero, then the result of the multiplication depends only on the partial product of their lower doublewords and can be computed using an unsigned 32-bit multiplication instruction with quadword saturation. The patch matches this pattern in a target-dependent manner without introducing a new IR node.
>> >> The VPMUL[U]DQ instruction performs [unsigned] multiplication between even-numbered doubleword lanes of two long vectors and produces a 64-bit result per lane. It has much lower latency than the full 64-bit multiplication instruction "VPMULLQ"; in addition, non-AVX512DQ targets do not support direct quadword multiplication, so we can save the redundant partial products for zeroed-out upper 32 bits. This results in throughput improvements on both P-core and E-core Xeons. >> >> Please find below the performance of the [XXH3 hashing benchmark](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html) included with the patch: >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimizati... > > src/hotspot/share/opto/vectornode.cpp line 2122: > >> 2120: // MulL (URShift SRC1 , 32) (URShift SRC2, 32) >> 2121: // MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) >> 2122: // MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) > > I don't understand how it works... According to the documentation, `VPMULDQ`/`VPMULUDQ` consume vectors of double words and produce a vector of quadwords. But it looks like `SRC1`/`SRC2` are always vectors of longs (quadwords). And `vmuludq_reg` in `x86.ad` just takes the immediate operands and passes them into `vpmuludq`, which doesn't look right...
`vpmuludq` does a long multiplication but throws away the upper bits of the operands, effectively computing `(x & max_juint) * (y & max_juint)`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1805887594 From qamai at openjdk.org Wed Nov 6 17:39:42 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 6 Nov 2024 17:39:42 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 18 Oct 2024 05:37:16 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/vectornode.cpp line 2122: >> >>> 2120: // MulL (URShift SRC1 , 32) (URShift SRC2, 32) >>> 2121: // MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) >>> 2122: // MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) >> >> I don't understand how it works... According to the documentation, `VPMULDQ`/`VPMULUDQ` consume vectors of double words and produce a vector of quadwords. But it looks like `SRC1`/`SRC2` are always vectors of longs (quadwords). And `vmuludq_reg` in `x86.ad` just takes the immediate operands and passes them into `vpmuludq`, which doesn't look right...
> > `vpmuludq` does a long multiplication but throws away the upper bits of the operands, effectively computing `(x & max_juint) * (y & max_juint)`

You can see its pseudocode here https://www.felixcloutier.com/x86/pmuludq

VPMULUDQ (VEX.256 Encoded Version)
DEST[63:0] := SRC1[31:0] * SRC2[31:0]
DEST[127:64] := SRC1[95:64] * SRC2[95:64]
DEST[191:128] := SRC1[159:128] * SRC2[159:128]
DEST[255:192] := SRC1[223:192] * SRC2[223:192]
DEST[MAXVL-1:256] := 0

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1805888984 From vlivanov at openjdk.org Wed Nov 6 17:39:42 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 6 Nov 2024 17:39:42 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 18 Oct 2024 05:39:08 GMT, Quan Anh Mai wrote: >> `vpmuludq` does a long multiplication but throws away the upper bits of the operands, effectively computing `(x & max_juint) * (y & max_juint)`
>
> You can see its pseudocode here https://www.felixcloutier.com/x86/pmuludq
>
> VPMULUDQ (VEX.256 Encoded Version)
> DEST[63:0] := SRC1[31:0] * SRC2[31:0]
> DEST[127:64] := SRC1[95:64] * SRC2[95:64]
> DEST[191:128] := SRC1[159:128] * SRC2[159:128]
> DEST[255:192] := SRC1[223:192] * SRC2[223:192]
> DEST[MAXVL-1:256] := 0

Got it. Now it makes perfect sense. Thanks for the clarifications!
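The pseudocode above means that one 64-bit lane of `VPMULUDQ` computes `(a & max_juint) * (b & max_juint)`. The Java sketch below models that lane and uses it to assemble the low 64 bits of a full 64x64 product from three 32x32->64 partial products, which also shows why a single `VPMULUDQ` suffices once both upper halves are known to be zero: the cross terms vanish. This is a scalar model under those assumptions with illustrative method names, not the `x86.ad` implementation.

```java
import java.util.Random;

public class PartialProductSketch {
    static final long LOW32 = 0xFFFF_FFFFL;

    // One 64-bit lane of VPMULUDQ: unsigned product of the low 32 bits of each operand.
    static long vpmuludq(long a, long b) {
        return (a & LOW32) * (b & LOW32);
    }

    // Low 64 bits of a 64x64 multiply assembled from 32x32->64 partial products:
    //   a*b mod 2^64 = lo(a)*lo(b) + ((hi(a)*lo(b) + lo(a)*hi(b)) << 32)
    // hi(a)*hi(b) only contributes to bits >= 64, so it is not needed.
    static long mulFromPartials(long a, long b) {
        long loLo = vpmuludq(a, b);
        long hiLo = vpmuludq(a >>> 32, b);
        long loHi = vpmuludq(a, b >>> 32);
        return loLo + ((hiLo + loHi) << 32); // Java long arithmetic wraps mod 2^64
    }

    public static void main(String[] args) {
        Random r = new Random(7);
        for (int i = 0; i < 1_000; i++) {
            long a = r.nextLong(), b = r.nextLong();
            if (mulFromPartials(a, b) != a * b) throw new AssertionError("partial products");
            // With both upper halves zero, the cross terms vanish and one VPMULUDQ suffices.
            long ua = a & LOW32, ub = b & LOW32;
            if (vpmuludq(ua, ub) != ua * ub) throw new AssertionError("single product");
        }
        System.out.println("ok");
    }
}
```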
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1805894106 From vlivanov at openjdk.org Wed Nov 6 17:39:43 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 6 Nov 2024 17:39:43 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 18 Oct 2024 05:46:25 GMT, Vladimir Ivanov wrote: >> You can see its pseudocode here https://www.felixcloutier.com/x86/pmuludq >> >> VPMULUDQ (VEX.256 Encoded Version) >> DEST[63:0] := SRC1[31:0] * SRC2[31:0] >> DEST[127:64] := SRC1[95:64] * SRC2[95:64] >> DEST[191:128] := SRC1[159:128] * SRC2[159:128] >> DEST[255:192] := SRC1[223:192] * SRC2[223:192] >> DEST[MAXVL-1:256] := 0 > > Got it. Now it makes perfect sense. Thanks for the clarifications! Actually, it makes detecting the pattern during matching even simpler than I initially thought. Since there's no need to match any non-trivial ideal IR tree, the AD instruction can just match a single `MulVL` and detect operand shapes using a predicate. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1805903273 From aph at openjdk.org Wed Nov 6 17:55:35 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 6 Nov 2024 17:55:35 GMT Subject: Integrated: 8342540: InterfaceCalls micro-benchmark gives misleading results In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 11:53:06 GMT, Andrew Haley wrote: > `InterfaceCalls.java` makes highly predictable memory accesses, which leads to a gross time underestimate of the case where a megamorphic access is unpredictable. > > Here's one example, with and without randomization. The unpredictable megamorphic call takes more than 4x as long as the predictable one.
>
> ```
> Benchmark                        (randomized)  Mode  Cnt   Score   Error  Units
> InterfaceCalls.test2ndInt3Types         false  avgt    4   5.013 ± 0.081  ns/op
> InterfaceCalls.test2ndInt3Types          true  avgt    4  23.421 ± 0.102  ns/op
> ```
>
> This patch adds the "randomized" parameter, which allows the measurement of predictable and unpredictable megamorphic calls.

This pull request has now been integrated. Changeset: 78b378ad Author: Andrew Haley URL: https://git.openjdk.org/jdk/commit/78b378ad03d0f6c85468ac208e84fabea79fc7de Stats: 34 lines in 1 file changed: 22 ins; 6 del; 6 mod 8342540: InterfaceCalls micro-benchmark gives misleading results Reviewed-by: shade, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21581 From kvn at openjdk.org Wed Nov 6 18:12:34 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 6 Nov 2024 18:12:34 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 10:19:51 GMT, theoweidmannoracle wrote: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189).
This is also the only call to ok_to_inline(). > > The only other place where late inline call generators are created is in calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the corresponding test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the changes proposed by @caojoshua do not lose any useful information about why no inlining occurred prior to the late inlining. It would be nice to have a simple test for this. ------------- PR Review: https://git.openjdk.org/jdk/pull/21899#pullrequestreview-2419083858 From mli at openjdk.org Wed Nov 6 18:42:04 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 6 Nov 2024 18:42:04 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option [v2] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this simple patch? > Previously, UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang, this one should be a diagnostic option, as we don't want to expose too many options to users.
> Thanks Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: turn more verified extensions as DIAGNOSTIC ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21885/files - new: https://git.openjdk.org/jdk/pull/21885/files/4b41bb91..e5bd3eef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21885&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21885&range=00-01 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/21885.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21885/head:pull/21885 PR: https://git.openjdk.org/jdk/pull/21885 From mli at openjdk.org Wed Nov 6 18:42:04 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 6 Nov 2024 18:42:04 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 16:07:58 GMT, Robbin Ehn wrote: >> Hi, >> Can you help to review this simple patch? >> Previously, UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang, this one should be a diagnostic option, as we don't want to expose too many options to users. >> Thanks > > Yes, I'm fine with that. Just so we try to keep some kind of common thread. @robehn Thanks for the confirmation, updated accordingly.
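With `UseZvfh` now a diagnostic flag, setting it explicitly should require unlocking diagnostic VM options first. The commands below are a sketch that assumes a linux-riscv64 JDK build containing this change; whether the extension is actually used still depends on the hardware.

```shell
# Diagnostic flags must be unlocked before they can be set explicitly.
java -XX:+UnlockDiagnosticVMOptions -XX:+UseZvfh -version

# Inspect the flag's final value.
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version | grep UseZvfh
```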
------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2460511903 From aturbanov at openjdk.org Wed Nov 6 18:57:29 2024 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Wed, 6 Nov 2024 18:57:29 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: On Wed, 6 Nov 2024 12:19:47 GMT, Tobias Holenstein wrote: > Adds a new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/DiagramScene.java line 226: > 224: > 225: public void colorSelectedFigures(Color color) { > 226: for (Figure figure : model.getSelectedFigures()) { Suggestion: for (Figure figure : model.getSelectedFigures()) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21925#discussion_r1831556933 From tholenstein at openjdk.org Wed Nov 6 20:30:26 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 6 Nov 2024 20:30:26 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v2] In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: > Adds a new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: Update src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/DiagramScene.java Co-authored-by: Andrey Turbanov ------------- Changes: - all:
https://git.openjdk.org/jdk/pull/21925/files - new: https://git.openjdk.org/jdk/pull/21925/files/0fd894fd..17205bab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21925.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21925/head:pull/21925 PR: https://git.openjdk.org/jdk/pull/21925 From dlong at openjdk.org Wed Nov 6 21:15:54 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 6 Nov 2024 21:15:54 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 08:06:33 GMT, Tobias Hartmann wrote: >> # Issue >> >> The `compiler/vectorapi/VectorLogicalOpIdentityTest.java` has been failing because, when C2 compiles the test `testAndMaskSameValue1`, it is expected to produce 1 `AndV` node but produces none. >> >> # Cause >> >> The issue has to do with the criteria that trigger a cleanup when performing late inlining. In the failing test, when the compiler tries to inline a `jdk.internal.vm.vector.VectorSupport::binaryOp` call, it fails because its argument is of the wrong type, mainly because some cast nodes "hide" the more "precise" type. >> The graph that leads to the issue looks like this: >> ![1BCE8148-1E44-4CA1-AF8F-EFC6210FA740](https://github.com/user-attachments/assets/62dd917f-2dac-42a9-90cf-73eedcd3cf8a) >> The compiler tries to inline `jdk.internal.vm.vector.VectorSupport::load` and it succeeds: >> ![752E81C9-A37D-4626-81A9-E4A839FADD3D](https://github.com/user-attachments/assets/e61057b2-3093-4992-ba5a-b80e4000c0ec) >> The node `3027 VectorBox` has type `IntMaxVector`. `912 CastPP` and `934 CheckCastPP` have type `IntVector` instead.
>> The compiler then tries to inline one of the 2 `binaryOp` calls but it fails because it needs an argument of type `IntMaxVector` and the argument it is given, which is node `934 CheckCastPP`, has type `IntVector`. >> >> This would not happen if a _cleanup_ were triggered between the 2 inlining attempts. IGVN would run and the 2 nodes `912 CastPP` and `934 CheckCastPP` would be folded away. `binaryOp` could then be inlined since the types would match. >> >> # Solution >> >> In order to fix this, an extra cleanup has to be performed when we encounter a situation like the one above, i.e. when late inlining creates a `VectorBox`. >> >> Additional test runs with `-XX:-TieredCompilation` are added to `VectorLogicalOpIdentityTest.java` and `VectorGatherMaskFoldingTest.java` as regression tests and `-XX:+IncrementalInlineForceCleanup` is removed from `VectorGatherMaskFoldingTest.java` (previously added as a workaround for this issue) > > src/hotspot/share/opto/callGenerator.cpp line 734: > >> 732: } >> 733: C->set_inlining_progress(true); >> 734: C->set_do_cleanup(kit.stopped() || result->Opcode() == Op_VectorBox); // path is dead or vector box; needs cleanup > > This only triggers if the return value of the incrementally inlined method is a `VectorBox`, right? Is that sufficient? Could the `VectorBox` be hidden by another node? I'm failing to understand why this is only an issue with VectorBox. It doesn't feel quite right to be checking for a specific node type here. Maybe this should be something like needs_cleanup(result)?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21682#discussion_r1831715637 From vlivanov at openjdk.org Thu Nov 7 00:03:42 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 7 Nov 2024 00:03:42 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 13:31:37 GMT, Damon Fenacci wrote: > # Issue > > The `compiler/vectorapi/VectorLogicalOpIdentityTest.java` has been failing because, when C2 compiles the test `testAndMaskSameValue1`, it is expected to produce 1 `AndV` node but produces none. > > # Cause > > The issue has to do with the criteria that trigger a cleanup when performing late inlining. In the failing test, when the compiler tries to inline a `jdk.internal.vm.vector.VectorSupport::binaryOp` call, it fails because its argument is of the wrong type, mainly because some cast nodes "hide" the more "precise" type. > The graph that leads to the issue looks like this: > ![1BCE8148-1E44-4CA1-AF8F-EFC6210FA740](https://github.com/user-attachments/assets/62dd917f-2dac-42a9-90cf-73eedcd3cf8a) > The compiler tries to inline `jdk.internal.vm.vector.VectorSupport::load` and it succeeds: > ![752E81C9-A37D-4626-81A9-E4A839FADD3D](https://github.com/user-attachments/assets/e61057b2-3093-4992-ba5a-b80e4000c0ec) > The node `3027 VectorBox` has type `IntMaxVector`. `912 CastPP` and `934 CheckCastPP` have type `IntVector` instead.
> > # Solution > > In order to fix this an extra cleanup has to be performed when we encounter a situation like the one above, i.e. when late inlining creates a `VectorBox`. > > Additional test runs with `-XX:-TieredCompilation` are added to `VectorLogicalOpIdentityTest.java` and `VectorGatherMaskFoldingTest.java` as regression tests and `-XX:+IncrementalInlineForceCleanup` is removed from `VectorGatherMaskFoldingTest.java` (previously added as workaround for this issue) The root cause of the bug is that type information obtained during inlining is not propagated until IGVN kicks in. Vector API is special here, because (1) it heavily relies on exact type information to perform intrinsification; and (2) vector intrinsics are processed during post-parse inlining. IMO the current fix (do cleanup when VectorBox is returned) is good enough as a stop-the-gap fix for Vector API issue (missed intrinsification opportunity). As an alternative fix, limited IGVN pass over `CastPP`/`CheckCastPP` users of result value may be enough to avoid full-blown cleanup. I suspect some other intrinsics may be susceptible to a similar issue, but in such case it would be more like a corner case (few intrinsics fail in rare conditions). A proper fix would be to re-examine failed intrinsics call site during IGVN and repeat intrinsifcation attempt when their inputs improve (akin to what is done in `CallStaticJavaNode::Ideal()`/`CallDynamicJavaNode::Ideal()`). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21682#issuecomment-2461038909 From vlivanov at openjdk.org Thu Nov 7 00:15:41 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 7 Nov 2024 00:15:41 GMT Subject: RFR: 8326369: Bimorphic inlining not applied at a call site that was initially monomorphic In-Reply-To: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> Message-ID: On Wed, 6 Nov 2024 09:06:47 GMT, Galder Zamarreño wrote: > This issue was fixed indirectly by JDK-8339299 (see [PR](https://github.com/openjdk/jdk/pull/20786)), but the PR for 8339299 didn't include any new tests, so I'm sending this PR with the test that Filipp added in 8326369 so that such an issue doesn't come back inadvertently. I've added him as co-author and applied some small formatting changes. Please reframe JDK-8326369 as an Enhancement to add a missing test case. Otherwise, it looks confusing. It's also fine to create a new issue and close JDK-8326369 as a duplicate of JDK-8339299. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21920#issuecomment-2461058157 From swen at openjdk.org Thu Nov 7 00:47:58 2024 From: swen at openjdk.org (Shaojin Wen) Date: Thu, 7 Nov 2024 00:47:58 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v18] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: <3ItvC90tZHf_VJuHevQPlS71roWQAn0kyaiAr1JBtf4=.8a631b7a-cf49-45cd-b9de-2c95e2340cc3@github.com> On Mon, 4 Nov 2024 11:48:49 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer` for enhanced pointer parsing.
For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including whether two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. >> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to arrays, including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RCs smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck.
Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > more changes for Christian If it is not provided in the release image, users need to find the source code of the current version of JDK to build the fastdebug image to analyze whether the MergeStore optimization of a certain code works. I can understand that MergeStore may still need to be improved, so it cannot be used as a product feature, but this is a useful optimization and I hope it can be provided in the product eventually. I hope that TraceMergeStore can eventually be used in the release image like `PrintInlining` and become a tool for performance optimizers. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2461092539 From swen at openjdk.org Thu Nov 7 01:11:42 2024 From: swen at openjdk.org (Shaojin Wen) Date: Thu, 7 Nov 2024 01:11:42 GMT Subject: RFR: 8343629: More MergeStore benchmark In-Reply-To: References: Message-ID: <0lmMa_RLLb-r3FaHFLY2zIIPcht-6Y000LW9CIDYJUc=.afca63f6-5326-40c1-9200-e87a07080dc3@github.com> On Wed, 6 Nov 2024 07:18:16 GMT, Emanuel Peter wrote:
> > ```java
> > "null".getBytes(0, 4, bytes4, off);
> > ```
> >
> > Is it possible to do MergeStore in this scenario?
>
> I don't know. What do the logs say? And what does it currently compile down to, i.e. what assembly instructions?
>
> Otherwise I think this update seems reasonable.
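The linear form `pointer = con + sum_i(scale_i * variable_i)` and the overflow tracking described in the 8335392 review quoted above can be illustrated with a small standalone C++ sketch. It is not HotSpot's `MemPointer` or `NoOverflowInt`: the names `TrackedInt`, `LinearPointer`, `is_adjacent` and `int_array_element`, the use of strings to name variables, and the 16-byte array header are all hypothetical choices for illustration. Two pointers with identical summands differ only by a constant, and two stores of `size` bytes are adjacent when that constant equals `size`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical overflow-tracking int (illustration only, not HotSpot's
// NoOverflowInt): arithmetic is done in 64 bits, and any result outside the
// 32-bit range marks the value as overflowed from then on.
class TrackedInt {
 public:
  explicit TrackedInt(int64_t v)
      : _value(v), _overflow(v < INT32_MIN || v > INT32_MAX) {}

  static TrackedInt make_overflow() {
    TrackedInt t(0);
    t._overflow = true;
    return t;
  }

  bool is_overflow() const { return _overflow; }

  int32_t value() const {
    assert(!_overflow && "value of an overflowed TrackedInt");
    return static_cast<int32_t>(_value);
  }

  TrackedInt operator+(const TrackedInt& o) const {
    if (_overflow || o._overflow) return make_overflow();
    return TrackedInt(_value + o._value);  // both in int32 range: fits int64
  }

  TrackedInt operator*(const TrackedInt& o) const {
    if (_overflow || o._overflow) return make_overflow();
    return TrackedInt(_value * o._value);  // |product| <= 2^62: fits int64
  }

 private:
  int64_t _value;
  bool _overflow;
};

// pointer = con + sum_i(scale_i * variable_i); variables are named by strings.
struct LinearPointer {
  TrackedInt con;
  std::map<std::string, int32_t> summands;  // variable -> scale
};

// If both pointers have exactly the same summands, they are always a constant
// distance apart: store (b - a) in *dist and return true. Otherwise (different
// summands or an overflowed constant) nothing is known, so return false.
bool constant_distance(const LinearPointer& a, const LinearPointer& b, int32_t* dist) {
  if (a.con.is_overflow() || b.con.is_overflow() || a.summands != b.summands) {
    return false;
  }
  TrackedInt d = b.con + TrackedInt(-static_cast<int64_t>(a.con.value()));
  if (d.is_overflow()) return false;
  *dist = d.value();
  return true;
}

// Two stores of `size` bytes each are adjacent if the second one starts
// exactly `size` bytes after the first one, i.e. they are merge candidates.
bool is_adjacent(const LinearPointer& first, const LinearPointer& second, int32_t size) {
  int32_t dist = 0;
  return constant_distance(first, second, &dist) && dist == size;
}

// Illustrative only: &a[i + k] for an int[] whose payload starts 16 bytes
// after the object base, i.e. base + 16 + 4 * (i + k).
LinearPointer int_array_element(int32_t k) {
  TrackedInt con = TrackedInt(16) + TrackedInt(4) * TrackedInt(k);
  return LinearPointer{con, {{"base", 1}, {"i", 4}}};
}
```

With these definitions, `&a[i]` and `&a[i + 1]` are adjacent 4-byte stores, while a pointer whose constant overflowed (for example `int_array_element(INT32_MAX)`) is never considered adjacent to anything: the analysis gives up instead of silently wrapping.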
My thinking is this:
StringBuilder buf = new StringBuilder();
// ...
buf.append("null");
The calling path is as follows: AbstractStringBuilder::append -> AbstractStringBuilder::putStringAt -> String::getBytes(byte[], int, byte) -> System::arraycopy. In this scenario, if System::arraycopy can be optimized to use putInt or putLong, performance can be improved. The String concatenation scenario is similar:
String f(int i) {
    return "abcd" + i;
}
Here `StringConcatHelper::prepend(int, byte, byte[], String, String)` is called, and then `String::getBytes(byte[], int, byte) -> System::arraycopy`:
package java.lang;
class StringConcatHelper {
    static int prepend(int index, byte coder, byte[] buf, String value, String prefix) {
        index -= value.length();
        if (coder == String.LATIN1) {
            value.getBytes(buf, index, String.LATIN1);
            index -= prefix.length();
            prefix.getBytes(buf, index, String.LATIN1);
        } else {
            value.getBytes(buf, index, String.UTF16);
            index -= prefix.length();
            prefix.getBytes(buf, index, String.UTF16);
        }
        return index;
    }
}
This is similar to the case above: can we optimize "abcd".getBytes to putInt or putLong? In summary, can we optimize a System::arraycopy of a stable byte[] with a length of 4 into a putInt? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21659#issuecomment-2461114396 From fyang at openjdk.org Thu Nov 7 01:44:42 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 7 Nov 2024 01:44:42 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option [v2] In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 18:42:04 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this simple patch? >> Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang , this one should be a diagnostic option, as we don't want to expose too many options to users.
>> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > turn more verified extensions as DIAGNOSTIC Please also update the JBS title to reflect the latest version, as we are targeting more options than a single UseZvfh. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2461144551 From haosun at openjdk.org Thu Nov 7 01:44:49 2024 From: haosun at openjdk.org (Hao Sun) Date: Thu, 7 Nov 2024 01:44:49 GMT Subject: RFR: 8341834: C2 compilation fails with "bad AD file" due to Replicate [v3] In-Reply-To: References: Message-ID: <77bYQ44LNNQlSteh4rJSEvJ5QWIvpBcb7eNb0Sy-vVE=.a8c58c15-a7c9-4333-9a45-e49fc35797eb@github.com> On Mon, 4 Nov 2024 13:41:34 GMT, Roland Westrelin wrote: >> Nice, thanks for the added comments! >> >> Do you know what JDK versions are affected? > >> Do you know what JDK versions are affected? > > The failure doesn't reproduce with jdk21u. But that seems to be because we need JDK-8326139 (and JDK-8331575) for the bug to show up. Hi @rwestrel, my JBS account has been inactive recently, so I'd like to report the bug here. I encountered the following error with `-XX:MaxVectorSize=8` on both AArch64 and x86_64. Could you help take a look at this issue? Thanks.
Test command:
make test JTREG="VM_OPTIONS=-XX:MaxVectorSize=8" TEST=test/hotspot/jtreg/compiler/vectorization/TestReplicateAtConv.java
Error message:
CompileCommand: compileonly TestReplicateAtConv.test bool compileonly = true
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/tmp/jdk-dev/src/hotspot/share/opto/type.cpp:2499), pid=1424540, tid=1424557
# assert(Matcher::vector_size_supported(elem_bt, length)) failed: length in range
#
# JRE version: OpenJDK Runtime Environment (24.0) (fastdebug build 24-internal-git-63c19d3db58)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 24-internal-git-63c19d3db58, compiled mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
# Problematic frame:
# V [libjvm.so+0x17bca30] TypeVect::make(BasicType, unsigned int, bool)+0x150
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /tmp/jdk-build/test-support/jtreg_test_hotspot_jtreg_compiler_vectorization_TestReplicateAtConv_java/scratch/0/core.1424540)
#
# An error report file with more information is saved as:
# /tmp/jdk-build/test-support/jtreg_test_hotspot_jtreg_compiler_vectorization_TestReplicateAtConv_java/scratch/0/hs_err_pid1424540.log
#
# Compiler replay data is saved as:
# /tmp/jdk-build/test-support/jtreg_test_hotspot_jtreg_compiler_vectorization_TestReplicateAtConv_java/scratch/0/replay_pid1424540.log
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
------------- PR Comment: https://git.openjdk.org/jdk/pull/21660#issuecomment-2461145117 From amitkumar at openjdk.org Thu Nov 7 04:32:14 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 7 Nov 2024 04:32:14 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation Message-ID: Lazy computation of TypeFunc.
Testing: Tier1 on Fastdebug & Release VMs (`s390x architecture`) ------------- Commit messages: - extra space - inline accessor methods - Revert "mac build workaround" - final change - mac build workaround - init change Changes: https://git.openjdk.org/jdk/pull/21782/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21782&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330851 Stats: 894 lines in 5 files changed: 619 ins; 31 del; 244 mod Patch: https://git.openjdk.org/jdk/pull/21782.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21782/head:pull/21782 PR: https://git.openjdk.org/jdk/pull/21782 From amitkumar at openjdk.org Thu Nov 7 04:32:14 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 7 Nov 2024 04:32:14 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation In-Reply-To: References: Message-ID: <1EYmbEDooBKIFVBWjsqwBwbQyipe_g3pqA30V-o3lOY=.af345b94-532e-42ef-8665-95969ddd3e4e@github.com> On Wed, 30 Oct 2024 06:43:23 GMT, Amit Kumar wrote: > Lazy computation of TypeFunc. > > Testing: Tier1 on Fastdebug & Release VMs (`s390x architecture`) @dean-long can you take a look at these changes? The "Pre-submit test" failures for zero are not related:
[build.sh][INFO] Downloading https://archive.apache.org/dist/ant/binaries/apache-ant-1.10.8-bin.zip to /home/runner/work/jdk/jdk/jtreg/src/make/../build/deps/apache-ant-1.10.8-bin.zip
Error: sh][ERROR] wget exited with exit code 4
Error: Process completed with exit code 1.
Sorry for the delay, I was out for a long weekend.
> For example, rename LockNode::lock_type() to LockNode::lock_type_init(), and have it save the result in a static const field. Then have LockNode::lock_type() simply return the field.
But as you mentioned, "the data field is `static const`", so we can't do the assignment in the class itself. To do that we have to go outside the scope of the class and do the definition there. Or do you have another way in mind?
with that change I am getting this error:
=== Output from failing command(s) repeated here ===
* For target hotspot_variant-server_libjvm_objs_BUILD_LIBJVM_run_ld:
Undefined symbols for architecture arm64:
  "LockNode::_lock_type_tf", referenced from:
      GraphKit::shared_lock(Node*) in graphKit.o
      LockNode::lock_type_init() in type.o
ld: symbol(s) not found for architecture arm64
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
Here is a shorter version:
#include <iostream>
using namespace std;

class Temp {
 public:
  static const int* ptr;

 public:
  static void set_ptr() {
    const int *abs = new int(20);
    ptr = abs;
  }
};

// Initialize static member;
const int* Temp::ptr = nullptr;

int main() {
  Temp::set_ptr();
  cout << *Temp::ptr << endl;
  return 0;
}
If I comment out `const int* Temp::ptr = nullptr;`, then I get a similar error to the one from the build failure pasted above. Here we might need to give the definition out of scope of the class. Another solution is making the data-field inline:
#include <iostream>
using namespace std;

class Temp {
 public:
  static inline const int* ptr = nullptr;

 public:
  static void set_ptr() {
    const int *abs = new int(20);
    ptr = abs;
  }
};

// Initialize static member;
//const int* Temp::ptr = nullptr;

int main() {
  Temp::set_ptr();
  cout << *Temp::ptr << endl;
  return 0;
}
Here, if we mark `ptr` as an inline variable, that is also acceptable (C++17 started accepting it), but HotSpot code throws a warning there as well. I don't see any way to shrink the code here. The `*_Type` methods could be generated by a macro, because all of them do the same task, i.e. assert and return the field, but I'm not sure that's a good choice, because it will sprinkle the macro everywhere. Overall I think the code would become less intuitive and more error prone.
const TypeFunc *OptoRuntime::athrow_Type() {
  assert(_athrow_tf != nullptr, "should be initialized");
  return _athrow_tf;
}
------------- PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2446193033 PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2446204918 PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2453901207 PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2456246788 PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2458710642 From dlong at openjdk.org Thu Nov 7 04:32:14 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 7 Nov 2024 04:32:14 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 06:43:23 GMT, Amit Kumar wrote: > Lazy computation of TypeFunc. > > Testing: Tier1 on Fastdebug & Release VMs (`s390x architecture`) It looks OK, but I think we are paying some overhead every time we try to get the TypeFunc, because C++ has to first check if it's the first time the function was called. Instead, how about getting rid of the lambda and make the initialization explicit?For example, rename LockNode::lock_type() to LockNode::lock_type_init(), and have it save the result in a static const field. Then have LockNode::lock_type() simply return the field. 
This is what I meant:
diff --git a/src/hotspot/share/opto/callnode.hpp b/src/hotspot/share/opto/callnode.hpp
index 2d3835b71ad..f72e78745b5 100644
--- a/src/hotspot/share/opto/callnode.hpp
+++ b/src/hotspot/share/opto/callnode.hpp
@@ -1190,9 +1190,11 @@ class AbstractLockNode: public CallNode {
 // 2 - a FastLockNode
 //
 class LockNode : public AbstractLockNode {
+  static const TypeFunc *_lock_type_tf;
 public:
 
-  static const TypeFunc *lock_type() {
+  static void lock_type_init() {
+    assert(_lock_type_tf == nullptr, "lock_type_init() already called");
     // create input type (domain)
     const Type **fields = TypeTuple::fields(3);
     fields[TypeFunc::Parms+0] = TypeInstPtr::NOTNULL; // Object to be Locked
@@ -1205,7 +1207,12 @@ class LockNode : public AbstractLockNode {
 
     const TypeTuple *range = TypeTuple::make(TypeFunc::Parms+0,fields);
 
-    return TypeFunc::make(domain,range);
+    _lock_type_tf = TypeFunc::make(domain,range);
+  }
+
+  static const TypeFunc *lock_type() {
+    assert(_lock_type_tf != nullptr, "lock_type_init() not called");
+    return _lock_type_tf;
   }
 
   virtual int Opcode() const;
Nice work so far. I would suggest making the _Type() accessors inline, and trying to reduce boilerplate code with macros if possible (the field name and the accessor function name can both be derived from a common root, which is pretty common practice in HotSpot code). If you move all these accessor functions into the .hpp or .inline.hpp file, so they can be inlined, then I think the benefit of a macro will become more apparent, but I won't insist. Let's see what other reviewers think.
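Dean's suggestions in this thread (initialize each `TypeFunc` once and explicitly, keep it in a static field, read it through an inline accessor, and derive the field and accessor names from a common root with a macro) can be combined into a small standalone sketch. `TypeFunc` here is a one-field stand-in, and `DECLARE_CACHED_TYPEFUNC`, `Runtime` and `athrow_Type_init()` are hypothetical names, not part of the patch. The out-of-class definition at the end is the piece whose absence caused the arm64 linker error reported in this thread.

```cpp
#include <cassert>

// Stand-in for C2's TypeFunc so that the sketch compiles on its own.
struct TypeFunc {
  int num_params;
};

// Hypothetical macro: for a root name `foo` it declares the cached field
// `_foo_tf` and an inline accessor `foo_Type()` that only asserts and loads,
// so there is no function-local-static guard check on each call.
#define DECLARE_CACHED_TYPEFUNC(name)                             \
 private:                                                         \
  static const TypeFunc* _##name##_tf;                            \
 public:                                                          \
  static const TypeFunc* name##_Type() {                          \
    assert(_##name##_tf != nullptr && "should be initialized");   \
    return _##name##_tf;                                          \
  }

class Runtime {
  DECLARE_CACHED_TYPEFUNC(athrow)

 public:
  // Called exactly once (e.g. during compiler initialization), before any
  // caller asks for athrow_Type().
  static void athrow_Type_init() {
    assert(_athrow_tf == nullptr && "athrow_Type_init() already called");
    _athrow_tf = new TypeFunc{1};  // lives for the rest of the process
  }
};

// Out-of-class definition of the static member; without it the linker
// reports an undefined symbol for Runtime::_athrow_tf.
const TypeFunc* Runtime::_athrow_tf = nullptr;
```

A caller runs `Runtime::athrow_Type_init()` once and afterwards uses `Runtime::athrow_Type()`, which is a plain load plus a debug-only assert. Whether the macro pays for itself compared with spelling out each accessor is exactly the trade-off Amit and Dean weigh in the discussion.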
------------- PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2450827514 PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2455940131 PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2458525693 PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2461241654 From dlong at openjdk.org Thu Nov 7 04:32:14 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 7 Nov 2024 04:32:14 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation In-Reply-To: <1EYmbEDooBKIFVBWjsqwBwbQyipe_g3pqA30V-o3lOY=.af345b94-532e-42ef-8665-95969ddd3e4e@github.com> References: <1EYmbEDooBKIFVBWjsqwBwbQyipe_g3pqA30V-o3lOY=.af345b94-532e-42ef-8665-95969ddd3e4e@github.com> Message-ID: On Tue, 5 Nov 2024 05:08:33 GMT, Amit Kumar wrote:
> Here we might need to give the definition out of scope of the class.

Yes. For example, in callnode.cpp:
const TypeFunc *LockNode::_lock_type_tf = nullptr;
------------- PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2456376861 From amitkumar at openjdk.org Thu Nov 7 04:32:14 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 7 Nov 2024 04:32:14 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation In-Reply-To: References: <1EYmbEDooBKIFVBWjsqwBwbQyipe_g3pqA30V-o3lOY=.af345b94-532e-42ef-8665-95969ddd3e4e@github.com> Message-ID: <4tYIvAd_-1u5crvXKVFKrzVeMMqw2284G--OFy2PzcU=.ec8d4a1e-6907-4f67-9acd-2739beffd60a@github.com> On Tue, 5 Nov 2024 06:53:46 GMT, Dean Long wrote:
>> with that change I am getting this error:
>>
>> === Output from failing command(s) repeated here ===
>> * For target hotspot_variant-server_libjvm_objs_BUILD_LIBJVM_run_ld:
>> Undefined symbols for architecture arm64:
>>   "LockNode::_lock_type_tf", referenced from:
>>       GraphKit::shared_lock(Node*) in graphKit.o
>>       LockNode::lock_type_init() in type.o
>> ld: symbol(s) not found for architecture arm64
>> clang++: error: linker command failed with exit code 1 (use -v to see invocation)
>>
>> Here is a shorter version:
>>
>> #include <iostream>
>> using namespace std;
>>
>> class Temp {
>>  public:
>>   static const int* ptr;
>>
>>  public:
>>   static void set_ptr() {
>>     const int *abs = new int(20);
>>     ptr = abs;
>>   }
>> };
>>
>> // Initialize static member;
>> const int* Temp::ptr = nullptr;
>>
>> int main() {
>>   Temp::set_ptr();
>>   cout << *Temp::ptr << endl;
>>   return 0;
>> }
>>
>> If I comment out `const int* Temp::ptr = nullptr;`, then I get a similar error to the one from the build failure pasted above. Here we might need to give the definition out of scope of the class.
>>
>> Another solution is making the data-field inline:
>>
>> #include <iostream>
>> using namespace std;
>>
>> class Temp {
>>  public:
>>   static inline const int* ptr = nullptr;
>>
>>  public:
>>   static void set_ptr() {
>>     const int *abs = new int(20);
>>     ptr = abs;
>>   }
>> };
>>
>> // Initialize static member;
>> //const int* Temp::ptr = nullptr;
>>
>> int main() {
>>   Temp::set_ptr();
>>   cout << *Temp::ptr << endl;
>>   return 0;
>> }
>>
>> Here, if we mark `ptr` as an inline variable, that is also acceptable (C++17 started accepting it), but HotSpot code throws a warning there as well.
>
>> Here we might need to give the definition out of scope of the class.
>
> Yes. For example, in callnode.cpp:
>
> const TypeFunc *LockNode::_lock_type_tf = nullptr;

@dean-long I have updated the patch, please have a look at the current changes :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2457088617 From amitkumar at openjdk.org Thu Nov 7 04:32:14 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 7 Nov 2024 04:32:14 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 03:24:31 GMT, Dean Long wrote:
> If you move all these accessor functions into the .hpp or .inline.hpp file, so they can be inlined, then I think the benefit of a macro will become more apparent, but I won't insist. Let's see what other reviewers think.
I have moved them, and am marking the changes ready for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2461294664 From amitkumar at openjdk.org Thu Nov 7 04:46:41 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 7 Nov 2024 04:46:41 GMT Subject: RFR: 8343506: [s390x] multiple test failures with ubsan In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 13:45:03 GMT, Martin Doerr wrote: > My point is that I think that the riscv solution is better. See assembler_riscv.inline.hpp. @TheRealMDoerr can we do it in another RFE? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21864#issuecomment-2461308296 From galder at openjdk.org Thu Nov 7 05:20:41 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 7 Nov 2024 05:20:41 GMT Subject: RFR: 8326369: Bimorphic inlining not applied at a call site that was initially monomorphic In-Reply-To: References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> Message-ID: On Thu, 7 Nov 2024 00:13:00 GMT, Vladimir Ivanov wrote: >> This issue was fixed by JDK-8339299 indirectly (see [PR](https://github.com/openjdk/jdk/pull/20786)), but the PR for 8339299 didn't include any new tests, so I'm sending this PR with the test that Filipp added in 8326369 so that such an issue doesn't come back inadvertently. I've added him as co-author and applied some small formatting changes. > > Please reframe JDK-8326369 as an Enhancement to add a missing test case. Otherwise, it looks confusing. > > It's also fine to create a new issue and close JDK-8326369 as a duplicate of JDK-8339299. @iwanowww I've reframed JDK-8326369 as per your suggestion.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21920#issuecomment-2461339721 From amitkumar at openjdk.org Thu Nov 7 05:41:41 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 7 Nov 2024 05:41:41 GMT Subject: RFR: 8343724: [PPC64] Disallow OptoScheduling In-Reply-To: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> References: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> Message-ID: On Wed, 6 Nov 2024 16:18:11 GMT, Martin Doerr wrote: > Force off OptoScheduling for PPC64. It should be properly maintained before allowing users to switch it on. It's probably not important for this platform. Also see JBS issue for motivation. I see that test is passing for s390x (with ubsan enabled). But still do you think we should disable for s390x as well ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21935#issuecomment-2461368447 From epeter at openjdk.org Thu Nov 7 06:39:54 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 7 Nov 2024 06:39:54 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v18] In-Reply-To: <3ItvC90tZHf_VJuHevQPlS71roWQAn0kyaiAr1JBtf4=.8a631b7a-cf49-45cd-b9de-2c95e2340cc3@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> <3ItvC90tZHf_VJuHevQPlS71roWQAn0kyaiAr1JBtf4=.8a631b7a-cf49-45cd-b9de-2c95e2340cc3@github.com> Message-ID: <5OKKjGShXckCMGeNBZNwzgfI-1X5NyD4mRrzPQy2jEk=.effcbecf-713b-4161-8a73-5e579c4ae685@github.com> On Thu, 7 Nov 2024 00:45:22 GMT, Shaojin Wen wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more changes for Christian > > If it is not provided in the release image, users need to find the source code of the current version of JDK to build the fastdebug image to analyze whether the MergeStore optimization of a certain code works. 
> > I can understand that MergeStore may still need to be improved, so it cannot be used as a product feature, but this is a useful optimization and I hope it can be provided in the product eventually. > > I hope that TraceMergeStore can eventually be used in the release image like `PrintInlining` and become a tool for performance optimizers. @wenshao I suppose we could consider making `TraceMergeStores` and `TraceAutoVectorization` available in product, but under the `-XX:+UnlockDiagnosticVMOptions` flag... I will discuss this with other VM engineers. That means it is available, but there is no promise of stability. Still, once people become dependent on it, maybe even tools become dependent, then it is harder to make changes without everybody complaining ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2461437763 From epeter at openjdk.org Thu Nov 7 06:56:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 7 Nov 2024 06:56:15 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures Message-ID: **History** This issue became apparent with https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155): On machines that do not support sha intrinsics, we execute the sha code in java code. This java code has a loop that previously did not vectorize, but it now does since https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). It turns out that that kind of loop is actually slower when vectorized - this led to a regression, reported originally as: `8334431: Regression 18-20% on Mac x64 on Crypto.signverify` I then investigated the issue thoroughly, and discovered that it was even an issue before https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). 
I wrote a [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html) about the issue. **Summary of Problem** As described in the [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html), vectorization can introduce store-to-load failures that were not present in the scalar loop code. Where in scalar code, the loads and stores were all exactly overlapping or non-overlapping, in vectorized code they can now be partially overlapping. When a store and a later load are partially overlapping, the store value cannot be directly forwarded from the store-buffer to the load (would be fast), but has to first go to L1 cache. This incurs a higher latency on the dependency edge from the store to the load. **Benchmark** I introduced a new micro-benchmark in https://github.com/openjdk/jdk/pull/19880, and now further expanded it in this PR. You can see the extensive results in [this comment below](https://github.com/openjdk/jdk/pull/21521#issuecomment-2458938698). The benchmarks look different on different machines, but they all have a pattern similar to this: ![image](https://github.com/user-attachments/assets/3366f7fa-af44-44d4-a476-8cd0466fe937) ![image](https://github.com/user-attachments/assets/1c1408c2-053e-4a8a-ad46-32b75b836161) ![image](https://github.com/user-attachments/assets/d392c8cf-fb62-4593-93c7-a0d85ad5885e) ![image](https://github.com/user-attachments/assets/3a79601f-4015-4f71-a510-cab7d7b59ed8) We see that the `scalar` loop is faster for low `offset`, and the `vectorized` loop is faster for high offsets (and power-of-w offsets). The reason is that for low offsets, the latency dominates the runtime, and for high offsets the throughput dominates. If there are store-to-load-failures from every iteration `i` -> `i+offset`, and we have a total of `n` iterations, then we have a chain of `n/offset` latencies. 
Hence, as the `offset` increases, this latency chain becomes smaller and smaller. As an example: `offset = 3`, the 3rd iteration depends on the 0th, the 6th on the 3rd, the 9th on the 6th, the 12th on the 9th ... all the way to the nth iteration. **Current Solution: a new heuristic** Any heuristic is going to be somewhat inaccurate, but we now want to fix this issue in JDK24, and so I'd rather have a quick solution that works most of the time, rather than a sophisticated solution that works almost always. The sophisticated solution would carefully compute the expected latency and throughput for both the scalar and vectorized loop, and pick the faster one. I hope to experiment with that in the future. For now, we just implement a "hard cutoff": if we predict that there will be ANY store-to-load-forwarding failure within some `N` iterations, then we bailout of vectoirzation. This `N` can be configured with the new diagnostic flag `SuperWordStoreToLoadForwardingFailureDetection`. The benchmarks indicated that `x64` machines should have a value of `16`, and `aarch64 asimd/neon` machines a value of `8`. I do not know what the value should be on other machines ... I just guessed it to be `16`, but **platform maintainers are welcome to adjust this value** - my benchmarks may be a helpful guide. Note: we only detect store-to-load-forwarding failures when the loads and stores are known at compile time to go to the same memory object. **Should someone experience performance regressions doe to this fix**: you can disable the detection by setting the diagnostic flag: `-XX:+UnlockDiagnosticVMOptions -XX:SuperWordStoreToLoadForwardingFailureDetection=0`. Maybe you just need to lower it from the default. Increasing it is probably not going to help - but why not try anyway. **Tests** I had to adapt some tests. Primarily `TestDependencyOffsets.java`, which I just refactored in https://github.com/openjdk/jdk/pull/21541 to make these changes here easier. 
**Performance Testing** [I ran my benchmark on 7 machines](https://github.com/openjdk/jdk/pull/21521#issuecomment-2458938698), and the new heuristic seems to perform very well. I also ran extensive performance testing, and I did not see any significant change. This was the originally reported regression (MacOSX x64 - SPECjvm2008-Crypto.signverify-G1: specjvm2008): ![image](https://github.com/user-attachments/assets/ed68344a-c4aa-47b7-96a1-60c91faee503) (drop from `promo-24-b1` to `promo-24-b2`) And that seems to be fixed now: ![image](https://github.com/user-attachments/assets/394f7a44-5fb5-4217-bf0e-f5a585268f2a) ------------- Commit messages: - Merge branch 'master' into JDK-8334431-V-store-to-load-forwarding - Merge branch 'master' into JDK-8334431-V-store-to-load-forwarding - fix whitespace - fix tests and build - fix store-to-load forward IR rules - updates before the weekend ... who knows if they are any good - refactor to iteration threshold - use jvmArgs again, and apply same fix as 8343345 - revert to jvmArgsPrepend - manual merge - ... 
and 14 more: https://git.openjdk.org/jdk/compare/06d8216a...9b2efe1a Changes: https://git.openjdk.org/jdk/pull/21521/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21521&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334431 Stats: 4386 lines in 17 files changed: 4324 ins; 4 del; 58 mod Patch: https://git.openjdk.org/jdk/pull/21521.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21521/head:pull/21521 PR: https://git.openjdk.org/jdk/pull/21521 From epeter at openjdk.org Thu Nov 7 06:56:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 7 Nov 2024 06:56:15 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures In-Reply-To: References: Message-ID: On Tue, 15 Oct 2024 11:33:04 GMT, Emanuel Peter wrote: > **History** > This issue became apparent with https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155): > On machines that do not support sha intrinsics, we execute the sha code in java code. This java code has a loop that previously did not vectorize, but it now does since https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). It turns out that that kind of loop is actually slower when vectorized - this led to a regression, reported originally as: > `8334431: Regression 18-20% on Mac x64 on Crypto.signverify` > > I then investigated the issue thoroughly, and discovered that it was even an issue before https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I wrote a [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html) about the issue. > > **Summary of Problem** > > As described in the [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html), vectorization can introduce store-to-load failures that were not present in the scalar loop code. 
Where in scalar code, the loads and stores were all exactly overlapping or non-overlapping, in vectorized code they can now be partially overlapping. When a store and a later load are partially overlapping, the store value cannot be directly forwarded from the store-buffer to the load (would be fast), but has to first go to L1 cache. This incurs a higher latency on the dependency edge from the store to the load. > > **Benchmark** > > I introduced a new micro-benchmark in https://github.com/openjdk/jdk/pull/19880, and now further expanded it in this PR. You can see the extensive results in [this comment below](https://github.com/openjdk/jdk/pull/21521#issuecomment-2458938698). > > The benchmarks look different on different machines, but they all have a pattern similar to this: > ![image](https://github.com/user-attachments/assets/3366f7fa-af44-44d4-a476-8cd0466fe937) > ![image](https://github.com/user-attachments/assets/1c1408c2-053e-4a8a-ad46-32b75b836161) > ![image](https://github.com/user-attachments/assets/d392c8cf-fb62-4593-93c7-a0d85ad5885e) > ![image](https://github.com/user-attachments/assets/3a79601f-4015-4f71-a510-cab7d7b59ed8) > > We see that the `scalar` loop is faster for low `offset`, and the `vectorized` loop is faster for high offsets (and power-of-w offsets). > > The reason is that for low offsets, th... I'm experimenting now. I have taken my benchmark from https://github.com/openjdk/jdk/pull/19880, and extended it a little. Here the full results in the [PDF](https://github.com/user-attachments/files/17518027/table2.pdf). 
And here some charts: ![image](https://github.com/user-attachments/assets/3366f7fa-af44-44d4-a476-8cd0466fe937) ![image](https://github.com/user-attachments/assets/1c1408c2-053e-4a8a-ad46-32b75b836161) ![image](https://github.com/user-attachments/assets/d392c8cf-fb62-4593-93c7-a0d85ad5885e) ![image](https://github.com/user-attachments/assets/3a79601f-4015-4f71-a510-cab7d7b59ed8) I ran this on my avx521 machine, so results may vary on different platforms - especially with different vector-lengths and different store-to-load-forwarding mechanisms. But on my machine, it is pretty clear that the cut-off is at about an offset of 32. I explain it like this: with an offset smaller than 32, the latency is the main issue: the store-to-load-forwarding failures incur a higher latency on that store-load "edge", and that shows in the final runtime. But if the offset is larger than 32, then we have limitation on throughput: we can only run so many scalar ops per cycle. But if we turn them into vector ops, we have fewer ops, and so we are faster. Of course, optimally we would have some sort of cost model that takes into account both latency and throughput. But that is for Future Work. For now, we need some cut-off heuristic. And it looks like - at least for avx512 - the heuristic is that we must check if there is any store-to-load-forwarding failure within 32 (virtually unrolled?) iterations. Of course this will not be fully accurate - hand-unrolling and a number of other factors can confuse this heuristic. Now I also ran it on a ASIMD aarch64 machine. Here the [PDF](https://github.com/user-attachments/files/17522538/table_aarch64.pdf). 
![image](https://github.com/user-attachments/assets/ccaa73b0-1659-4ead-873c-39cc7c9b4e53) ![image](https://github.com/user-attachments/assets/a662e555-66f2-43ac-9d43-66e2a739a108) ![image](https://github.com/user-attachments/assets/d8dd329e-19e7-4e3b-8e25-b8d69aacc2ac) ![image](https://github.com/user-attachments/assets/9cee426b-46a3-4359-a547-d20d8d03dbce)

A few observations:
- The short benchmark is a bit noisy. Maybe someone else was using that machine while I was running the benchmark. But everything else looks quite clean, so I won't re-run the benchmark now.
- In all 4 plots, we see a similar pattern: for offsets smaller than 8, there are a few instances where vectorization is slower, but with higher offsets, vectorization seems to always pay off.
- We see a similar "stepping" pattern with `byte`, a little with `short`, and hardly at all with `int` and `long`.
- The vector size on that machine is only 16 bytes, so the throughput difference can be at most a factor of 2x for `long`, 4x for `int`, 8x for `short` and 16x for `byte`. That seems to roughly hold at high offsets.

If I had to come up with a hypothesis, I would say that the cut-off is at `X` iterations, where `X = MaxVectorSize / 2`. I need to confirm that on different machines, maybe AVX2. And now I also ran it on an `AVX2` machine (Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz). Here is the [PDF](https://github.com/user-attachments/files/17586850/ghost11_avx2_table_v2.pdf). ![image](https://github.com/user-attachments/assets/bacb12ca-dbb3-4ae7-abab-f0690a93f6b1) ![image](https://github.com/user-attachments/assets/f1e321d6-4589-4c4a-9640-755146fe0961) ![image](https://github.com/user-attachments/assets/73ef7756-821c-4142-8ad3-faa6121b78a5) ![image](https://github.com/user-attachments/assets/4defda17-4408-4750-be50-0b3f31f69e44) It looks like the cut-off is consistently at 16 iterations, though 32 iterations would be fine as well.
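The arithmetic behind these observations is simple enough to write down. The following sketch uses illustrative helper names (not HotSpot code) to compute the number of lanes per vector, which bounds the throughput gain, and the hypothesised cut-off `X = MaxVectorSize / 2`, assuming `MaxVectorSize` is given in bytes.

```cpp
#include <cassert>

// Number of elements that fit into one vector; this bounds the throughput
// gain of the vectorized loop over the scalar loop.
constexpr int lanes(int vector_bytes, int element_bytes) {
  return vector_bytes / element_bytes;
}

// Hypothesised cut-off (in scalar iterations) below which store-to-load
// forwarding failures are expected to make vectorization unprofitable.
constexpr int hypothesised_cutoff(int max_vector_size_bytes) {
  return max_vector_size_bytes / 2;
}

// 16-byte ASIMD vectors: at most 2x (long), 4x (int), 8x (short), 16x (byte).
static_assert(lanes(16, 8) == 2, "long");
static_assert(lanes(16, 4) == 4, "int");
static_assert(lanes(16, 2) == 8, "short");
static_assert(lanes(16, 1) == 16, "byte");
```

For 16-byte ASIMD vectors the hypothesis gives 8 iterations; for 32-byte AVX2 vectors it gives 16, consistent with the AVX2 cut-off observed above, and for 64-byte AVX-512 vectors it gives 32, matching the earlier AVX-512 result.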
**Some Thoughts** Every hardware platform behaves differently, depending on latency and throughput. The latency depends on the L1 cache latency, especially for the store-to-load-forwarding failures. And the throughput depends on the vector length, and the number of ports that can execute the instructions. This is quite complex, and would require fine-tuning. For now, I will just have to set a hard limit, which is going to be inaccurate, but it is probably better on average than doing nothing. Now I'm considering how to set the iteration threshold. The benchmark here is a very simple case, and it is (to my understanding) maximally sensitive to store-to-load-forwarding failure latency: we only perform load, add and store. If the loop contained more instructions that could be executed in parallel, then we would be limited by throughput sooner. Hence, vectorization would be profitable earlier, i.e. for lower iteration thresholds. I would therefore wager that it is better to err on the lower side, and set the iteration threshold lower than the `StoreToLoadForwarding` benchmark indicates. Thus, I will set the iteration threshold at `16` for `x86` (and by default for all platforms), and at `8` for `aarch64`. Of course, in the current implementation the iteration threshold is only a lower bound: at the time we vectorize, we cannot avoid the unrolled loop having more iterations than the threshold. If there are more iterations in the loop, the threshold is effectively higher. I've run benchmarks on 7 machines now. Here is my [micro.ods](https://github.com/user-attachments/files/17643625/micro.ods).
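The threshold policy from the "Some Thoughts" comment can be summarised in a short sketch. Everything here is hypothetical: the `Platform` enum and both helper functions are illustrative names, and `effective_window` simply encodes the remark that the threshold is only a lower bound, so an unrolled loop that already spans more iterations at vectorization time effectively raises it.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical platform tag, used only for this sketch.
enum class Platform { X86, AArch64, Other };

// Iteration threshold chosen in the comment: 8 on aarch64,
// 16 on x86 and, by default, on every other platform.
int forwarding_failure_threshold(Platform p) {
  return p == Platform::AArch64 ? 8 : 16;
}

// The threshold is only a lower bound: if the unrolled loop already
// contains more iterations when it is vectorized, the effective
// detection window is that larger number.
int effective_window(Platform p, int unrolled_iterations) {
  return std::max(forwarding_failure_threshold(p), unrolled_iterations);
}
```

Choosing the lower value errs towards vectorizing, which, as argued above, is expected to pay off for loops doing more work per iteration than the benchmark.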
- `scalar`: SuperWord disabled
- `no_detect`: SuperWord without detecting store-to-load-forwarding failures (old behaviour, before this patch)
- `default`: new default SuperWord behaviour (detect store-to-load-forwarding failures for small offsets -> disable vectorization if detected)

**Conclusion**

For this benchmark, the new behaviour (`default`) seems to choose the better of `scalar` and `no_detect` very accurately. The only exception is the `windows x64` machine: for ints and longs in the offset range 17-31, `default` decides to vectorize (same as `no_detect`), where the `scalar` option would have been a little faster. But no heuristic is perfect, and as said above, this benchmark is maximally sensitive to latency; if the amount of work per iteration were increased, we should expect the balance to tip towards vectorization being preferable.

x64 AVX2 machine ![image](https://github.com/user-attachments/assets/c234f922-aab9-4c1d-bade-9d9ce1363372)
OCI aarch64 (asimd / neon) ![image](https://github.com/user-attachments/assets/cd1e6bf2-a5c5-42d7-9690-c13a0474313b)
Linux aarch64 (asimd / neon) ![image](https://github.com/user-attachments/assets/488e0c12-9f2a-4601-8f06-d976531b9d7e)
Linux x64 ![image](https://github.com/user-attachments/assets/24d04e84-7206-4ce6-9576-41c3090ba326)
MacOSX aarch64 (asimd / neon) ![image](https://github.com/user-attachments/assets/c33fa869-3a06-4fff-af9c-fcb794e2f1a1)
MacOSX x64 ![image](https://github.com/user-attachments/assets/93f16e92-f2e6-44b9-a0f4-2223ef7c619a)
Windows x64 ![image](https://github.com/user-attachments/assets/79221029-9733-4488-ab27-654674b44c03)

------------- PR Comment: https://git.openjdk.org/jdk/pull/21521#issuecomment-2437135550 PR Comment: https://git.openjdk.org/jdk/pull/21521#issuecomment-2437764165 PR Comment: https://git.openjdk.org/jdk/pull/21521#issuecomment-2449602737 PR Comment: https://git.openjdk.org/jdk/pull/21521#issuecomment-2449611215 PR Comment:
https://git.openjdk.org/jdk/pull/21521#issuecomment-2451423352 PR Comment: https://git.openjdk.org/jdk/pull/21521#issuecomment-2458938698 From chagedorn at openjdk.org Thu Nov 7 07:07:45 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 7 Nov 2024 07:07:45 GMT Subject: RFR: 8342945: Replace predicate walking code in get_assertion_predicates() used for Loop Unswitching and cleaning useless Template Assertion Predicates with a predicate visitor In-Reply-To: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> References: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> Message-ID: On Wed, 6 Nov 2024 07:00:57 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (integrated with https://github.com/openjdk/jdk/pull/21790) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicates (this PR) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (upcoming) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code.
> This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047), which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each patch of this series, I apply similar refactorings: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with a call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator`, which walks through all predicates found at a loop and applies the visitor to each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics, which include applying the Template Assertion Predicate processing only if there are Parse Predicates. This limitation should eventually be removed, but I want to do that separately at a later point. > ---... Thanks Vladimir for your review!
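The visitor/iterator split described in the PR text can be illustrated with a heavily simplified, self-contained C++ sketch. The class names mirror the ones mentioned (`PredicateVisitor`, `PredicateIterator`), but the method names, the representation of predicates as a plain vector, and `TemplateCollector` are assumptions made for illustration; C2's real classes walk IR nodes above the loop entry.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified predicate kinds found above a loop (illustrative only).
enum class PredicateKind { Parse, TemplateAssertion, InitializedAssertion };

struct Predicate {
  PredicateKind kind;
  std::string name;
};

// Visitor interface: the default implementations ignore a predicate, so a
// concrete visitor only overrides the kinds it cares about.
class PredicateVisitor {
 public:
  virtual ~PredicateVisitor() = default;
  virtual void visit_parse(const Predicate&) {}
  virtual void visit_template_assertion(const Predicate&) {}
  virtual void visit_initialized_assertion(const Predicate&) {}
};

// Iterator: walks all predicates of a loop and dispatches to the visitor.
class PredicateIterator {
 public:
  explicit PredicateIterator(const std::vector<Predicate>& predicates)
      : _predicates(predicates) {}

  void for_each(PredicateVisitor& visitor) const {
    for (const Predicate& p : _predicates) {
      switch (p.kind) {
        case PredicateKind::Parse:
          visitor.visit_parse(p);
          break;
        case PredicateKind::TemplateAssertion:
          visitor.visit_template_assertion(p);
          break;
        case PredicateKind::InitializedAssertion:
          visitor.visit_initialized_assertion(p);
          break;
      }
    }
  }

 private:
  const std::vector<Predicate>& _predicates;
};

// Example visitor: collects Template Assertion Predicates, standing in for a
// visitor that would clone and initialize them for a target loop.
class TemplateCollector : public PredicateVisitor {
 public:
  void visit_template_assertion(const Predicate& p) override {
    names.push_back(p.name);
  }
  std::vector<std::string> names;
};
```

The design point is that the walking logic lives in one place (the iterator), while each loop transformation only supplies a visitor with the per-predicate action.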
------------- PR Comment: https://git.openjdk.org/jdk/pull/21918#issuecomment-2461470724 From chagedorn at openjdk.org Thu Nov 7 07:07:46 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 7 Nov 2024 07:07:46 GMT Subject: Integrated: 8342945: Replace predicate walking code in get_assertion_predicates() used for Loop Unswitching and cleaning useless Template Assertion Predicates with a predicate visitor In-Reply-To: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> References: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> Message-ID: On Wed, 6 Nov 2024 07:00:57 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (integrated with https://github.com/openjdk/jdk/pull/21790) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicates (this PR) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (upcoming) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code.
> This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047), which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each patch of this series, I apply similar refactorings: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with a call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator`, which walks through all predicates found at a loop and applies the visitor to each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics, which include applying the Template Assertion Predicate processing only if there are Parse Predicates. This limitation should eventually be removed, but I want to do that separately at a later point. > ---... This pull request has now been integrated.
Changeset: a6c85daa Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/a6c85daa1c5e685ab64cbf9860a022aaa4a0d7f8 Stats: 59 lines in 4 files changed: 22 ins; 22 del; 15 mod 8342945: Replace predicate walking code in get_assertion_predicates() used for Loop Unswitching and cleaning useless Template Assertion Predicates with a predicate visitor Reviewed-by: thartmann, roland, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21918 From duke at openjdk.org Thu Nov 7 07:31:47 2024 From: duke at openjdk.org (duke) Date: Thu, 7 Nov 2024 07:31:47 GMT Subject: RFR: 8323803: ConstantOopReadValue::print_on should print 'null' instead of 'nullptr' In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 09:23:16 GMT, theoweidmannoracle wrote: > Printing incorrectly printed `nullptr` instead of `null` > > Buggy: > > > ScopeDesc(pc=0x0000000104c05468 offset=2e8): > java.lang.Class::desiredAssertionStatus at 20 (line 3984) > Locals > - l0: reg rfp [58],oop > - l1: stack[0],oop > - l2: nullptr > - l3: empty > Expression stack > - @0: nullptr > > > Fixed: > > > ScopeDesc(pc=0x0000000106fdd468 offset=2e8): > java.lang.Class::desiredAssertionStatus at 20 (line 3984) > Locals > - l0: reg rfp [58],oop > - l1: stack[0],oop > - l2: null > - l3: empty > Expression stack > - @0: null @theoweidmannoracle Your change (at version 0fa7e4e52dcebcd0694afae77908a50101e820da) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21869#issuecomment-2461506657 From epeter at openjdk.org Thu Nov 7 07:51:44 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 7 Nov 2024 07:51:44 GMT Subject: RFR: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) [v2] In-Reply-To: References: Message-ID: <3xJg8mwE5kmAA_DfVquqRuI9nbrHHTfv-kdePt_LF5E=.79702bef-f612-4914-b3ee-03a6c0ea306f@github.com> On Sun, 3 Nov 2024 03:10:24 GMT, Archie Cobbs wrote: >> Please review this patch which removes unnecessary `@SuppressWarnings` annotations. > > Archie Cobbs has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Update copyright years. > - Merge branch 'master' into SuppressWarningsCleanup-hotspot > - Merge branch 'master' into SuppressWarningsCleanup-graal > - Remove unnecessary @SuppressWarnings annotations. Hi @archiecobbs can you please give some more info about why these were introduced, and why they are now not needed any more? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21853#issuecomment-2461538717 From epeter at openjdk.org Thu Nov 7 07:55:43 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 7 Nov 2024 07:55:43 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 16:04:38 GMT, theoweidmannoracle wrote: > This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: > > > ConINode* node = _igvn.intcon(i); > set_ctrl(node, C->root()); > > > and > > > ConLNode* node = _igvn.longcon(i); > set_ctrl(node, C->root()); > > > Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. 
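The point of the wrapper is that creating a constant and registering its control are two separate steps, and forgetting the second one has caused bugs. The following mock illustrates why bundling both steps into one helper removes that failure mode. The `Node`, `IterGVN` and `IdealLoop` types here are stand-ins written for this sketch, not the HotSpot classes, and only `intcon` is modelled.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <vector>

struct Node {
  long value;  // constant value (unused for the root node)
};

// Stand-in for PhaseIterGVN: hash-conses integer constants, so asking for
// the same value twice returns the same node.
class IterGVN {
 public:
  Node* intcon(int i) {
    auto it = _cons.find(i);
    if (it != _cons.end()) return it->second;
    _nodes.push_back(std::make_unique<Node>(Node{i}));
    Node* n = _nodes.back().get();
    _cons[i] = n;
    return n;
  }

 private:
  std::map<int, Node*> _cons;
  std::vector<std::unique_ptr<Node>> _nodes;
};

// Stand-in for PhaseIdealLoop: tracks the control node of every data node.
class IdealLoop {
 public:
  IdealLoop(IterGVN& igvn, Node* root) : _igvn(igvn), _root(root) {}

  void set_ctrl(Node* n, Node* ctrl) { _ctrl[n] = ctrl; }
  Node* get_ctrl(Node* n) const {
    auto it = _ctrl.find(n);
    return it == _ctrl.end() ? nullptr : it->second;
  }

  // Wrapper: create (or reuse) the constant AND pin it at the root, so no
  // caller can forget the set_ctrl() step.
  Node* intcon(int i) {
    Node* node = _igvn.intcon(i);
    set_ctrl(node, _root);
    return node;
  }

 private:
  IterGVN& _igvn;
  Node* _root;
  std::map<Node*, Node*> _ctrl;
};
```

A constant obtained directly from the GVN stand-in has no control recorded, which is exactly the kind of omission the helper is meant to prevent.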
src/hotspot/share/opto/loopTransform.cpp line 2963: > 2961: // Kill the eliminated test > 2962: C->set_major_progress(); > 2963: Node *kill_con = intcon(1-flip); Suggestion: Node* kill_con = intcon(1-flip); We generally now have the pointer `*` with the type. So if you touch any new code please update it ;) src/hotspot/share/opto/loopopts.cpp line 334: > 332: } > 333: // 'con' is set to true or false to kill the dominated test. > 334: Node *con = makecon(pop == Op_IfTrue ? TypeInt::ONE : TypeInt::ZERO); Suggestion: Node* con = makecon(pop == Op_IfTrue ? TypeInt::ONE : TypeInt::ZERO); src/hotspot/share/opto/loopopts.cpp line 2907: > 2905: int proj_con = live_proj->_con; > 2906: assert(proj_con == 0 || proj_con == 1, "false or true projection"); > 2907: Node *con = intcon(proj_con); Suggestion: Node* con = intcon(proj_con); src/hotspot/share/opto/loopopts.cpp line 3245: > 3243: stay_in_loop(lp_proj, loop)->is_If() && > 3244: stay_in_loop(lp_proj, loop)->in(1)->in(1)->Opcode() == Op_CmpU, "inserted cmpi before cmpu"); > 3245: Node *con = makecon(lp_proj->is_IfTrue() ? TypeInt::ONE : TypeInt::ZERO); Suggestion: Node* con = makecon(lp_proj->is_IfTrue() ? TypeInt::ONE : TypeInt::ZERO); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1832203491 PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1832204711 PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1832205017 PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1832205177 From duke at openjdk.org Thu Nov 7 08:12:42 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 7 Nov 2024 08:12:42 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 18:20:58 GMT, Vladimir Kozlov wrote: > Do we have other places (not new constant node) where we set Root as control? 
Maybe we can add a `set_root_as_ctrl(n)` method in `loopnode.hpp` in such a case. There are only three locations where control is set to the root in the loop files now (not counting the ones in the new methods I added). The main reason for this patch is bugs caused by people forgetting to set control for constants (e.g. https://bugs.openjdk.org/browse/JDK-8343137), which is now prevented if the new helper methods are used. Do you think there would be any benefit from introducing `set_root_as_ctrl(n)` given there are only about three places where this pattern occurs now? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21836#issuecomment-2461576384 From roland at openjdk.org Thu Nov 7 08:25:50 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 7 Nov 2024 08:25:50 GMT Subject: RFR: 8341834: C2 compilation fails with "bad AD file" due to Replicate [v3] In-Reply-To: <77bYQ44LNNQlSteh4rJSEvJ5QWIvpBcb7eNb0Sy-vVE=.a8c58c15-a7c9-4333-9a45-e49fc35797eb@github.com> References: <77bYQ44LNNQlSteh4rJSEvJ5QWIvpBcb7eNb0Sy-vVE=.a8c58c15-a7c9-4333-9a45-e49fc35797eb@github.com> Message-ID: On Thu, 7 Nov 2024 01:42:30 GMT, Hao Sun wrote: >>> Do you know what JDK versions are affected? >> >> The failure doesn't reproduce with jdk21u. But that seems to be because we need JDK-8326139 (and JDK-8331575) for the bug to show up. > > Hi @rwestrel > > My JBS account has been inactive recently, hence I'd like to report the bug here. > > I encountered the following error with `-XX:MaxVectorSize=8` on both AArch64 and x86_64. > Could you help take a look at this issue? Thanks.
> > Test command: > > make test JTREG="VM_OPTIONS=-XX:MaxVectorSize=8" TEST=test/hotspot/jtreg/compiler/vectorization/TestReplicateAtConv.java > > > Error message: > > CompileCommand: compileonly TestReplicateAtConv.test bool compileonly = true > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/tmp/jdk-dev/src/hotspot/share/opto/type.cpp:2499), pid=1424540, tid=1424557 > # assert(Matcher::vector_size_supported(elem_bt, length)) failed: length in range > # > # JRE version: OpenJDK Runtime Environment (24.0) (fastdebug build 24-internal-git-63c19d3db58) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 24-internal-git-63c19d3db58, compiled mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) > # Problematic frame: > # V [libjvm.so+0x17bca30] TypeVect::make(BasicType, unsigned int, bool)+0x150 > # > # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /tmp/jdk-build/test-support/jtreg_test_hotspot_jtreg_compiler_vectorization_TestReplicateAtConv_java/scratch/0/core.1424540) > # > # An error report file with more information is saved as: > # /tmp/jdk-build/test-support/jtreg_test_hotspot_jtreg_compiler_vectorization_TestReplicateAtConv_java/scratch/0/hs_err_pid1424540.log > # > # Compiler replay data is saved as: > # /tmp/jdk-build/test-support/jtreg_test_hotspot_jtreg_compiler_vectorization_TestReplicateAtConv_java/scratch/0/replay_pid1424540.log > # > # If you would like to submit a bug report, please visit: > # https://bugreport.java.com/bugreport/crash.jsp @shqking thanks for the report. 
I filed https://bugs.openjdk.org/browse/JDK-8343747 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21660#issuecomment-2461598820 From rrich at openjdk.org Thu Nov 7 08:55:45 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 7 Nov 2024 08:55:45 GMT Subject: RFR: 8343724: [PPC64] Disallow OptoScheduling In-Reply-To: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> References: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> Message-ID: On Wed, 6 Nov 2024 16:18:11 GMT, Martin Doerr wrote: > Force off OptoScheduling for PPC64. It should be properly maintained before allowing users to switch it on. It's probably not important for this platform. Also see the JBS issue for motivation. src/hotspot/cpu/ppc/c2_init_ppc.cpp line 58: > 56: warning("OptoScheduling is not supported on this CPU."); > 57: FLAG_SET_DEFAULT(OptoScheduling, false); > 58: } Makes sense, but it is better to do it in `VM_Version::initialize()` because `Compile::pd_compiler2_init()` is called after initialization of flags has been completed and the setting will not be shown with `PrintFlagsFinal`. I'd even suggest moving the other flag settings there with this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21935#discussion_r1832283840 From tholenstein at openjdk.org Thu Nov 7 08:58:46 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 7 Nov 2024 08:58:46 GMT Subject: RFR: 8321997: Increase upper limit of LoopOptsCount flag In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 09:13:12 GMT, Tobias Holenstein wrote: > Currently `LoopOptsCount` has a range of 5-43 with a default value of 43. For stress testing we want to set values higher than 43. Set the upper limit to 1000 or even max_jint. I'm not really sure why 43 was chosen as the default. With this PR, we can experiment with higher values and potentially adjust the default in the future.
From my own tests, I have rarely seen the 43 limit hit, but I have observed a few edge cases where loop optimizations were applied hundreds of times (after removing the 43 limit). We would need to look into those cases more closely to see if they actually improve performance or if they might even reveal issues in the loop optimizations. Thanks for the reviews @shipilev and @chhagedorn! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21921#issuecomment-2461662396 From tholenstein at openjdk.org Thu Nov 7 08:58:46 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 7 Nov 2024 08:58:46 GMT Subject: Integrated: 8321997: Increase upper limit of LoopOptsCount flag In-Reply-To: References: Message-ID: <-qLB2HgbRaNRFNIFN-oie0Nd2mAh-Hc4pKMF1Ub2te4=.efeb241c-f8a0-4185-9fdf-f5464910adac@github.com> On Wed, 6 Nov 2024 09:13:12 GMT, Tobias Holenstein wrote: > Currently `LoopOptsCount` has a range of 5-43 with a default value of 43. For stress testing we want to set values higher than 43. Set the upper limit to 1000 or even max_jint. This pull request has now been integrated.
Changeset: 592a48b1 Author: Tobias Holenstein URL: https://git.openjdk.org/jdk/commit/592a48b163ed582872b686e7a606cf8b96fcbcbc Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8321997: Increase upper limit of LoopOptsCount flag Reviewed-by: shade, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/21921 From chagedorn at openjdk.org Thu Nov 7 09:37:03 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 7 Nov 2024 09:37:03 GMT Subject: RFR: 8342946: Replace predicate walking code in Loop Unrolling with a predicate visitor Message-ID: #### Replacing the Remaining Predicate Walking and Cloning Code The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679) - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (integrated with https://github.com/openjdk/jdk/pull/21790) - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicates (integrated with https://github.com/openjdk/jdk/pull/21918) - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (this PR) --- (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790 / https://github.com/openjdk/jdk/pull/21918) #### Single Template Assertion Predicate Check This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047), which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)).
#### Common Refactorings for all the Patches in this Series In each patch of this series, I apply similar refactorings: - Replace the existing code in the corresponding `PhaseIdealLoop` method with a call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. - The predicate visitor is then passed to the `PredicateIterator`, which walks through all predicates found at a loop and applies the visitor to each predicate. - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. - Keep the semantics, which include applying the Template Assertion Predicate processing only if there are Parse Predicates. This limitation should eventually be removed, but I want to do that separately at a later point. --- #### Refactorings of this Patch This patch replaces the predicate walking in `PhaseIdealLoop::update_main_loop_assertion_predicates()`, which is used during Loop Unrolling to update the Template Assertion Predicates for the new unrolled stride and to create new Initialized Assertion Predicates reflecting that change, while the old Initialized Assertion Predicates with the pre-unrolled stride are killed. - New visitor `UpdateStrideForAssertionPredicates` takes care of these tasks. - Update Template Assertion Predicates: `replace_opaque_stride_input()` - Uses new class `ReplaceOpaqueStrideInput`, which does a simple BFS on a Template Assertion Expression to find the `OpaqueLoopStrideNode` and update it.
Note that the existing class `DataNodesOnPathsToTargets` is not suitable, since this class collects all nodes in between, which is unnecessary for this task. - Create Initialized Assertion Predicate from template: `initialize_from_updated_template()` - Calls `clone_and_fold_opaque_loop_nodes()`, which uses the new strategy class `RemoveOpaqueLoopNodesStrategy` that is passed to the existing method `TemplateAssertionExpression::clone()` to do the Template Assertion Expression cloning. This strategy simply folds the `OpaqueLoop*Node`s away in the cloned expression and keeps only their inputs. #### Follow-up Work In Loop Unrolling, we only update the stride and not the init value. Thus, we actually only need to update the Last Value Assertion Predicates, because the Init Value Assertion Predicates do not use `OpaqueLoopStride`. So we would also not need to kill the old Init Value Initialized Assertion Predicates. This was already an inefficiency before, but it could now be tackled since we keep track of whether an Assertion Predicate is for the init or last value with `AssertionPredicateType`. I filed [JDK-8343745](https://bugs.openjdk.org/browse/JDK-8343745) for that.
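The BFS performed by `ReplaceOpaqueStrideInput` can be sketched on a toy expression graph. This is a hypothetical, self-contained model: the `Node` struct, the `Op` enum and the free function `replace_opaque_stride_input()` are illustrative stand-ins, not C2's classes or signatures. It walks the inputs of an expression breadth-first, stops at the first `OpaqueLoopStride` node, and rewires that node's input to the new stride. It also illustrates the follow-up observation: an expression that only uses `OpaqueLoopInit` contains no stride node, so there is nothing to update.

```cpp
#include <cassert>
#include <deque>
#include <set>
#include <vector>

enum class Op { Con, Add, Mul, Cmp, OpaqueLoopInit, OpaqueLoopStride };

struct Node {
  Op op;
  std::vector<Node*> in;  // inputs (data edges)
  long con = 0;           // value for Op::Con
};

// Breadth-first search from `root` over input edges. On finding the first
// OpaqueLoopStride node, replace its input with `new_stride` and return true.
// Returns false if the expression does not depend on the stride.
bool replace_opaque_stride_input(Node* root, Node* new_stride) {
  std::deque<Node*> worklist{root};
  std::set<Node*> visited{root};
  while (!worklist.empty()) {
    Node* n = worklist.front();
    worklist.pop_front();
    if (n->op == Op::OpaqueLoopStride) {
      n->in[0] = new_stride;  // rewire to the unrolled stride
      return true;
    }
    for (Node* input : n->in) {
      if (visited.insert(input).second) {  // enqueue each node only once
        worklist.push_back(input);
      }
    }
  }
  return false;
}
```

Unlike a collector of all nodes on the paths to a target, this search only needs to find one node, so it can stop as soon as it is found.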
Thanks, Christian ------------- Commit messages: - Add const - 8342946: Replace predicate walking code in Loop Unrolling with a predicate visitor Changes: https://git.openjdk.org/jdk/pull/21944/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21944&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342946 Stats: 203 lines in 4 files changed: 161 ins; 29 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/21944.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21944/head:pull/21944 PR: https://git.openjdk.org/jdk/pull/21944 From duke at openjdk.org Thu Nov 7 10:06:49 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 7 Nov 2024 10:06:49 GMT Subject: Integrated: 8323803: ConstantOopReadValue::print_on should print 'null' instead of 'nullptr' In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 09:23:16 GMT, theoweidmannoracle wrote: > Printing incorrectly printed `nullptr` instead of `null` > > Buggy: > > > ScopeDesc(pc=0x0000000104c05468 offset=2e8): > java.lang.Class::desiredAssertionStatus at 20 (line 3984) > Locals > - l0: reg rfp [58],oop > - l1: stack[0],oop > - l2: nullptr > - l3: empty > Expression stack > - @0: nullptr > > > Fixed: > > > ScopeDesc(pc=0x0000000106fdd468 offset=2e8): > java.lang.Class::desiredAssertionStatus at 20 (line 3984) > Locals > - l0: reg rfp [58],oop > - l1: stack[0],oop > - l2: null > - l3: empty > Expression stack > - @0: null This pull request has now been integrated. 
Changeset: 7620b129 Author: Theo Weidmann Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/7620b129888d57514d9ef588e0681f1d43377236 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8323803: ConstantOopReadValue::print_on should print 'null' instead of 'nullptr' Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21869 From galder at openjdk.org Thu Nov 7 10:50:19 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 7 Nov 2024 10:50:19 GMT Subject: RFR: 8326369: Bimorphic inlining not applied at a call site that was initially monomorphic [v2] In-Reply-To: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> Message-ID: <14vSVV6FCG5GxZjE2heaPLpQZGTCB9xWB075R_bz_SA=.e3f8bcdd-5060-4a3c-9918-cd515e23feea@github.com> > This issue was fixed by JDK-8339299 indirectly (see [PR](https://github.com/openjdk/jdk/pull/20786)), but the PR for 8339299 didn't include any new tests, so I'm sending this PR with the test that Filipp added in 8326369 so that such an issue doesn't come back inadvertently. I've added him as co-author and applied some small formatting changes.
Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: Added copyright and @bug identifiers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21920/files - new: https://git.openjdk.org/jdk/pull/21920/files/1f548010..1bf6992c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21920&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21920&range=00-01 Stats: 25 lines in 1 file changed: 24 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21920.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21920/head:pull/21920 PR: https://git.openjdk.org/jdk/pull/21920 From galder at openjdk.org Thu Nov 7 10:50:19 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 7 Nov 2024 10:50:19 GMT Subject: RFR: 8326369: Bimorphic inlining not applied at a call site that was initially monomorphic [v2] In-Reply-To: References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> Message-ID: On Wed, 6 Nov 2024 11:31:37 GMT, Tobias Hartmann wrote: >> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: >> >> Added copyright and @bug identifiers > > Changes requested by thartmann (Reviewer). @TobiHartmann I've added `@bug` and the copyright header. I've put Red Hat's copyright. @fzhinkin do you want me to add a line for JetBrains to the copyright? I see it has been done in the past, e.g. `ComplexURITest`: /* Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2024 JetBrains s.r.o.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21920#issuecomment-2461903598 From mdoerr at openjdk.org Thu Nov 7 11:26:42 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 7 Nov 2024 11:26:42 GMT Subject: RFR: 8343506: [s390x] multiple test failures with ubsan In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 05:27:59 GMT, Amit Kumar wrote: > This is a trivial patch which fixes the error I am seeing on s390x while running tier1 with ubsan enabled. Please see JBS for more details. Looks correct. Additional improvements could be done separately. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21864#pullrequestreview-2420693793 From lucy at openjdk.org Thu Nov 7 11:33:41 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 7 Nov 2024 11:33:41 GMT Subject: RFR: 8343506: [s390x] multiple test failures with ubsan In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 05:27:59 GMT, Amit Kumar wrote: > This is a trivial patch which fixes the error I am seeing on s390x while running tier1 with ubsan enabled. Please see JBS for more details. LGTM. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21864#pullrequestreview-2420710850 From mdoerr at openjdk.org Thu Nov 7 13:23:19 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 7 Nov 2024 13:23:19 GMT Subject: RFR: 8343724: [PPC64] Disallow OptoScheduling [v2] In-Reply-To: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> References: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> Message-ID: > Force off OptoScheduling for PPC64. It should be properly maintained before allowing users to switch it on. It's probably not important for this platform. Also see JBS issue for motivation.
Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Move flag configuration to VM_Version::initialize(). Add EntryAlignment guarantee like on other platforms. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21935/files - new: https://git.openjdk.org/jdk/pull/21935/files/a9330b32..db0d279e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21935&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21935&range=00-01 Stats: 44 lines in 2 files changed: 22 ins; 21 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21935.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21935/head:pull/21935 PR: https://git.openjdk.org/jdk/pull/21935 From mdoerr at openjdk.org Thu Nov 7 13:23:19 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 7 Nov 2024 13:23:19 GMT Subject: RFR: 8343724: [PPC64] Disallow OptoScheduling [v2] In-Reply-To: References: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> Message-ID: On Thu, 7 Nov 2024 08:52:46 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Move flag configuration to VM_Version::initialize(). Add EntryAlignment guarantee like on other platforms. > > src/hotspot/cpu/ppc/c2_init_ppc.cpp line 58: > >> 56: warning("OptoScheduling is not supported on this CPU."); >> 57: FLAG_SET_DEFAULT(OptoScheduling, false); >> 58: } > > Makes sense but better do it in `VM_Version::initialize()` because `Compile::pd_compiler2_init()` is called after initialization of flags has been completed and the setting will not be shown with `PrintFlagsFinal`. > I'd even suggest to move the other flag settings there with this PR. This makes sense. Please see my update. I have also added `guarantee(CodeEntryAlignment >= InteriorEntryAlignment, "");` to `Compile::pd_compiler2_init()` which is there on other platforms. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21935#discussion_r1832669129 From mdoerr at openjdk.org Thu Nov 7 13:27:44 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 7 Nov 2024 13:27:44 GMT Subject: RFR: 8343724: [PPC64] Disallow OptoScheduling In-Reply-To: References: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> Message-ID: On Thu, 7 Nov 2024 05:39:24 GMT, Amit Kumar wrote: > I see that the test is passing for s390x (with ubsan enabled). But do you still think we should disable it for s390x as well? You're free to decide. If there are no issues, there's no urgent need to change anything. On the other hand, if it's not well maintained, then allowing the usage probably makes no sense. You could check if there's any performance difference. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21935#issuecomment-2462234538 From rrich at openjdk.org Thu Nov 7 13:32:50 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 7 Nov 2024 13:32:50 GMT Subject: RFR: 8343724: [PPC64] Disallow OptoScheduling [v2] In-Reply-To: References: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> Message-ID: On Thu, 7 Nov 2024 13:23:19 GMT, Martin Doerr wrote: >> Force off OptoScheduling for PPC64. It should be properly maintained before allowing users to switch it on. It's probably not important for this platform. Also see JBS issue for motivation. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Move flag configuration to VM_Version::initialize(). Add EntryAlignment guarantee like on other platforms. Looks good. Cheers, Richard. src/hotspot/cpu/ppc/vm_version_ppc.cpp line 174: > 172: > 173: // Power7 and later. > 174: if (PowerArchitecturePPC64 > 6) { Settings that depend on `PowerArchitecturePPC64` seem to be ordered. You might want to keep it like that.
------------- Marked as reviewed by rrich (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21935#pullrequestreview-2420983914 PR Review Comment: https://git.openjdk.org/jdk/pull/21935#discussion_r1832682141 From fzhinkin at openjdk.org Thu Nov 7 14:04:47 2024 From: fzhinkin at openjdk.org (Filipp Zhinkin) Date: Thu, 7 Nov 2024 14:04:47 GMT Subject: RFR: 8326369: Bimorphic inlining not applied at a call site that was initially monomorphic [v2] In-Reply-To: References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> Message-ID: On Thu, 7 Nov 2024 10:45:00 GMT, Galder Zamarre?o wrote: >> Changes requested by thartmann (Reviewer). > > @TobiHartmann I've added `@bug` and copyright header. I've put Red Hat's copyright. > > @fzhinkin do you want me to add a line for Jetbrains to the copyright? I see it has been done in the past, e.g. `ComplexURITest`: > > > /* Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. > * Copyright (c) 2024 JetBrains s.r.o. @galderz, I'd appreciate it if you can add `Copyright (c) 2024 JetBrains s.r.o.. All rights reserved.` to the header. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21920#issuecomment-2462315472 From mdoerr at openjdk.org Thu Nov 7 14:09:02 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 7 Nov 2024 14:09:02 GMT Subject: RFR: 8343724: [PPC64] Disallow OptoScheduling [v3] In-Reply-To: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> References: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> Message-ID: > Force off OptoScheduling for PPC64. It should be properly maintained before allowing users to switch it on. It's probably not important for this platform. Also see JBS issue for motivation. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Move Power7 flags up. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21935/files - new: https://git.openjdk.org/jdk/pull/21935/files/db0d279e..f8257242 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21935&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21935&range=01-02 Stats: 22 lines in 1 file changed: 11 ins; 11 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21935.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21935/head:pull/21935 PR: https://git.openjdk.org/jdk/pull/21935 From mdoerr at openjdk.org Thu Nov 7 14:09:02 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 7 Nov 2024 14:09:02 GMT Subject: RFR: 8343724: [PPC64] Disallow OptoScheduling [v2] In-Reply-To: References: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> Message-ID: On Thu, 7 Nov 2024 13:28:49 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Move flag configuration to VM_Version::initialize(). Add EntryAlignment guarantee like on other platforms. > > src/hotspot/cpu/ppc/vm_version_ppc.cpp line 174: > >> 172: >> 173: // Power7 and later. >> 174: if (PowerArchitecturePPC64 > 6) { > > Settings that depend on `PowerArchitecturePPC64` seem to be ordered. You might want to keep it like that. I have moved these flags up. Note that the checks will get removed by [JDK-8331859](https://bugs.openjdk.org/browse/JDK-8331859). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21935#discussion_r1832740450 From mbaesken at openjdk.org Thu Nov 7 14:16:49 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 7 Nov 2024 14:16:49 GMT Subject: RFR: 8343724: [PPC64] Disallow OptoScheduling [v3] In-Reply-To: References: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> Message-ID: On Thu, 7 Nov 2024 14:09:02 GMT, Martin Doerr wrote: >> Force off OptoScheduling for PPC64. It should be properly maintained before allowing users to switch it on. It's probably not important for this platform. Also see JBS issue for motivation. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Move Power7 flags up. Marked as reviewed by mbaesken (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21935#pullrequestreview-2421105178 From mdoerr at openjdk.org Thu Nov 7 14:22:44 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 7 Nov 2024 14:22:44 GMT Subject: RFR: 8343724: [PPC64] Disallow OptoScheduling [v3] In-Reply-To: References: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> Message-ID: On Thu, 7 Nov 2024 14:09:02 GMT, Martin Doerr wrote: >> Force off OptoScheduling for PPC64. It should be properly maintained before allowing users to switch it on. It's probably not important for this platform. Also see JBS issue for motivation. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Move Power7 flags up. Thanks for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21935#issuecomment-2462358079 From roland at openjdk.org Thu Nov 7 14:48:00 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 7 Nov 2024 14:48:00 GMT Subject: RFR: 8340532: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue Message-ID: A `CountedLoopEnd` (that marks the end of a still existing `CountedLoop`) is optimized out because a dominating identical `CountedLoopEnd` (that no longer marks the end of an existing `CountedLoop` but was left behind by previous loop opts) is found. That causes the path out of `CountedLoopEnd` to become dead including the `OuterStripMinedLoopEnd`. The `OuterStripMinedLoop` loses its backedge as a consequence. The `CountedLoop` is still marked as strip mined but the outer loop doesn't exist anymore. The fix I propose for this corner case is to simply detect when that happens (during igvn AFAICT) and clear the strip mined flag from the `CountedLoop`. ------------- Commit messages: - fix & test Changes: https://git.openjdk.org/jdk/pull/21956/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21956&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340532 Stats: 75 lines in 3 files changed: 74 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21956.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21956/head:pull/21956 PR: https://git.openjdk.org/jdk/pull/21956 From roland at openjdk.org Thu Nov 7 14:54:41 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 7 Nov 2024 14:54:41 GMT Subject: RFR: 8342946: Replace predicate walking code in Loop Unrolling with a predicate visitor In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 09:22:02 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977):
Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (integrated with https://github.com/openjdk/jdk/pull/21790) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicates (integrated with https://github.com/openjdk/jdk/pull/21918) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (this PR) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790 / https://github.com/openjdk/jdk/pull/21918) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patches, I apply similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with a call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop.
In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics, which include only applying the Template Assertion Predicate processing if there are Parse Predicates. T... Looks reasonable to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21944#pullrequestreview-2421213969 From rrich at openjdk.org Thu Nov 7 15:03:49 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 7 Nov 2024 15:03:49 GMT Subject: RFR: 8343724: [PPC64] Disallow OptoScheduling [v3] In-Reply-To: References: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> Message-ID: On Thu, 7 Nov 2024 14:09:02 GMT, Martin Doerr wrote: >> Force off OptoScheduling for PPC64. It should be properly maintained before allowing users to switch it on. It's probably not important for this platform. Also see JBS issue for motivation. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Move Power7 flags up. Marked as reviewed by rrich (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/21935#pullrequestreview-2421239929 From chagedorn at openjdk.org Thu Nov 7 15:05:41 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 7 Nov 2024 15:05:41 GMT Subject: RFR: 8342946: Replace predicate walking code in Loop Unrolling with a predicate visitor In-Reply-To: References: Message-ID: <6EfxekMMTeswejTgNj2oHlzScpoW4LpHj5YkiXwM7Aw=.a0e3f667-4ad8-4308-90cd-3d1519c06e00@github.com> On Thu, 7 Nov 2024 09:22:02 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (integrated with https://github.com/openjdk/jdk/pull/21790) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicates (integrated with https://github.com/openjdk/jdk/pull/21918) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (this PR) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790 / https://github.com/openjdk/jdk/pull/21918) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)).
> > #### Common Refactorings for all the Patches in this Series > In each of the patches, I apply similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with a call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics, which include only applying the Template Assertion Predicate processing if there are Parse Predicates. T... Thanks Roland for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21944#issuecomment-2462463398 From chagedorn at openjdk.org Thu Nov 7 15:05:44 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 7 Nov 2024 15:05:44 GMT Subject: RFR: 8340532: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue In-Reply-To: References: Message-ID: <7z556krCUH5mTfEeTgD75L3MHiUG9k-1_7Ox4LcH0F4=.a2830249-8597-4ff7-95cf-358a30f044bb@github.com> On Thu, 7 Nov 2024 14:42:41 GMT, Roland Westrelin wrote: > A `CountedLoopEnd` (that marks the end of a still existing > `CountedLoop`) is optimized out because a dominating identical > `CountedLoopEnd` (that no longer marks the end of an existing > `CountedLoop` but was left behind by previous loop opts) is > found. That causes the path out of `CountedLoopEnd` to become dead > including the `OuterStripMinedLoopEnd`.
The `OuterStripMinedLoop` > loses its backedge as a consequence. The `CountedLoop` is still > marked as strip mined but the outer loop doesn't exist anymore. > > The fix I propose for this corner case is to simply detect when that > happens (during igvn AFAICT) and clear the strip mined flag from the > `CountedLoop`. Looks reasonable to me. test/hotspot/jtreg/compiler/loopstripmining/TestIdenticalDominatingCLE.java line 28: > 26: * @bug 8340532 > 27: * @summary C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue > 28: * Since you use C2-only flags, you should add: Suggestion: * @requires vm.compiler2.enabled ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21956#pullrequestreview-2421244527 PR Review Comment: https://git.openjdk.org/jdk/pull/21956#discussion_r1832837462 From acobbs at openjdk.org Thu Nov 7 15:46:45 2024 From: acobbs at openjdk.org (Archie Cobbs) Date: Thu, 7 Nov 2024 15:46:45 GMT Subject: RFR: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) [v2] In-Reply-To: <3xJg8mwE5kmAA_DfVquqRuI9nbrHHTfv-kdePt_LF5E=.79702bef-f612-4914-b3ee-03a6c0ea306f@github.com> References: <3xJg8mwE5kmAA_DfVquqRuI9nbrHHTfv-kdePt_LF5E=.79702bef-f612-4914-b3ee-03a6c0ea306f@github.com> Message-ID: On Thu, 7 Nov 2024 07:48:43 GMT, Emanuel Peter wrote: >> Archie Cobbs has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Update copyright years. >> - Merge branch 'master' into SuppressWarningsCleanup-hotspot >> - Merge branch 'master' into SuppressWarningsCleanup-graal >> - Remove unnecessary @SuppressWarnings annotations. > > Hi @archiecobbs can you please give some more info about why these were introduced, and why they are now not needed any more?
Hi @eme64, > Hi @archiecobbs can you please give some more info about why these were introduced, and why they are now not needed any more? FYI there are [several other](https://github.com/openjdk/jdk/pulls?q=author%3Aarchiecobbs+is%3Apr+%22Remove+unnecessary%22+in%3Atitle+) PRs like this one. I haven't checked exhaustively, but all of the ones I've checked appear to fall into one of two cases: (a) the warning was never needed, or (b) a subsequent refinement of the warning itself made the code no longer qualify as "warnable". For an example of (a), see commit 8fb70c710afa, which added `@SuppressWarnings("unchecked")` for a cast to type `Key`, even though `Key` is not a generic type and so the cast was never unchecked in the first place. For an example of (b), see commit b431c6929d12, which added `@SuppressWarnings("serial")` because an anonymous class did not declare `serialVersionUID`; later, [JDK-7152104](https://bugs.openjdk.org/browse/JDK-7152104) changed the warning so that it no longer triggers in that situation, but the annotation was not removed as part of that commit. In this particular PR, it looks like (for example) the useless `@SuppressWarnings("try")` annotation on `compileMethod()` was [added in this commit](https://github.com/openjdk/jdk/commit/3b0ee5a6d8b89a52b0dacc51399955631d6aa597#diff-4d3a3b7e7e12e1d5b4cf3e4677d9e0de5e9df3bbf1bbfa0d8d43d12098d67dc4), probably a copy & paste error. This is typical. I guess the only other possibility is that the warning stopped working at some point due to a bug, but I haven't seen any examples of that.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21853#issuecomment-2462566874 From duke at openjdk.org Thu Nov 7 16:08:08 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 7 Nov 2024 16:08:08 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v2] In-Reply-To: References: Message-ID: > This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: > > > ConINode* node = _igvn.intcon(i); > set_ctrl(node, C->root()); > > > and > > > ConLNode* node = _igvn.longcon(i); > set_ctrl(node, C->root()); > > > Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. theoweidmannoracle has updated the pull request incrementally with three additional commits since the last revision: - Update src/hotspot/share/opto/loopopts.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/loopopts.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/loopTransform.cpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21836/files - new: https://git.openjdk.org/jdk/pull/21836/files/38d5bd0d..d1817ee8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=00-01 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21836.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21836/head:pull/21836 PR: https://git.openjdk.org/jdk/pull/21836 From duke at openjdk.org Thu Nov 7 16:11:07 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 7 Nov 2024 16:11:07 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v3] In-Reply-To: References: Message-ID: > This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are 
wrappers for: > > > ConINode* node = _igvn.intcon(i); > set_ctrl(node, C->root()); > > > and > > > ConLNode* node = _igvn.longcon(i); > set_ctrl(node, C->root()); > > > Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/loopopts.cpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21836/files - new: https://git.openjdk.org/jdk/pull/21836/files/d1817ee8..798a6172 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21836.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21836/head:pull/21836 PR: https://git.openjdk.org/jdk/pull/21836 From roland at openjdk.org Thu Nov 7 16:18:09 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 7 Nov 2024 16:18:09 GMT Subject: RFR: 8340532: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue [v2] In-Reply-To: References: Message-ID: > A `CountedLoopEnd` (that marks the end of a still existing > `CountedLoop`) is optimized out because a dominating identical > `CountedLoopEnd` (that no longer marks the end of an existing > `CountedLoop` but was left behind by previous loop opts) is > found. That causes the path out of `CountedLoopEnd` to become dead > including the `OuterStripMinedLoopEnd`. The `OuterStripMinedLoop` > looses its backedge as a consequence. The `CountedLoop` is still > marked as strip mined but the outer loop doesn't exist anymore. > > The fix I propose for this corner case is to simply detect when that > happens (during igvn AFAICT) and clear the strip mined flag from the > `CountedLoop`. 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/loopstripmining/TestIdenticalDominatingCLE.java Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21956/files - new: https://git.openjdk.org/jdk/pull/21956/files/5e9ca1bf..a4649dd9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21956&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21956&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21956.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21956/head:pull/21956 PR: https://git.openjdk.org/jdk/pull/21956 From roland at openjdk.org Thu Nov 7 16:22:49 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 7 Nov 2024 16:22:49 GMT Subject: RFR: 8340532: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue [v2] In-Reply-To: <7z556krCUH5mTfEeTgD75L3MHiUG9k-1_7Ox4LcH0F4=.a2830249-8597-4ff7-95cf-358a30f044bb@github.com> References: <7z556krCUH5mTfEeTgD75L3MHiUG9k-1_7Ox4LcH0F4=.a2830249-8597-4ff7-95cf-358a30f044bb@github.com> Message-ID: On Thu, 7 Nov 2024 15:02:48 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/loopstripmining/TestIdenticalDominatingCLE.java >> >> Co-authored-by: Christian Hagedorn > > test/hotspot/jtreg/compiler/loopstripmining/TestIdenticalDominatingCLE.java line 28: > >> 26: * @bug 8340532 >> 27: * @summary C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue >> 28: * > > Since you use C2 only flags, you should add: > Suggestion: > > * @requires vm.compiler2.enabled Right! Thanks. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21956#discussion_r1832973222 From kvn at openjdk.org Thu Nov 7 17:35:41 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 7 Nov 2024 17:35:41 GMT Subject: RFR: 8342946: Replace predicate walking code in Loop Unrolling with a predicate visitor In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 09:22:02 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (integrated with https://github.com/openjdk/jdk/pull/21790) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicate (integrated with https://github.com/openjdk/jdk/pull/21918) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (this PR) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790 / https://github.com/openjdk/jdk/pull/21918) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). 
> > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates. T... Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21944#pullrequestreview-2421716128 From kvn at openjdk.org Thu Nov 7 17:38:44 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 7 Nov 2024 17:38:44 GMT Subject: RFR: 8340532: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue [v2] In-Reply-To: References: Message-ID: <0m4ib_nqLieISo4dU7rYBEwzL2EndD8AOWtrgH_qJwQ=.94fabea9-54ac-45fa-b6dc-f5ba94b04f13@github.com> On Thu, 7 Nov 2024 16:18:09 GMT, Roland Westrelin wrote: >> A `CountedLoopEnd` (that marks the end of a still existing >> `CountedLoop`) is optimized out because a dominating identical >> `CountedLoopEnd` (that no longer marks the end of an existing >> `CountedLoop` but was left behind by previous loop opts) is >> found. That causes the path out of `CountedLoopEnd` to become dead >> including the `OuterStripMinedLoopEnd`. 
The `OuterStripMinedLoop` >> loses its backedge as a consequence. The `CountedLoop` is still >> marked as strip mined but the outer loop doesn't exist anymore. >> >> The fix I propose for this corner case is to simply detect when that >> happens (during igvn AFAICT) and clear the strip mined flag from the >> `CountedLoop`. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/loopstripmining/TestIdenticalDominatingCLE.java > > Co-authored-by: Christian Hagedorn Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21956#pullrequestreview-2421726221 From kvn at openjdk.org Thu Nov 7 17:51:44 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 7 Nov 2024 17:51:44 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v3] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 08:09:55 GMT, theoweidmannoracle wrote: > > Do we have other places (not new constant node) where we set Root as control? Maybe we can add a `set_root_as_ctrl(n)` method in `loopnode.hpp` in that case. > > There are only three locations where control is set to the root in the loop files now (not counting the ones in the new methods I added). The main reason for this patch is bugs caused by people forgetting to set control for constants (e.g. https://bugs.openjdk.org/browse/JDK-8343137), which is now prevented if the new helper methods are used. > > Do you think there would be any benefit from introducing `set_root_as_ctrl(n)` given there are only about three places where this pattern occurs now? My suggestion is about additional code cleanup. I think 3 + 5 places are enough to justify having a new function in the header file. Also, `set_root_as_ctrl(n)` could be a copy of `set_ctrl(n, ctrl)` without the 2 asserts which check `ctrl`. It will be faster.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21836#issuecomment-2462873767 From mdoerr at openjdk.org Thu Nov 7 22:14:47 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 7 Nov 2024 22:14:47 GMT Subject: Integrated: 8343724: [PPC64] Disallow OptoScheduling In-Reply-To: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> References: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> Message-ID: On Wed, 6 Nov 2024 16:18:11 GMT, Martin Doerr wrote: > Force off OptoScheduling for PPC64. It should be properly maintained before allowing users to switch it on. It's probably not important for this platform. Also see JBS issue for motivation. This pull request has now been integrated. Changeset: f621f26c Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/f621f26cd113090a0305598cfc50f0eac9a263c6 Stats: 39 lines in 2 files changed: 22 ins; 16 del; 1 mod 8343724: [PPC64] Disallow OptoScheduling Reviewed-by: rrich, mbaesken ------------- PR: https://git.openjdk.org/jdk/pull/21935 From fyang at openjdk.org Fri Nov 8 02:18:06 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 8 Nov 2024 02:18:06 GMT Subject: RFR: 8343805: RISC-V: JVM crashes on startup when disabling compressed instructions Message-ID: <0MGF-2vv_88fCBAwbnHUgug-nizGHUDJNCZHKOwGjSU=.d9b3fa18-7f5f-4231-ab5f-26504becf025@github.com> Hello, please review this trivial change. The crash happens because the compiler stubs need more space during StubRoutines generation when compressed instructions are disabled. This change simply increases the reserved size of compiler stubs for this CPU platform.
After this change, we have:

$ java -Xlog:stubs -XX:-UseRVC -version
[0.010s][info][stubs] StubRoutines (initial stubs) [0x0000003f8f3cf340, 0x0000003f8f3d1cd0] used: 604, free: 10036
[0.117s][info][stubs] StubRoutines (continuation stubs) [0x0000003f8f3d25c0, 0x0000003f8f3d3010] used: 628, free: 2012
[0.153s][info][stubs] StubRoutines (final stubs) [0x0000003f8f4025c0, 0x0000003f8f409d70] used: 9380, free: 21260
[0.199s][info][stubs] StubRoutines (compiler stubs) [0x0000003f8f4d7c40, 0x0000003f8f4e3180] used: 38924, free: 7476

------------- Commit messages: - 8343805: RISC-V: JVM crashes on startup when disabling compressed instructions Changes: https://git.openjdk.org/jdk/pull/21966/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21966&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343805 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21966.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21966/head:pull/21966 PR: https://git.openjdk.org/jdk/pull/21966 From dlong at openjdk.org Fri Nov 8 03:12:46 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 8 Nov 2024 03:12:46 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 10:19:51 GMT, theoweidmannoracle wrote: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message?
If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other locations where late inline call generators are created are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the corresponding test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the changes proposed by @caojoshua do not lose any useful information about why inlining did not happen before late inlining. I agree with Roland: rather than overwriting the old information, it would be nice to append to it. Unfortunately, this late inlining print support is a bit complicated and also a bit broken, as I discovered recently. It could probably use a cleanup. I hit one assert because there was no message printed in do_late_inline_check when allow_inline was set to true. When I investigated that, I discovered that print_inlining_commit() will happily append a new message next to an old message on the same line. Something is going wrong with the logic in print_inlining_update() when cg() is null.
I'm wondering if we could simplify things by placing the stringStream inside the CallGenerator. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2463671322 From amitkumar at openjdk.org Fri Nov 8 04:54:32 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 8 Nov 2024 04:54:32 GMT Subject: RFR: 8343810: [s390x] is_uimm* methods should take unsigned arguments Message-ID: Trivial patch which just changes the argument datatype of the `is_uimm*` methods from `int64_t` to `uint64_t`. ------------- Commit messages: - int64_t -> uint64_t Changes: https://git.openjdk.org/jdk/pull/21967/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21967&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343810 Stats: 7 lines in 1 file changed: 0 ins; 1 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/21967.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21967/head:pull/21967 PR: https://git.openjdk.org/jdk/pull/21967 From chagedorn at openjdk.org Fri Nov 8 06:21:31 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 8 Nov 2024 06:21:31 GMT Subject: RFR: 8340532: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue [v2] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 16:18:09 GMT, Roland Westrelin wrote: >> A `CountedLoopEnd` (that marks the end of a still existing >> `CountedLoop`) is optimized out because a dominating identical >> `CountedLoopEnd` (that no longer marks the end of an existing >> `CountedLoop` but was left behind by previous loop opts) is >> found. That causes the path out of `CountedLoopEnd` to become dead >> including the `OuterStripMinedLoopEnd`. The `OuterStripMinedLoop` >> loses its backedge as a consequence. The `CountedLoop` is still >> marked as strip mined but the outer loop doesn't exist anymore. >> >> The fix I propose for this corner case is to simply detect when that >> happens (during igvn AFAICT) and clear the strip mined flag from the >> `CountedLoop`.
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/loopstripmining/TestIdenticalDominatingCLE.java > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21956#pullrequestreview-2422794662 From syan at openjdk.org Fri Nov 8 06:44:18 2024 From: syan at openjdk.org (SendaoYan) Date: Fri, 8 Nov 2024 06:44:18 GMT Subject: RFR: 8343488: Test VectorRebracket128Test.java can't exclude by test/hotspot/jtreg/ProblemList.txt Message-ID: Hi all, The test `test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java` cannot be excluded via `test/hotspot/jtreg/ProblemList.txt`. The test file contains only a single test, so it does not need a test suffix. This PR removes the test suffix so that the ProblemList entry works as expected. Trivial fix, no risk. ------------- Commit messages: - 8343488: Test VectorRebracket128Test.java can't exclude by test/hotspot/jtreg/ProblemList.txt Changes: https://git.openjdk.org/jdk/pull/21968/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21968&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343488 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21968.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21968/head:pull/21968 PR: https://git.openjdk.org/jdk/pull/21968 From chagedorn at openjdk.org Fri Nov 8 07:05:28 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 8 Nov 2024 07:05:28 GMT Subject: RFR: 8343488: Test VectorRebracket128Test.java can't exclude by test/hotspot/jtreg/ProblemList.txt In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 06:39:24 GMT, SendaoYan wrote: > Hi all, > The test `test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java` cannot be excluded via `test/hotspot/jtreg/ProblemList.txt`.
The test file contains only a single test, so it does not need a test suffix. > This PR removes the test suffix so that the ProblemList entry works as expected. Trivial fix, no risk. Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21968#pullrequestreview-2422849876 From syan at openjdk.org Fri Nov 8 07:14:40 2024 From: syan at openjdk.org (SendaoYan) Date: Fri, 8 Nov 2024 07:14:40 GMT Subject: RFR: 8343488: Test VectorRebracket128Test.java can't exclude by test/hotspot/jtreg/ProblemList.txt In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 06:39:24 GMT, SendaoYan wrote: > Hi all, > The test `test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java` cannot be excluded via `test/hotspot/jtreg/ProblemList.txt`. The test file contains only a single test, so it does not need a test suffix. > This PR removes the test suffix so that the ProblemList entry works as expected. Trivial fix, no risk. Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21968#issuecomment-2463926413 From chagedorn at openjdk.org Fri Nov 8 07:19:16 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 8 Nov 2024 07:19:16 GMT Subject: RFR: 8343745: Only update Last Value Assertion Predicates in Loop Unrolling Message-ID: (Note: This PR depends on https://github.com/openjdk/jdk/pull/21944, which is fully reviewed, but I'd prefer to integrate it on Monday and run some more testing over the weekend.) This is a follow-up feature to https://github.com/openjdk/jdk/pull/21944, which updated Loop Unrolling to use a new predicate visitor and thereby enables this patch. In Loop Unrolling, we only update the stride and not the init value of a loop. Thus, we actually only need to update the Last Value Assertion Predicates because the Init Value Assertion Predicates do not use `OpaqueLoopStride`. So, we also do not need to kill the old Init Value Initialized Assertion Predicates.
This patch implements that improvement. To make this work, we need to query the associated `AssertionPredicateType` of an Assertion Predicate, which is stored in the `If/RangeCheckNode` (I guess it's okay to have this additional node field in product). This was guarded by `NOT_PRODUCT` before. For this patch, I make this information available in product builds and use it to implement this feature. Thanks, Christian ------------- Depends on: https://git.openjdk.org/jdk/pull/21944 Commit messages: - 8343745: Only update Last Value Assertion Predicates in Loop Unrolling Changes: https://git.openjdk.org/jdk/pull/21969/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21969&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343745 Stats: 101 lines in 7 files changed: 16 ins; 13 del; 72 mod Patch: https://git.openjdk.org/jdk/pull/21969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21969/head:pull/21969 PR: https://git.openjdk.org/jdk/pull/21969 From chagedorn at openjdk.org Fri Nov 8 07:19:16 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 8 Nov 2024 07:19:16 GMT Subject: RFR: 8343745: Only update Last Value Assertion Predicates in Loop Unrolling In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 07:12:12 GMT, Christian Hagedorn wrote: > (Note: This PR depends on https://github.com/openjdk/jdk/pull/21944, which is fully reviewed, but I'd prefer to integrate it on Monday and run some more testing over the weekend.) > > This is a follow-up feature to https://github.com/openjdk/jdk/pull/21944, which updated Loop Unrolling to use a new predicate visitor and thereby enables this patch. > > In Loop Unrolling, we only update the stride and not the init value of a loop. Thus, we actually only need to update the Last Value Assertion Predicates because the Init Value Assertion Predicates do not use `OpaqueLoopStride`. So, we also do not need to kill the old Init Value Initialized Assertion Predicates.
This patch implements that improvement. > > To make this work, we need to query the associated `AssertionPredicateType` of an Assertion Predicate, which is stored in the `If/RangeCheckNode` (I guess it's okay to have this additional node field in product). This was guarded by `NOT_PRODUCT` before. For this patch, I make this information available in product builds and use it to implement this feature. > > Thanks, > Christian

src/hotspot/share/opto/predicates.cpp line 881:

> 879: // Only Last Value Assertion Predicates have an OpaqueLoopStrideNode.
> 880: return;
> 881: }

Here we skip updating the Init Value Template Assertion Predicate.

src/hotspot/share/opto/predicates.hpp line 1073:

> 1071: // Only Last Value Initialized Assertion Predicates need to be killed and updated.
> 1072: initialized_assertion_predicate.kill(_phase);
> 1073: }

Here we only kill the old Last Value Initialized Assertion Predicate.

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21969#discussion_r1833814389 PR Review Comment: https://git.openjdk.org/jdk/pull/21969#discussion_r1833813957 From roland at openjdk.org Fri Nov 8 07:54:42 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 8 Nov 2024 07:54:42 GMT Subject: RFR: 8340532: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue [v2] In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 06:17:49 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/loopstripmining/TestIdenticalDominatingCLE.java >> >> Co-authored-by: Christian Hagedorn > > Marked as reviewed by chagedorn (Reviewer).
@chhagedorn @vnkozlov thanks for the review ------------- PR Comment: https://git.openjdk.org/jdk/pull/21956#issuecomment-2463981674 From roland at openjdk.org Fri Nov 8 07:57:18 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 8 Nov 2024 07:57:18 GMT Subject: Integrated: 8340532: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue In-Reply-To: <57sjv4VxZ2KunadWfkprDW5tlKiWMM45J4UEOJhCQPI=.c17d2245-a825-46db-b365-40c203fcc9eb@github.com> References: Message-ID: <57sjv4VxZ2KunadWfkprDW5tlKiWMM45J4UEOJhCQPI=.c17d2245-a825-46db-b365-40c203fcc9eb@github.com> On Thu, 7 Nov 2024 14:42:41 GMT, Roland Westrelin wrote: > A `CountedLoopEnd` (that marks the end of a still existing > `CountedLoop`) is optimized out because a dominating identical > `CountedLoopEnd` (that no longer marks the end of an existing > `CountedLoop` but was left behind by previous loop opts) is > found. That causes the path out of `CountedLoopEnd` to become dead > including the `OuterStripMinedLoopEnd`. The `OuterStripMinedLoop` > loses its backedge as a consequence. The `CountedLoop` is still > marked as strip mined but the outer loop doesn't exist anymore. > > The fix I propose for this corner case is to simply detect when that > happens (during igvn AFAICT) and clear the strip mined flag from the > `CountedLoop`. This pull request has now been integrated.
Changeset: a10b1ccd Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/a10b1ccd377335354db7505e9944496729e539ce Stats: 75 lines in 3 files changed: 74 ins; 1 del; 0 mod 8340532: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21956 From jbhateja at openjdk.org Fri Nov 8 08:15:32 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 8 Nov 2024 08:15:32 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v2] In-Reply-To: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: > This patch optimizes LongVector multiplication by inferring the VPMUL[U]DQ instruction for the following IR patterns:
>
>     MulVL (AndV SRC1, 0xFFFFFFFF) (AndV SRC2, 0xFFFFFFFF)
>     MulVL (URShiftVL SRC1, 32) (URShiftVL SRC2, 32)
>     MulVL (URShiftVL SRC1, 32) (AndV SRC2, 0xFFFFFFFF)
>     MulVL (AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2, 32)
>     MulVL (VectorCastI2X SRC1) (VectorCastI2X SRC2)
>     MulVL (RShiftVL SRC1, 32) (RShiftVL SRC2, 32)
>
> A 64x64-bit multiplication produces a 128-bit result and can be performed by individually multiplying the upper and lower doublewords of the multiplier with the multiplicand and assembling the partial products to compute the full-width result. Targets supporting vector quadword multiplication have separate instructions to compute the upper and lower quadwords of the 128-bit result. Therefore, the existing Vector API multiplication operator expects shape conformance between source and result vectors.
> > If the upper 32 bits of both the quadword multiplier and multiplicand are always zero, then the result of the multiplication depends only on the partial product of their lower doublewords and can be computed with an unsigned 32-bit multiplication instruction that produces a full quadword result. The patch matches this pattern in a target-dependent manner without introducing a new IR node. > > The VPMUL[U]DQ instruction performs [unsigned] multiplication between the even-numbered doubleword lanes of two long vectors and produces a 64-bit result. It has much lower latency than the full 64-bit multiplication instruction VPMULLQ. In addition, non-AVX512DQ targets do not support direct quadword multiplication, so we can skip the redundant partial products for the zeroed-out upper 32 bits. This results in throughput improvements on both P-core and E-core Xeons. > > Please find below the performance of the [XXH3 hashing benchmark](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html) included with the patch:
>
> Sierra Forest:
> ============
> Baseline:
> Benchmark                                 (SIZE)   Mode  Cnt    Score   Error   Units
> VectorXXH3HashingBenchmark.hashingKernel    1024  thrpt    2  806.228          ops/ms
> VectorXXH3HashingBenchmark.hashingKernel    2048  thrpt    2  403.044          ops/ms
> VectorXXH3HashingBenchmark.hashingKernel    4096  thrpt    2  200.641          ops/ms
> VectorXXH3HashingBenchmark.hashingKernel    8192  thrpt    2  100.664          ops/ms
>
> With Optimization:
> Benchmark (SIZE) Mode ...
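To make the per-lane arithmetic behind these patterns concrete, here is a minimal scalar sketch in Java that models one 64-bit lane. It is not the patch's C2 matcher code: it assumes that a VPMULUDQ lane behaves like the hypothetical `mulUnsignedLow32` (unsigned low doublewords, full 64-bit product) and a VPMULDQ lane like the hypothetical `mulSignedLow32` (sign-extended low doublewords). It checks that each MulVL pattern equals the corresponding 32x32->64 lane product, and that the low half of a generic 64x64 multiply can be assembled from 32-bit partial products.

```java
import java.math.BigInteger;
import java.util.Random;

/** Scalar sketch of one 64-bit lane; names are illustrative, not HotSpot code. */
final class LaneMulSketch {

    // Assumed VPMULUDQ lane model: multiply the low 32 bits of each operand
    // as unsigned values and keep the full 64-bit product (upper halves ignored).
    static long mulUnsignedLow32(long a, long b) {
        return (a & 0xFFFFFFFFL) * (b & 0xFFFFFFFFL);
    }

    // Assumed VPMULDQ lane model: same, but the low 32 bits are sign-extended.
    static long mulSignedLow32(long a, long b) {
        return (long) (int) a * (long) (int) b;
    }

    // Low 64 bits of a 64x64 multiply assembled from 32-bit partial products.
    // The aHi * bHi term only affects bits >= 64 and is therefore dropped.
    static long mulLow64FromPartials(long a, long b) {
        long aLo = a & 0xFFFFFFFFL, aHi = a >>> 32;
        long bLo = b & 0xFFFFFFFFL, bHi = b >>> 32;
        long cross = aLo * bHi + aHi * bLo; // only its low 32 bits survive the shift
        return aLo * bLo + (cross << 32);
    }

    static boolean identitiesHold(long a, long b) {
        // Generic MulVL lane equals the partial-product assembly.
        boolean full = a * b == mulLow64FromPartials(a, b);
        // MulVL (AndV a, 0xFFFFFFFF) (AndV b, 0xFFFFFFFF): upper halves are zero, so a
        // single unsigned 32x32 product (which ignores the upper halves) is the answer.
        boolean andMask = (a & 0xFFFFFFFFL) * (b & 0xFFFFFFFFL) == mulUnsignedLow32(a, b);
        // MulVL (URShiftVL a, 32) (URShiftVL b, 32)
        boolean urshift = (a >>> 32) * (b >>> 32) == mulUnsignedLow32(a >>> 32, b >>> 32);
        // MulVL (RShiftVL a, 32) (RShiftVL b, 32): operands are sign-extended 32-bit values.
        boolean rshift = (a >> 32) * (b >> 32) == mulSignedLow32(a >> 32, b >> 32);
        // MulVL (VectorCastI2X a) (VectorCastI2X b)
        boolean i2l = (long) (int) a * (long) (int) b == mulSignedLow32(a, b);
        return full && andMask && urshift && rshift && i2l;
    }

    // The unsigned 32x32 product always fits in 64 bits: compare with BigInteger.
    static boolean unsignedProductIsExact(long a, long b) {
        BigInteger exact = BigInteger.valueOf(a & 0xFFFFFFFFL)
                .multiply(BigInteger.valueOf(b & 0xFFFFFFFFL));
        return exact.equals(new BigInteger(Long.toUnsignedString(mulUnsignedLow32(a, b))));
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int i = 0; i < 100_000; i++) {
            long a = rnd.nextLong(), b = rnd.nextLong();
            if (!identitiesHold(a, b) || !unsignedProductIsExact(a, b)) {
                throw new AssertionError("mismatch for " + a + ", " + b);
            }
        }
        System.out.println("all lane identities hold");
    }
}
```

Running `main` exercises the identities on pseudo-random operands. The `BigInteger` comparison shows why nothing is lost when only the low doublewords participate: the largest unsigned 32x32 product, (2^32 - 1)^2, is still below 2^64.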
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Creating specialized IR to shield pattern from subsequent transforms in optimization pipeline ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21244/files - new: https://git.openjdk.org/jdk/pull/21244/files/43320063..613f491b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21244&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21244&range=00-01 Stats: 69 lines in 7 files changed: 57 ins; 3 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/21244.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21244/head:pull/21244 PR: https://git.openjdk.org/jdk/pull/21244 From rcastanedalo at openjdk.org Fri Nov 8 08:52:34 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 8 Nov 2024 08:52:34 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v2] In-Reply-To: References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: On Wed, 6 Nov 2024 20:30:26 GMT, Tobias Holenstein wrote: >> Adds new option to IGV to color selected nodes: >> 1) select some nodes >> 2) `Ctrl + C` or `View` -> `Color action` >> 3) pick a color and apply > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Update src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/DiagramScene.java > > Co-authored-by: Andrey Turbanov Nice improvement, thanks for working on this! If the user selects a dark color, the node labels might become hard to read. Here's a simple change that addresses that by coloring the labels in white in that case.
Please consider merging it into this PR: diff --git a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/widgets/FigureWidget.java b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/widgets/FigureWidget.java index a469d196a6b..495d844eb34 100644 --- a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/widgets/FigureWidget.java +++ b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/widgets/FigureWidget.java @@ -113,7 +113,6 @@ public FigureWidget(final Figure f, DiagramScene scene) { LayoutFactory.SerialAlignment.LEFT_TOP : LayoutFactory.SerialAlignment.CENTER; middleWidget.setLayout(LayoutFactory.createVerticalFlowLayout(textAlign, 0)); - middleWidget.setBackground(f.getColor()); middleWidget.setOpaque(true); middleWidget.getActions().addAction(new DoubleClickAction(this)); middleWidget.setCheckClipping(false); @@ -143,7 +142,6 @@ public FigureWidget(final Figure f, DiagramScene scene) { textWidget.addChild(lw); lw.setLabel(displayString); lw.setFont(Diagram.FONT); - lw.setForeground(getTextColor()); lw.setAlignment(LabelWidget.Alignment.CENTER); lw.setVerticalAlignment(LabelWidget.VerticalAlignment.CENTER); lw.setBorder(BorderFactory.createEmptyBorder()); @@ -151,6 +149,8 @@ public FigureWidget(final Figure f, DiagramScene scene) { } formatExtraLabel(false); + refreshColor(); + if (getFigure().getWarning() != null) { ImageWidget warningWidget = new ImageWidget(scene, warningSign); Point warningLocation = new Point(getFigure().getWidth() - Figure.WARNING_WIDTH - Figure.INSET / 2, 0); @@ -186,6 +186,9 @@ protected Sheet createSheet() { public void refreshColor() { middleWidget.setBackground(figure.getColor()); + for (LabelWidget lw : labelWidgets) { + lw.setForeground(getTextColor()); + } } @Override ------------- Changes requested by rcastanedalo (Reviewer). 
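The diff above calls `getTextColor()`, whose body is not shown in this thread. Purely as a hypothetical illustration of one common way to pick readable label text for a user-chosen background, the sketch below uses the ITU-R BT.601 luma weights with an assumed cut-off of 128; `LabelColorSketch` and `textColorFor` are invented names, not IGV code.

```java
import java.awt.Color;

/** Hypothetical helper: choose label text color for a node background. */
final class LabelColorSketch {

    private LabelColorSketch() {}

    // Perceived brightness in [0, 255] using ITU-R BT.601 luma weights.
    static double luma(Color c) {
        return 0.299 * c.getRed() + 0.587 * c.getGreen() + 0.114 * c.getBlue();
    }

    // White text on dark backgrounds, black text otherwise (threshold is an assumption).
    static Color textColorFor(Color background) {
        return luma(background) < 128.0 ? Color.WHITE : Color.BLACK;
    }

    public static void main(String[] args) {
        System.out.println(textColorFor(new Color(0, 0, 128)));   // navy background
        System.out.println(textColorFor(new Color(255, 255, 0))); // yellow background
    }
}
```

Weighting green most heavily matches how bright colors are perceived, so a saturated yellow keeps black text while navy gets white text; the exact weights and threshold are a design choice.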
PR Review: https://git.openjdk.org/jdk/pull/21925#pullrequestreview-2423097780 From rcastanedalo at openjdk.org Fri Nov 8 09:06:33 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 8 Nov 2024 09:06:33 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v2] In-Reply-To: References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: On Wed, 6 Nov 2024 20:30:26 GMT, Tobias Holenstein wrote: >> Adds new option to IGV to color selected nodes: >> 1) select some nodes >> 2) `Ctrl + C` or `View` -> `Color action` >> 3) pick a color and apply > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Update src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/DiagramScene.java > > Co-authored-by: Andrey Turbanov In my opinion, the IGV toolbar is already pretty crowded (this hurts most when opening two graphs side-by-side) and I would prefer not adding the color icon there. On the other hand, we could add the action to the pop-up menu that's opened when right-clicking into a node or set of nodes.
Here's my suggestion: diff --git a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java index e68abd3297e..c4f2ac670e7 100644 --- a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java +++ b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java @@ -100,6 +100,7 @@ public EditorTopComponent(DiagramViewModel diagramViewModel) { }; Action[] actionsWithSelection = new Action[]{ + ColorAction.get(ColorAction.class), ExtractAction.get(ExtractAction.class), HideAction.get(HideAction.class), null, @@ -168,8 +169,6 @@ public void mouseMoved(MouseEvent e) {} toolBar.add(ReduceDiffAction.get(ReduceDiffAction.class)); toolBar.add(ExpandDiffAction.get(ExpandDiffAction.class)); toolBar.addSeparator(); - toolBar.add(ColorAction.get(ColorAction.class)); - toolBar.addSeparator(); toolBar.add(ExtractAction.get(ExtractAction.class)); toolBar.add(HideAction.get(HideAction.class)); toolBar.add(ShowAllAction.get(ShowAllAction.class)); diff --git a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ColorAction.java b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ColorAction.java index a51934a4322..92921c81512 100644 --- a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ColorAction.java +++ b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ColorAction.java @@ -43,7 +43,7 @@ @ActionReference(path = "Shortcuts", name = "D-C") }) @Messages({ - "CTL_ColorAction=Color action", + "CTL_ColorAction=Color", "HINT_ColorAction=Color current set of selected nodes" }) public final class ColorAction extends ModelAwareAction { diff --git 
a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ExtractAction.java b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ExtractAction.java index 24815527a0e..c2329cbb26f 100644 --- a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ExtractAction.java +++ b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ExtractAction.java @@ -42,7 +42,7 @@ @ActionReference(path = "Shortcuts", name = "D-X") }) @Messages({ - "CTL_ExtractAction=Extract action", + "CTL_ExtractAction=Extract", "HINT_ExtractAction=Extract current set of selected nodes" }) public final class ExtractAction extends ModelAwareAction { @tobiasholenstein @chhagedorn what do you think? If you agree, feel free to merge the patch into this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21925#issuecomment-2464175649 From chagedorn at openjdk.org Fri Nov 8 09:13:51 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 8 Nov 2024 09:13:51 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v2] In-Reply-To: References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: On Fri, 8 Nov 2024 08:49:33 GMT, Roberto Castañeda Lozano wrote: > If the user selects a dark color, the node labels might become hard to read. Here's a simple change that addresses that by coloring the labels in white in that case. Please consider merging it into this PR: Great idea! > In my opinion, the IGV toolbar is already pretty crowded (this hurts most when opening two graphs side-by-side) and I would prefer not adding the color icon there I agree with this. Maybe we should generally think about cleaning the toolbar and dropping some of the less frequently used icons. > On the other hand, we could add the action to the pop-up menu that's opened when right-clicking into a node or set of nodes. I thought about this, too.
I think that would be quite handy, and it is an intuitive place to look when you are not aware of the feature and want to check whether such an option exists. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21925#issuecomment-2464190597 From mli at openjdk.org Fri Nov 8 10:29:14 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 8 Nov 2024 10:29:14 GMT Subject: RFR: 8343555: RISC-V: make some verified (on hardware) extension options diagnostic [v2] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 01:41:51 GMT, Fei Yang wrote: > Please also update the JBS title to reflect the latest version, as we are targeting more options than a single UseZvfh. Thanks, modified. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2464345402 From tholenstein at openjdk.org Fri Nov 8 10:29:32 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 8 Nov 2024 10:29:32 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v3] In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: <_3cywd3W1exRW34ru-WV9_7X2OdeZHG_mmVfyuarMuQ=.540d6650-e043-45e5-9f22-91bfab76cb61@github.com> > Adds new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: white font for dark colors ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21925/files - new: https://git.openjdk.org/jdk/pull/21925/files/17205bab..56e046ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=01-02 Stats: 17 lines in 1 file changed: 14 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21925.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/21925/head:pull/21925 PR: https://git.openjdk.org/jdk/pull/21925 From tholenstein at openjdk.org Fri Nov 8 10:32:32 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 8 Nov 2024 10:32:32 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v4] In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: > color > > pick > > nodes > > Adds new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: move from toolbar to menu ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21925/files - new: https://git.openjdk.org/jdk/pull/21925/files/56e046ca..6d7856ed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=02-03 Stats: 5 lines in 3 files changed: 1 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21925.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21925/head:pull/21925 PR: https://git.openjdk.org/jdk/pull/21925 From tholenstein at openjdk.org Fri Nov 8 10:39:20 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 8 Nov 2024 10:39:20 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v2] In-Reply-To: References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: <-nIYtO7mMceSo3Ux84ByTa1FWrZu48pXXw4ED7f4QYc=.dd22ae23-4de4-4ff0-917d-49e59d972e5e@github.com> On Fri, 8 Nov 2024 09:02:53 GMT, Roberto Castañeda Lozano wrote: >> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: >> >> Update
src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/DiagramScene.java >> >> Co-authored-by: Andrey Turbanov > > In my opinion, the IGV toolbar is already pretty crowded (this hurts most when opening two graphs side-by-side) and I would prefer not adding the color icon there. On the other hand, we could add the action to the pop-up menu that's opened when right-clicking into a node or set of nodes. Here's my suggestion: > > > diff --git a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java > index e68abd3297e..c4f2ac670e7 100644 > --- a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java > +++ b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java > @@ -100,6 +100,7 @@ public EditorTopComponent(DiagramViewModel diagramViewModel) { > }; > > Action[] actionsWithSelection = new Action[]{ > + ColorAction.get(ColorAction.class), > ExtractAction.get(ExtractAction.class), > HideAction.get(HideAction.class), > null, > @@ -168,8 +169,6 @@ public void mouseMoved(MouseEvent e) {} > toolBar.add(ReduceDiffAction.get(ReduceDiffAction.class)); > toolBar.add(ExpandDiffAction.get(ExpandDiffAction.class)); > toolBar.addSeparator(); > - toolBar.add(ColorAction.get(ColorAction.class)); > - toolBar.addSeparator(); > toolBar.add(ExtractAction.get(ExtractAction.class)); > toolBar.add(HideAction.get(HideAction.class)); > toolBar.add(ShowAllAction.get(ShowAllAction.class)); > diff --git a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ColorAction.java b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ColorAction.java > index a51934a4322..92921c81512 100644 > --- a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ColorAction.java 
> +++ b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ColorAction.java > @@ -43,7 +43,7 @@ > @ActionReference(path = "Shortcuts", name = "D-C") > }) > @Messages({ > - "CTL_ColorAction=Color action", > + "CTL_ColorAction=Color", > "HINT_ColorAction=Color current set of selected nodes" > }) > public final class ColorAction extends ModelAwareAction { > diff --git a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ExtractAction.java b/src/utils/I... @robcasloz I have applied both your patches ------------- PR Comment: https://git.openjdk.org/jdk/pull/21925#issuecomment-2464366255 From fyang at openjdk.org Fri Nov 8 10:40:22 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 8 Nov 2024 10:40:22 GMT Subject: RFR: 8343555: RISC-V: make some verified (on hardware) extension options diagnostic [v2] In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 18:42:04 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this simple patch? >> Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang, this one should be a diagnostic option, as we don't want to expose too many options to users. >> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > turn more verified extensions as DIAGNOSTIC LGTM. Thanks. ------------- Marked as reviewed by fyang (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21885#pullrequestreview-2423360257 From rcastanedalo at openjdk.org Fri Nov 8 10:50:06 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 8 Nov 2024 10:50:06 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v4] In-Reply-To: References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: On Fri, 8 Nov 2024 10:32:32 GMT, Tobias Holenstein wrote: >> color >> >> pick >> >> nodes >> >> Adds new option to IGV to color selected nodes: >> 1) select some nodes >> 2) `Ctrl + C` or `View` -> `Color action` >> 3) pick a color and apply > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > move from toolbar to menu Looks good, thanks Toby! I see that you also fixed the extra label color, nice! It would be good to factor out that code together with that of `FigureWidget::getTextColor()`, but not a must. ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21925#pullrequestreview-2423381186 From chagedorn at openjdk.org Fri Nov 8 11:33:24 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 8 Nov 2024 11:33:24 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v4] In-Reply-To: References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: <8pFlnBnaHRzeynpL2wS6sd7kiOCAy08J8wc1jRAC8AU=.579c3e98-6b9c-476d-bca2-d9bf05ec1c51@github.com> On Fri, 8 Nov 2024 10:32:32 GMT, Tobias Holenstein wrote: >> color >> >> pick >> >> nodes >> >> Adds new option to IGV to color selected nodes: >> 1) select some nodes >> 2) `Ctrl + C` or `View` -> `Color action` >> 3) pick a color and apply > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > move from toolbar to menu Looks good!
Just tried it out. One more thing I've noticed: When selecting a color for a node and then trying to color another node, the color selection resets back to `#ffffff`. It would be nice if the last selection were stored. But I'm not sure how easy this is. Could also be done separately. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21925#pullrequestreview-2423469608 From amitkumar at openjdk.org Fri Nov 8 12:48:33 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 8 Nov 2024 12:48:33 GMT Subject: RFR: 8343506: [s390x] multiple test failures with ubsan In-Reply-To: References: <-_6fxDHjNO3eG0JX_Nqscc0bVUQgLcmK5OjoVME7xNk=.6ae3a749-36a4-47d2-9ff1-a29c28f97dd8@github.com> Message-ID: <80RUEwYH-agh78uyArNV-ZD6Thxj-76vDyNnwCMYwm0=.1b97264a-4d7b-48c3-8eab-696dbbc01de9@github.com> On Wed, 6 Nov 2024 00:56:24 GMT, Dean Long wrote: >> I don't think this is necessary. Unsigned subtraction with wrap-around is not undefined behavior. > > Right, it's not UB, but sometimes it is a bug, and would be flagged by things like -fsanitize=unsigned-integer-overflow, so my preference would be to avoid it if possible. It is not really required, and for `storage to storage` instructions `length = 0` is an invalid case, which the current code already takes care of. So I would simply keep it that way. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21864#discussion_r1834351053 From amitkumar at openjdk.org Fri Nov 8 12:48:33 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 8 Nov 2024 12:48:33 GMT Subject: Integrated: 8343506: [s390x] multiple test failures with ubsan In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 05:27:59 GMT, Amit Kumar wrote: > This is a trivial patch that fixes the error I am seeing on s390x while running tier1 with ubsan enabled. Please see JBS for more details. This pull request has now been integrated.
Changeset: f6edfe58 Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/f6edfe58d6931b058a5fec722615740818711065 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8343506: [s390x] multiple test failures with ubsan Reviewed-by: mdoerr, lucy ------------- PR: https://git.openjdk.org/jdk/pull/21864 From rrich at openjdk.org Fri Nov 8 14:24:21 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 8 Nov 2024 14:24:21 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms Message-ID: Vectorization of the loop in `test2` does not work on ppc therefore I want to exclude it there. ------------- Commit messages: - Exclude ppc64 Changes: https://git.openjdk.org/jdk/pull/21975/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21975&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343774 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21975.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21975/head:pull/21975 PR: https://git.openjdk.org/jdk/pull/21975 From rrich at openjdk.org Fri Nov 8 14:24:21 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 8 Nov 2024 14:24:21 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 11:44:21 GMT, Richard Reingruber wrote: > Vectorization of the loop in `test2` does not work on ppc therefore I want to exclude it there. @offamitkumar want me to exclude s390x as well? @MBaesken said TestCastX2NotProcessedIGVN.java was failing there too. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21975#issuecomment-2464886794 From tholenstein at openjdk.org Fri Nov 8 14:31:58 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 8 Nov 2024 14:31:58 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v5] In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: > color > > pick > > nodes > > Adds new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: remember recent colors and have 10 defaults ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21925/files - new: https://git.openjdk.org/jdk/pull/21925/files/6d7856ed..f0b78af7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=03-04 Stats: 78 lines in 1 file changed: 72 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/21925.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21925/head:pull/21925 PR: https://git.openjdk.org/jdk/pull/21925 From tholenstein at openjdk.org Fri Nov 8 14:31:58 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 8 Nov 2024 14:31:58 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v4] In-Reply-To: <8pFlnBnaHRzeynpL2wS6sd7kiOCAy08J8wc1jRAC8AU=.579c3e98-6b9c-476d-bca2-d9bf05ec1c51@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> <8pFlnBnaHRzeynpL2wS6sd7kiOCAy08J8wc1jRAC8AU=.579c3e98-6b9c-476d-bca2-d9bf05ec1c51@github.com> Message-ID: On Fri, 8 Nov 2024 11:29:47 GMT, Christian Hagedorn wrote: > Looks good! Just tried it out. 
One more thing I've noticed: When selecting a color for a node and then trying to color another node, the color selection resets back to `#ffffff`. It would be nice if the last selection were stored. But I'm not sure how easy this is. Could also be done separately. ![Screenshot 2024-11-08 at 15 27 06](https://github.com/user-attachments/assets/6dbf0732-f643-4ee2-add9-6adaee380fc0) Right. I have updated it now to save the last colors and provide some default colors ------------- PR Comment: https://git.openjdk.org/jdk/pull/21925#issuecomment-2464903951 From roland at openjdk.org Fri Nov 8 14:41:11 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 8 Nov 2024 14:41:11 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 11:44:21 GMT, Richard Reingruber wrote: > Vectorization of the loop in `test2` does not work on ppc therefore I want to exclude it there. Thanks for taking care of that. Maybe it would be more robust to only enable the test on x86, aarch64 and riscv64. The whole test doesn't need to be excluded actually. Only IR matching on `test2` needs to be disabled. This can be done with: applyIfPlatformOr = {"x64", "true", "aarch64", "true", "riscv64", "true"}) See `TestBoolNodeGVN.java` for instance. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21975#issuecomment-2464920227 From amitkumar at openjdk.org Fri Nov 8 14:45:17 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 8 Nov 2024 14:45:17 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms In-Reply-To: References: Message-ID: <0Q64LeFu-0amchhchBKVMWTr6CZjv8LQrrF7RtPW_Po=.78656350-0d90-4d58-84d4-670536f537c7@github.com> On Fri, 8 Nov 2024 14:21:26 GMT, Richard Reingruber wrote: > @offamitkumar want me to exclude s390x as well? @MBaesken said TestCastX2NotProcessedIGVN.java was failing there too.
Yes, it is failing for s390x as well. I think we should exclude s390x as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21975#issuecomment-2464931057 From tholenstein at openjdk.org Fri Nov 8 14:50:45 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 8 Nov 2024 14:50:45 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v6] In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: > color > > pick > > nodes > > Adds new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: refactor getTextColor() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21925/files - new: https://git.openjdk.org/jdk/pull/21925/files/f0b78af7..30dc5261 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=04-05 Stats: 34 lines in 2 files changed: 9 ins; 10 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/21925.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21925/head:pull/21925 PR: https://git.openjdk.org/jdk/pull/21925 From tholenstein at openjdk.org Fri Nov 8 14:50:46 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 8 Nov 2024 14:50:46 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v4] In-Reply-To: References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: On Fri, 8 Nov 2024 10:46:35 GMT, Roberto Castañeda Lozano wrote: > Looks good, thanks Toby! I see that you also fixed the extra label color, nice!
It would be good to factor out that code together with that of `FigureWidget::getTextColor()`, but not a must. done ------------- PR Comment: https://git.openjdk.org/jdk/pull/21925#issuecomment-2464943221 From rrich at openjdk.org Fri Nov 8 14:53:25 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 8 Nov 2024 14:53:25 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms [v2] In-Reply-To: References: Message-ID: > Vectorization of the loop in `test2` does not work on ppc therefore I want to exclude it there. Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Positive list for test2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21975/files - new: https://git.openjdk.org/jdk/pull/21975/files/c6bac710..a7c2872b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21975&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21975&range=00-01 Stats: 3 lines in 1 file changed: 1 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21975.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21975/head:pull/21975 PR: https://git.openjdk.org/jdk/pull/21975 From rrich at openjdk.org Fri Nov 8 14:53:25 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 8 Nov 2024 14:53:25 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 14:37:05 GMT, Roland Westrelin wrote: > Thanks for taking care of that. Maybe it would be more robust to only enable the test on x86, aarch64 and riscv64. The whole test doesn't need to be excluded actually. Only IR matching on `test2` needs to be disabled. This can be done with: > > ``` > applyIfPlatformOr = {"x64", "true", "aarch64", "true", "riscv64", "true"}) > ``` > > See `TestBoolNodeGVN.java` for instance. Ok. I've done that.
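The 8343535 label-color discussion above (white labels on dark node colors, later factored together with `FigureWidget::getTextColor()`) can be illustrated with a minimal, self-contained sketch. This is not the IGV code: the class and method names, the ITU-R BT.601 luma weights, and the cutoff of 128 are assumptions chosen for illustration.

```java
import java.awt.Color;

// Hypothetical sketch: choose a readable label color for a given node background.
public class TextColorSketch {

    // Assumed cutoff on a 0..255 brightness scale; not taken from IGV.
    static final double BRIGHTNESS_THRESHOLD = 128.0;

    // Perceived brightness (0..255) using ITU-R BT.601 luma weights.
    public static double brightness(Color c) {
        return 0.299 * c.getRed() + 0.587 * c.getGreen() + 0.114 * c.getBlue();
    }

    // White text on dark backgrounds, black text on light ones.
    public static Color textColorFor(Color background) {
        return brightness(background) < BRIGHTNESS_THRESHOLD ? Color.WHITE : Color.BLACK;
    }

    public static void main(String[] args) {
        Color darkBlue = new Color(0x1a, 0x1a, 0x60);
        System.out.println(textColorFor(darkBlue).equals(Color.WHITE));     // true
        System.out.println(textColorFor(Color.YELLOW).equals(Color.BLACK)); // true
        System.out.println(textColorFor(Color.GREEN).equals(Color.BLACK));  // true
    }
}
```

Weighting the channels instead of averaging them matters for saturated colors: pure green has a luma of about 150 and gets black text here, while a plain RGB average (85) would have picked white.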
------------- PR Comment: https://git.openjdk.org/jdk/pull/21975#issuecomment-2464947087 From tholenstein at openjdk.org Fri Nov 8 14:53:53 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 8 Nov 2024 14:53:53 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v7] In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: > color > > pick > > nodes > > Adds new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: copyright year ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21925/files - new: https://git.openjdk.org/jdk/pull/21925/files/30dc5261..403d8b5c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=05-06 Stats: 5 lines in 5 files changed: 0 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21925.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21925/head:pull/21925 PR: https://git.openjdk.org/jdk/pull/21925 From duke at openjdk.org Fri Nov 8 14:55:53 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Fri, 8 Nov 2024 14:55:53 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v4] In-Reply-To: References: Message-ID: > This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: > > > ConINode* node = _igvn.intcon(i); > set_ctrl(node, C->root()); > > > and > > > ConLNode* node = _igvn.longcon(i); > set_ctrl(node, C->root()); > > > Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. 
theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Add set_root_as_ctrl ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21836/files - new: https://git.openjdk.org/jdk/pull/21836/files/798a6172..3dc3befd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=02-03 Stats: 23 lines in 6 files changed: 4 ins; 6 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/21836.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21836/head:pull/21836 PR: https://git.openjdk.org/jdk/pull/21836 From roland at openjdk.org Fri Nov 8 15:00:49 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 8 Nov 2024 15:00:49 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms [v2] In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 14:53:25 GMT, Richard Reingruber wrote: >> Vectorization of the loop in `test2` does not work on ppc therefore I want to exclude it there. > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Positive list for test2 Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21975#pullrequestreview-2424024608 From duke at openjdk.org Fri Nov 8 15:07:30 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Fri, 8 Nov 2024 15:07:30 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v4] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 17:48:27 GMT, Vladimir Kozlov wrote: >>> Do we have other places (not new constant nodes) where we set Root as control? Maybe we can add a `set_root_as_ctrl(n)` method in `loopnode.hpp` in such a case.
>> >> There are only three locations where control is set to the root in the loop files now (not counting the ones in the new methods I added). The main reason for this patch is bugs caused by people forgetting to set control for constants (e.g. https://bugs.openjdk.org/browse/JDK-8343137), which is now prevented if the new helper methods are used. >> >> Do you think there would be any benefit from introducing `set_root_as_ctrl(n)` given there are only about three places where this pattern occurs now? > >> > Do we have other places (not new constant nodes) where we set Root as control? Maybe we can add a `set_root_as_ctrl(n)` method in `loopnode.hpp` in such a case. >> >> There are only three locations where control is set to the root in the loop files now (not counting the ones in the new methods I added). The main reason for this patch is bugs caused by people forgetting to set control for constants (e.g. https://bugs.openjdk.org/browse/JDK-8343137), which is now prevented if the new helper methods are used. >> >> Do you think there would be any benefit from introducing `set_root_as_ctrl(n)` given there are only about three places where this pattern occurs now? > > My suggestion is about additional code cleanup. I think 3 + 5 places are enough to justify having a new function in the header file. Also, `set_root_as_ctrl(n)` could be a copy of `set_ctrl(n, ctrl)` without the 2 asserts that check `ctrl`. It will be faster.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21836#issuecomment-2464983061 From chagedorn at openjdk.org Fri Nov 8 15:15:30 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 8 Nov 2024 15:15:30 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v7] In-Reply-To: References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: <_GEY6_2QtpdQv7_4xLHACtWQE3QZC4EWZjq1tTxbNgI=.2b8bf7ef-71a8-4297-87ed-caf47766557f@github.com> On Fri, 8 Nov 2024 14:53:53 GMT, Tobias Holenstein wrote: >> color >> >> pick >> >> nodes >> >> Adds new option to IGV to color selected nodes: >> 1) select some nodes >> 2) `Ctrl + C` or `View` -> `Color action` >> 3) pick a color and apply > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > copyright year Great idea with the defaults! I've just tried it out on Linux. It does seem to remember my last choice and offers me some defaults. But somehow the window look off: ![image](https://github.com/user-attachments/assets/626f6aef-195a-4b37-9869-16e8ccf26517) The defaults are hard to see and the rectangle saying "Preview" to the left is strange. It also says "Color Name: #ffffff" even though it chooses the last selected one correctly when pressing "OK". 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21925#issuecomment-2465002719 From tholenstein at openjdk.org Fri Nov 8 15:46:33 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 8 Nov 2024 15:46:33 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v8] In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: > color > > pick > > nodes > > Adds new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: use MetalLookAndFeel ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21925/files - new: https://git.openjdk.org/jdk/pull/21925/files/403d8b5c..c9af2285 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=06-07 Stats: 42 lines in 1 file changed: 40 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21925.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21925/head:pull/21925 PR: https://git.openjdk.org/jdk/pull/21925 From fyang at openjdk.org Fri Nov 8 15:48:26 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 8 Nov 2024 15:48:26 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms [v2] In-Reply-To: References: Message-ID: <--JLrmhyB78xAe6PkT73i-CdbWyJiXAIShdz7Qh_OTE=.3914a796-ae28-4d35-982f-0e2e3cbef663@github.com> On Fri, 8 Nov 2024 14:53:25 GMT, Richard Reingruber wrote: >> Vectorization of the loop in `test2` does not work on ppc therefore I want to exclude it there. 
> > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Positive list for test2 test/hotspot/jtreg/compiler/c2/TestCastX2NotProcessedIGVN.java line 66: > 64: @Test > 65: @IR(counts = {IRNode.LOAD_VECTOR_I, "> 1"}, > 66: applyIfPlatformOr = {"x64", "true", "aarch64", "true", "riscv64", "true"}) Hi, Could you please remove `riscv64` from this line? I just found that this test also fails when testing on riscv64 platforms where the vector extension is not available. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21975#discussion_r1834607797 From tholenstein at openjdk.org Fri Nov 8 15:57:28 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 8 Nov 2024 15:57:28 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v9] In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: <5P3xMWErZk3AumFMcgTpnUtQkGhJDbKk9H18YsoPRGQ=.26f408a2-ead1-4e88-a9df-96bc5f2280fe@github.com> > color > > pick > > nodes > > Adds new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: set font only for ColorChooser ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21925/files - new: https://git.openjdk.org/jdk/pull/21925/files/c9af2285..5cf1e8b4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=07-08 Stats: 16 lines in 1 file changed: 1 ins; 13 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21925.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21925/head:pull/21925 PR: https://git.openjdk.org/jdk/pull/21925 From 
rrich at openjdk.org Fri Nov 8 15:57:37 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 8 Nov 2024 15:57:37 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms [v3] In-Reply-To: References: Message-ID: > Vectorization of the loop in `test2` does not work on ppc therefore I want to exclude it there. Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Remove riscv64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21975/files - new: https://git.openjdk.org/jdk/pull/21975/files/a7c2872b..82ac4751 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21975&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21975&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21975.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21975/head:pull/21975 PR: https://git.openjdk.org/jdk/pull/21975 From rrich at openjdk.org Fri Nov 8 15:57:39 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 8 Nov 2024 15:57:39 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms [v2] In-Reply-To: <--JLrmhyB78xAe6PkT73i-CdbWyJiXAIShdz7Qh_OTE=.3914a796-ae28-4d35-982f-0e2e3cbef663@github.com> References: <--JLrmhyB78xAe6PkT73i-CdbWyJiXAIShdz7Qh_OTE=.3914a796-ae28-4d35-982f-0e2e3cbef663@github.com> Message-ID: On Fri, 8 Nov 2024 15:44:46 GMT, Fei Yang wrote: >> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: >> >> Positive list for test2 > > test/hotspot/jtreg/compiler/c2/TestCastX2NotProcessedIGVN.java line 66: > >> 64: @Test >> 65: @IR(counts = {IRNode.LOAD_VECTOR_I, "> 1"}, >> 66: applyIfPlatformOr = {"x64", "true", "aarch64", "true", "riscv64", "true"}) > > Hi, Could you please remove `riscv64` from this line? 
I just found that this test also fails when testing on riscv64 platforms where the vector extension is not available. Thanks. Sure. I've removed it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21975#discussion_r1834622612 From kvn at openjdk.org Fri Nov 8 16:23:31 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 8 Nov 2024 16:23:31 GMT Subject: RFR: 8343488: Test VectorRebracket128Test.java can't exclude by test/hotspot/jtreg/ProblemList.txt In-Reply-To: References: Message-ID: <0S9oX4sARVGPJZHQ_RvQcKWXwFEIt_W--uSruO2wKF8=.359ff9de-b7c8-42c2-acc6-a106b416d386@github.com> On Fri, 8 Nov 2024 06:39:24 GMT, SendaoYan wrote: > Hi all, > The test `test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java` can't be correctly excluded via `test/hotspot/jtreg/ProblemList.txt`. The test only contains a single test, so it does not need a test suffix. > This PR removes the test suffix so that the ProblemList entry works normally. Trivial fix, no risk. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21968#pullrequestreview-2424243443 From kvn at openjdk.org Fri Nov 8 16:23:34 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 8 Nov 2024 16:23:34 GMT Subject: RFR: 8343745: Only update Last Value Assertion Predicates in Loop Unrolling In-Reply-To: References: Message-ID: <7msdSYuP6w8tSJ-GXs2riNuwRFhLIXdIWxa8UiLXWXw=.b17b2cb1-994d-4540-aa9d-b808e6522088@github.com> On Fri, 8 Nov 2024 07:12:12 GMT, Christian Hagedorn wrote: > (Note: This PR depends on https://github.com/openjdk/jdk/pull/21944, which is fully reviewed, but I'd prefer to integrate it on Monday and run some more testing over the weekend) > > This is a follow-up feature to https://github.com/openjdk/jdk/pull/21944, which updated Loop Unrolling to use a new predicate visitor and enables this patch. > > In Loop Unrolling, we only update the stride and not the init value of a loop.
Thus, we actually only need to update the Last Value Assertion Predicates, because the Init Value Assertion Predicates do not use `OpaqueLoopStride`. Likewise, we do not need to kill the old Init Value Initialized Assertion Predicates. This patch implements that improvement. > > To make this work, we need to query the associated `AssertionPredicateType` of an Assertion Predicate, which is stored in the `If/RangeCheckNode` (I guess it's okay to have this additional node field in product). This was guarded by `NOT_PRODUCT` before. For this patch, I make this information available in product builds and use it to implement this feature. > > Thanks, > Christian Looks fine to me. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21969#pullrequestreview-2424253334 From tschatzl at openjdk.org Fri Nov 8 16:46:43 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 8 Nov 2024 16:46:43 GMT Subject: RFR: 8343824: Remove unused InstructionFlags in C1 Message-ID: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> Hi all, please review this change that removes unused/unreferenced flags from the `Instruction::InstructionFlag` enum for the C1 compiler.
Testing: gha, tier1-3 Thanks, Thomas ------------- Commit messages: - 8343824 Changes: https://git.openjdk.org/jdk/pull/21973/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21973&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343824 Stats: 10 lines in 1 file changed: 0 ins; 10 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21973.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21973/head:pull/21973 PR: https://git.openjdk.org/jdk/pull/21973 From kvn at openjdk.org Fri Nov 8 17:47:11 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 8 Nov 2024 17:47:11 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v4] In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 14:55:53 GMT, theoweidmannoracle wrote: >> This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: >> >> >> ConINode* node = _igvn.intcon(i); >> set_ctrl(node, C->root()); >> >> >> and >> >> >> ConLNode* node = _igvn.longcon(i); >> set_ctrl(node, C->root()); >> >> >> Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Add set_root_as_ctrl Looks good. I have a few additional comments. src/hotspot/share/opto/loopnode.cpp line 3147: > 3145: ConINode* zero = igvn->intcon(0); > 3146: if (iloop != nullptr) { > 3147: iloop->set_root_as_ctrl(zero); Please look at the history of this code. This is suspicious - constant nodes should always be attached to Root. src/hotspot/share/opto/loopnode.hpp line 996: > 994: } > 995: void set_root_as_ctrl(Node* n) { > 996: assert( !has_node(n) || has_ctrl(n), "" ); We don't put spaces after `(` or before `)` in assert(). Ignore the old style in previous lines.
src/hotspot/share/opto/loopopts.cpp line 195: > 193: set_root_as_ctrl(x); > 194: continue; > 195: } This looks like a "band-aid" - this should be an assert. Maybe investigate it in a separate RFE. ------------- PR Review: https://git.openjdk.org/jdk/pull/21836#pullrequestreview-2424513220 PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1834825217 PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1834821279 PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1834827466 From kvn at openjdk.org Fri Nov 8 17:50:29 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 8 Nov 2024 17:50:29 GMT Subject: RFR: 8343824: Remove unused InstructionFlags in C1 In-Reply-To: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> References: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> Message-ID: On Fri, 8 Nov 2024 11:09:27 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that removes unused/unreferenced flags from the `Instruction::InstructionFlag` enum for the C1 compiler. > > Testing: gha, tier1-3 > > Thanks, > Thomas @tschatzl do you know history of these flags and why they are not used? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21973#issuecomment-2465420995 From rehn at openjdk.org Fri Nov 8 18:58:30 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 8 Nov 2024 18:58:30 GMT Subject: RFR: 8343555: RISC-V: make some verified (on hardware) extension options diagnostic [v2] In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 18:42:04 GMT, Hamlin Li wrote: >> Hi, >> Can you help review this simple patch? >> Previously, UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang, this one should be a diagnostic option, as we don't want to expose too many options to users.
>> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > turn more verified extensions as DIAGNOSTIC Sure, thanks. ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21885#pullrequestreview-2424660232 From acobbs at openjdk.org Fri Nov 8 19:06:58 2024 From: acobbs at openjdk.org (Archie Cobbs) Date: Fri, 8 Nov 2024 19:06:58 GMT Subject: RFR: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) [v3] In-Reply-To: References: Message-ID: > Please review this patch which removes unnecessary `@SuppressWarnings` annotations. Archie Cobbs has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into SuppressWarningsCleanup-hotspot - Update copyright years. - Merge branch 'master' into SuppressWarningsCleanup-hotspot - Merge branch 'master' into SuppressWarningsCleanup-graal - Remove unnecessary @SuppressWarnings annotations. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21853/files - new: https://git.openjdk.org/jdk/pull/21853/files/21c83e93..a574dda6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21853&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21853&range=01-02 Stats: 131587 lines in 749 files changed: 103986 ins; 9680 del; 17921 mod Patch: https://git.openjdk.org/jdk/pull/21853.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21853/head:pull/21853 PR: https://git.openjdk.org/jdk/pull/21853 From tschatzl at openjdk.org Fri Nov 8 19:52:59 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 8 Nov 2024 19:52:59 GMT Subject: RFR: 8343824: Remove unused InstructionFlags in C1 In-Reply-To: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> References: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> Message-ID: On Fri, 8 Nov 2024 11:09:27 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that removes unused/unreferenced flags from the `Instruction::InstructionFlag` enum for the C1 compiler. > > Testing: gha, tier1-3 > > Thanks, > Thomas Fwiw, the GHA failures are infrastructure issues, some dependencies could not be installed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21973#issuecomment-2465635907 From tschatzl at openjdk.org Fri Nov 8 19:52:59 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 8 Nov 2024 19:52:59 GMT Subject: RFR: 8343824: Remove unused InstructionFlags in C1 In-Reply-To: References: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> Message-ID: On Fri, 8 Nov 2024 17:46:49 GMT, Vladimir Kozlov wrote: >> Hi all, >> >> please review this change that removes unused/unreferenced flags from the `Instruction::InstructionFlag` enum for the C1 compiler. 
>> >> Testing: gha, tier1-3 >> >> Thanks, >> Thomas > > @tschatzl do you know history of these flags and why they are not used? @vnkozlov: no and no - I am just starting to look at the C1 compiler to implement frequency-based generation of post-write barrier filters (i.e. add the counters for later C2 compilation) as a follow-up to the post-write barrier changes. I only noticed that they were unused; looking back now, they have been unused since at least JDK 7u. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21973#issuecomment-2465632718 From acobbs at openjdk.org Fri Nov 8 19:59:44 2024 From: acobbs at openjdk.org (Archie Cobbs) Date: Fri, 8 Nov 2024 19:59:44 GMT Subject: RFR: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) [v2] In-Reply-To: References: <3xJg8mwE5kmAA_DfVquqRuI9nbrHHTfv-kdePt_LF5E=.79702bef-f612-4914-b3ee-03a6c0ea306f@github.com> Message-ID: On Thu, 7 Nov 2024 15:43:45 GMT, Archie Cobbs wrote: > but all of the ones I've checked appear to be ... Correction - there is actually one case that revealed a compiler bug: [JDK-8343286](https://bugs.openjdk.org/browse/JDK-8343286). ------------- PR Comment: https://git.openjdk.org/jdk/pull/21853#issuecomment-2465649956 From kvn at openjdk.org Fri Nov 8 20:24:19 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 8 Nov 2024 20:24:19 GMT Subject: RFR: 8343824: Remove unused InstructionFlags in C1 In-Reply-To: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> References: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> Message-ID: On Fri, 8 Nov 2024 11:09:27 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that removes unused/unreferenced flags from the `Instruction::InstructionFlag` enum for the C1 compiler. > > Testing: gha, tier1-3 > > Thanks, > Thomas Good. Yes, it looks like a leftover from JDK 6 development. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21973#pullrequestreview-2424858930 From vlivanov at openjdk.org Fri Nov 8 20:28:07 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 8 Nov 2024 20:28:07 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 8 Nov 2024 08:15:32 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring the VPMUL[U]DQ instruction for the following IR patterns. >> >> >> MulVL ( AndV SRC1, 0xFFFFFFFF) ( AndV SRC2, 0xFFFFFFFF) >> MulVL (URShiftVL SRC1 , 32) (URShiftVL SRC2, 32) >> MulVL (URShiftVL SRC1 , 32) ( AndV SRC2, 0xFFFFFFFF) >> MulVL ( AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2 , 32) >> MulVL (VectorCastI2X SRC1) (VectorCastI2X SRC2) >> MulVL (RShiftVL SRC1 , 32) (RShiftVL SRC2, 32) >> >> >> >> A 64x64-bit multiplication produces a 128-bit result and can be performed by individually multiplying the upper and lower doublewords of the multiplier with the multiplicand and assembling the partial products into the full-width result. Targets supporting vector quadword multiplication have separate instructions to compute the upper and lower quadwords of the 128-bit result. Therefore, the existing Vector API multiplication operator expects shape conformance between source and result vectors. >> >> If the upper 32 bits of both the quadword multiplier and the multiplicand are always zero, then the result of the multiplication depends only on the partial product of their lower doublewords and can be computed using an unsigned 32-bit multiplication instruction that produces a quadword result. The patch matches this pattern in a target-dependent manner without introducing a new IR node. >> >> The VPMUL[U]DQ instruction performs [unsigned] multiplication between the even-numbered doubleword lanes of two long vectors and produces a 64-bit result per lane.
It has much lower latency than the full 64-bit multiplication instruction "VPMULLQ"; in addition, non-AVX512DQ targets do not support direct quadword multiplication, so we can skip the redundant partial products for the zeroed-out upper 32 bits. This results in throughput improvements on both P-core and E-core Xeons. >> >> Please find below the performance of the [XXH3 hashing benchmark](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html) included with the patch: >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimizati... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Creating specialized IR to shield pattern from subsequent transforms in optimization pipeline In the latest version you added new Ideal nodes (`MulIVL` and `MulUIVL`). I don't see a compelling reason to do so. IMO matcher functionality is more than enough to cover `VPMULDQ` case. `MulIVL` is equivalent to `MulVL` + `has_int_inputs()` predicate. For `MulUIVL` you additionally do input rewiring (using `forward_masked_input`), but (1) `AndV src (Replicate 0xFFFFFFFF))` operands can be easily detected on matcher side (with an extra AD instruction); and (2) such optimization is limited because it is valid only for `0xFFFFFFFF` case while `has_uint_inputs() == true` for `C <= 0xFFFFFFFF`. So, IMO `MulIVL` and `MulUIVL` nodes just add noise in Ideal graph without improving situation during matching. src/hotspot/share/opto/vectornode.cpp line 2132: > 2130: // Directly forward masked inputs if > 2131: if (n->Opcode() == Op_AndV) { > 2132: return n->in(1)->Opcode() == Op_Replicate ?
n->in(2) : n->in(1); This particular check should ensure that Replicate constant is `0xFFFFFFFF`. ------------- PR Review: https://git.openjdk.org/jdk/pull/21244#pullrequestreview-2424864897 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1835023354 From dlong at openjdk.org Fri Nov 8 20:53:15 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 8 Nov 2024 20:53:15 GMT Subject: RFR: 8343824: Remove unused InstructionFlags in C1 In-Reply-To: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> References: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> Message-ID: <4PVt1yp3fEDDRWyMYSmwOfS6N4zWQHnjbUqyxekI1Ac=.479c7d6a-de00-4179-a03a-69c57f9b8159@github.com> On Fri, 8 Nov 2024 11:09:27 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that removes unused/unreferenced flags from the `Instruction::InstructionFlag` enum for the C1 compiler. > > Testing: gha, tier1-3 > > Thanks, > Thomas Marked as reviewed by dlong (Reviewer). 
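To make the review point about `forward_masked_input` concrete, here is a minimal scalar Java sketch. It assumes that one 64-bit lane of VPMULUDQ multiplies the zero-extended low 32 bits of each operand (modelled by the hypothetical helper `pmuludqLane`), and compares the lane value of `MulVL (AndV a C) (AndV b C)` with the value obtained when the `AndV` is dropped and the raw inputs are forwarded. The two agree for `C = 0xFFFFFFFF` but not for a narrower mask such as `0xFFFF`, even though the masked inputs fit in 32 unsigned bits in both cases. All names are illustrative, not HotSpot code.

```java
public class MaskForwardingSketch {
    // Hypothetical scalar model of one 64-bit lane of VPMULUDQ:
    // the low 32 bits of each operand are zero-extended and multiplied.
    // Java long multiplication wraps modulo 2^64, which matches the
    // 64-bit lane result bit for bit.
    static long pmuludqLane(long x, long y) {
        return (x & 0xFFFFFFFFL) * (y & 0xFFFFFFFFL);
    }

    // Lane value of MulVL (AndV a C) (AndV b C).
    static long maskedMul(long a, long b, long c) {
        return (a & c) * (b & c);
    }

    // Lane value if the AndV nodes are dropped and the raw inputs
    // are forwarded to a VPMULUDQ-style multiply.
    static long forwardedMul(long a, long b) {
        return pmuludqLane(a, b);
    }

    public static void main(String[] args) {
        long a = 0x1234_5678_9ABC_DEF0L;
        long b = 0x0FED_CBA9_8765_4321L;

        // Mask is exactly 0xFFFFFFFF: forwarding preserves the result.
        System.out.println(maskedMul(a, b, 0xFFFFFFFFL) == forwardedMul(a, b)); // true

        // Narrower mask: the inputs still fit in 32 unsigned bits,
        // but forwarding the unmasked inputs changes the result.
        System.out.println(maskedMul(a, b, 0xFFFFL) == forwardedMul(a, b));     // false
    }
}
```

In other words, knowing that the inputs fit in 32 unsigned bits is enough to select VPMULUDQ, but dropping the `AndV` entirely is only safe when the mask itself is all-ones in the low doubleword.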
------------- PR Review: https://git.openjdk.org/jdk/pull/21973#pullrequestreview-2424897413 From tholenstein at openjdk.org Fri Nov 8 22:38:35 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 8 Nov 2024 22:38:35 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v10] In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: > Adds new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision: - Create a panel for color chooser and apply the LAF to it - save location ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21925/files - new: https://git.openjdk.org/jdk/pull/21925/files/5cf1e8b4..46024d07 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=08-09 Stats: 27 lines in 1 file changed: 16 ins; 3 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/21925.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21925/head:pull/21925 PR: https://git.openjdk.org/jdk/pull/21925 From dlong at openjdk.org Fri Nov 8 23:09:19 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 8 Nov 2024 23:09:19 GMT Subject: RFR: 8343824: Remove unused InstructionFlags in C1 In-Reply-To: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> References: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> Message-ID: <8X6_A2Urx4zdtPDcHxFowwn14TMxNF2LxvfGq8-8dh4=.4439fd66-9db3-4f7a-897d-11b70281b050@github.com> On Fri, 8 Nov 2024 11:09:27 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change
that removes unused/unreferenced flags from the `Instruction::InstructionFlag` enum for the C1 compiler. > > Testing: gha, tier1-3 > > Thanks, > Thomas SCCS archeology reveals that these 3 were converted from boolean fields by JDK-4649182: bool _needs_write_barrier; bool _needs_store_check; bool _is_eliminated; // Set by store elimination InWorkListFlag was later added by JDK-7153771. As far as I can tell, the only one that was ever used is _is_eliminated/IsEliminatedFlag, which seems to have gone away between jdk5 and jdk6. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21973#issuecomment-2465880763 From sviswanathan at openjdk.org Fri Nov 8 23:20:29 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 8 Nov 2024 23:20:29 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: References: Message-ID: On Sun, 6 Oct 2024 10:24:53 GMT, Quan Anh Mai wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: >> >> [vectorapi] Refactor VectorShuffle implementation > > I have adapted the patch in accordance with https://github.com/openjdk/jdk/pull/20634, I moved the index wrapping into C2 instead of making it a separate step as I think it seems clearer. Also, I think in the future we can eliminate this step so putting it in C2 would make the progress easier. > > Please take a look, thanks a lot. @merykitty Could you please merge with the latest and resolve conflicts? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2465889165 From sviswanathan at openjdk.org Fri Nov 8 23:21:30 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 8 Nov 2024 23:21:30 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 8 Nov 2024 20:25:10 GMT, Vladimir Ivanov wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Creating specialized IR to shield pattern from subsequent transforms in optimization pipeline > > src/hotspot/share/opto/vectornode.cpp line 2132: > >> 2130: // Directly forward masked inputs if >> 2131: if (n->Opcode() == Op_AndV) { >> 2132: return n->in(1)->Opcode() == Op_Replicate ? n->in(2) : n->in(1); > > This particular check should ensure that Replicate constant is `0xFFFFFFFF`. Yes, this should ensure 0xFFFFFFFF. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1835148834 From fyang at openjdk.org Sat Nov 9 01:13:54 2024 From: fyang at openjdk.org (Fei Yang) Date: Sat, 9 Nov 2024 01:13:54 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms [v3] In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 15:57:37 GMT, Richard Reingruber wrote: >> Vectorization of the loop in `test2` does not work on ppc therefore I want to exclude it there. > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Remove riscv64 LGTM. Thanks. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21975#pullrequestreview-2425114315 From dlong at openjdk.org Sat Nov 9 01:55:12 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 9 Nov 2024 01:55:12 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 10:19:51 GMT, theoweidmannoracle wrote: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. 
As can be verified by running the corresponding test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the inlining output: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. Never mind about print_inlining_commit() doing an append -- that is apparently the intended behavior. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2465981620 From swen at openjdk.org Sat Nov 9 02:36:14 2024 From: swen at openjdk.org (Shaojin Wen) Date: Sat, 9 Nov 2024 02:36:14 GMT Subject: RFR: 8343629: More MergeStore benchmark [v2] In-Reply-To: References: Message-ID: > 1. Added the putBytes4 benchmark, which corresponds to StringBuilder appendNull > 2. Optimized the putChars4/setInt/setLong series of benchmarks to reduce extra overhead and more accurately reflect performance differences. Shaojin Wen has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains five commits: - from @eme64 add MergeStoresDisabled - Merge remote-tracking branch 'upstream/master' into merge_store_bench_202410 - Merge remote-tracking branch 'upstream/master' into merge_store_bench_202410 - Merge branch 'master' into merge_store_bench_202410 - add putBytes4 and improved put ------------- Changes: https://git.openjdk.org/jdk/pull/21659/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21659&range=01 Stats: 320 lines in 1 file changed: 76 ins; 51 del; 193 mod Patch: https://git.openjdk.org/jdk/pull/21659.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21659/head:pull/21659 PR: https://git.openjdk.org/jdk/pull/21659 From amitkumar at openjdk.org Sat Nov 9 03:03:42 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 9 Nov 2024 03:03:42 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms [v3] In-Reply-To: References: Message-ID: <6XRfvjzDnmSSAWIGJ7vU1M0bP2_YZRsq1gbtbNr3hyk=.9cf5b9b8-55fe-47cd-b61f-93d6332a247d@github.com> On Fri, 8 Nov 2024 15:57:37 GMT, Richard Reingruber wrote: >> Vectorization of the loop in `test2` does not work on ppc therefore I want to exclude it there. > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Remove riscv64 Marked as reviewed by amitkumar (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21975#pullrequestreview-2425172328 From amitkumar at openjdk.org Sat Nov 9 03:45:30 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 9 Nov 2024 03:45:30 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 14:49:09 GMT, Richard Reingruber wrote: >> Thanks for taking care of that. >> Maybe it would be more robust to only enable the test on x86, aarch64 and riscv64. 
>> >> The whole test doesn't actually need to be excluded. Only IR matching on `test2` needs to be disabled. This can be done with: >> >> applyIfPlatformOr = {"x64", "true", "aarch64", "true", "riscv64", "true"}) >> >> See `TestBoolNodeGVN.java` for instance. > >> Thanks for taking care of that. Maybe it would be more robust to only enable the test on x86, aarch64 and riscv64. The whole test doesn't actually need to be excluded. Only IR matching on `test2` needs to be disabled. This can be done with: >> >> ``` >> applyIfPlatformOr = {"x64", "true", "aarch64", "true", "riscv64", "true"}) >> ``` >> >> See `TestBoolNodeGVN.java` for instance. > > Ok. I've done that. @reinrich Sorry for creating a mess here. Yesterday, this test failed while testing changes for JEP 450 related to compact headers. However, I have now checked, and the head-stream testing job shows that it does not fail with `jdk-head`; I have verified that it fails only on s390x when I enable UseCompactObjectHeaders: `make test TEST=jtreg:$(find . -name TestCastX2NotProcessedIGVN.java) JTREG="VM_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders"`. While this issue could potentially occur with other settings, `UseCompactObjectHeaders` is the only one I have observed causing this failure. Do you suggest disabling this, or is separate debugging required to investigate this behaviour? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21975#issuecomment-2466026795 From swen at openjdk.org Sat Nov 9 03:55:37 2024 From: swen at openjdk.org (Shaojin Wen) Date: Sat, 9 Nov 2024 03:55:37 GMT Subject: RFR: 8343629: More MergeStore benchmark In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 07:22:40 GMT, Emanuel Peter wrote: >> 1. Added the putBytes4 benchmark, which corresponds to StringBuilder appendNull >> 2. Optimized the putChars4/setInt/setLong series of benchmarks to reduce extra overhead and more accurately reflect performance differences.
> > You can find an example of how to do that easily here: > https://github.com/openjdk/jdk/pull/19970/files#diff-9072c369f5b541ef9fca3ad8320aa59e88cc72f203c03da58100b1d111ffc324R746-R749 @eme64 Why is there no noticeable difference in the performance of `+/-MergeStores`?

| Benchmark | -MergeStores | +MergeStores | delta |
| --- | --- | --- | --- |
| getCharB | 5900.246 | 5902.316 | -0.04% |
| getCharBU | 4865.881 | 4866.630 | -0.02% |
| getCharBV | 3084.194 | 3078.657 | 0.18% |
| getCharC | 2233.422 | 2232.788 | 0.03% |
| getCharL | 6032.213 | 6028.447 | 0.06% |
| getCharLU | 4492.928 | 4482.773 | 0.23% |
| getCharLV | 2220.004 | 2220.231 | -0.01% |
| getIntB | 7996.907 | 8050.658 | -0.67% |
| getIntBU | 9041.783 | 9035.892 | 0.07% |
| getIntBV | 309.469 | 308.076 | 0.45% |
| getIntL | 7887.687 | 7881.362 | 0.08% |
| getIntLU | 8856.416 | 8863.707 | -0.08% |
| getIntLV | 2225.803 | 2225.789 | 0.00% |
| getIntRB | 8619.974 | 8616.985 | 0.03% |
| getIntRBU | 11098.237 | 11100.091 | -0.02% |
| getIntRL | 8959.808 | 8958.688 | 0.01% |
| getIntRLU | 9237.407 | 9236.465 | 0.01% |
| getIntRU | 2502.967 | 2503.585 | -0.02% |
| getIntU | 2492.784 | 2492.675 | 0.00% |
| getLongB | 24807.583 | 24797.555 | 0.04% |
| getLongBU | 14022.093 | 14008.556 | 0.10% |
| getLongBV | 601.878 | 600.904 | 0.16% |
| getLongL | 25076.552 | 25111.661 | -0.14% |
| getLongLU | 14470.997 | 14474.230 | -0.02% |
| getLongLV | 2223.678 | 2223.882 | -0.01% |
| getLongRB | 24769.555 | 24778.684 | -0.04% |
| getLongRBU | 14017.091 | 14024.421 | -0.05% |
| getLongRL | 25070.811 | 25085.936 | -0.06% |
| getLongRLU | 14462.097 | 14467.410 | -0.04% |
| getLongRU | 3056.826 | 3056.270 | 0.02% |
| getLongU | 3045.057 | 3045.650 | -0.02% |
| putBytes4 | 928.032 | 928.111 | -0.01% |
| putBytes4GetBytes | 5876.794 | 5875.995 | 0.01% |
| putBytes4U | 926.596 | 928.596 | -0.22% |
| putBytes4X | 927.929 | 927.928 | 0.00% |
| putChars4B | 5635.803 | 5635.872 | 0.00% |
| putChars4BU | 1142.948 | 1141.809 | 0.10% |
| putChars4BV | 4482.613 | 4480.597 | 0.04% |
| putChars4C | 1132.133 | 1132.881 | -0.07% |
| putChars4L | 5640.644 | 5632.055 | 0.15% |
| putChars4LU | 1141.009 | 1142.132 | -0.10% |
| putChars4LV | 1133.833 | 1133.137 | 0.06% |
| putChars4S | 1132.469 | 1132.250 | 0.02% |
| setCharBS | 6080.539 | 6081.117 | -0.01% |
| setCharBV | 3598.374 | 3591.190 | 0.20% |
| setCharC | 4497.279 | 4544.706 | -1.04% |
| setCharLS | 5615.475 | 5620.162 | -0.08% |
| setCharLV | 2249.104 | 2245.083 | 0.18% |
| setIntB | 7999.139 | 8030.850 | -0.39% |
| setIntBU | 17922.810 | 17942.929 | -0.11% |
| setIntBV | 3237.265 | 3224.414 | 0.40% |
| setIntL | 2124.492 | 2109.906 | 0.69% |
| setIntLU | 4772.256 | 4801.314 | -0.61% |
| setIntLV | 2110.382 | 2120.022 | -0.45% |
| setIntRB | 13773.518 | 13775.889 | -0.02% |
| setIntRBU | 14752.651 | 14754.926 | -0.02% |
| setIntRL | 3226.597 | 3227.019 | -0.01% |
| setIntRLU | 5862.400 | 5882.564 | -0.34% |
| setIntRU | 5915.139 | 5917.139 | -0.03% |
| setIntU | 4794.627 | 4780.927 | 0.29% |
| setLongB | 31661.626 | 31598.635 | 0.20% |
| setLongBU | 25681.380 | 25622.835 | 0.23% |
| setLongBV | 2167.426 | 2164.900 | 0.12% |
| setLongL | 5380.433 | 5321.645 | 1.10% |
| setLongLU | 4281.526 | 4280.263 | 0.03% |
| setLongLV | 2109.982 | 2110.138 | -0.01% |
| setLongRB | 29807.728 | 29826.089 | -0.06% |
| setLongRBU | 24973.926 | 24903.052 | 0.28% |
| setLongRL | 4518.310 | 4518.594 | -0.01% |
| setLongRLU | 4792.258 | 4795.612 | -0.07% |
| setLongRU | 4796.491 | 4792.139 | 0.09% |
| setLongU | 4280.624 | 4507.839 | -5.04% |

------------- PR Comment: https://git.openjdk.org/jdk/pull/21659#issuecomment-2466029503 From syan at openjdk.org Sat Nov 9 11:41:03 2024 From: syan at openjdk.org (SendaoYan) Date: Sat, 9 Nov 2024 11:41:03 GMT Subject: RFR: 8343763: Aarch64: Gtest codestrings.validate_vm intermittent fails extra addr Message-ID: 8343763: Aarch64: Gtest codestrings.validate_vm intermittent fails extra addr ------------- Commit messages: - support
deal with "cbnz\tx0, Stub::_large_arrays_hashcode_short" - support deal with "adrp\tx0 = mnaddF_reg_regNode::pipeline_class()" - support deal with b\tStub::indexof_linear_ul - fix the var name bugs - deal with ": cbnz\tx16, Stub:: " difference and ": adrp\tx16, = TemplateInterpreterGenerator::generate_CRC32_update_entry()+32" difference - 8343763: Aarch64: Gtest codestrings.validate_vm intermittent fails extra addr Changes: https://git.openjdk.org/jdk/pull/21955/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21955&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343763 Stats: 10 lines in 1 file changed: 5 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21955.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21955/head:pull/21955 PR: https://git.openjdk.org/jdk/pull/21955 From syan at openjdk.org Sat Nov 9 12:14:42 2024 From: syan at openjdk.org (SendaoYan) Date: Sat, 9 Nov 2024 12:14:42 GMT Subject: RFR: 8343763: Aarch64: Gtest codestrings.validate_vm intermittent fails extra addr In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 13:43:10 GMT, SendaoYan wrote: > Hi all, > The gtest `codestrings.validate_vm` intermittently fails because the disassembly contains different symbol names, e.g. for the `adrp`/`b` instructions. I think differences in symbol names are acceptable, so this PR removes the related symbol names to make the fragile comparison of the disassembly output more robust. > The change has been verified locally: the gtest was run 20k times and all runs passed, except that the subtest `ThreadsListHandle::sanity_vm` sometimes fails intermittently, which is already tracked by [JDK-8315141](https://bugs.openjdk.org/browse/JDK-8315141). Test-fix only, no risk. GHA reports two failures; they seem to be environmental issues unrelated to this PR. 1. macos-aarch64 jdk/tier1 part1 fails at the `install dependencies` stage with `invalid developer directory` 2.
macos-aarch64 hs/tier1 runtime fails at the `install dependencies` stage with `invalid developer directory` ------------- PR Comment: https://git.openjdk.org/jdk/pull/21955#issuecomment-2466191838 From jbhateja at openjdk.org Sun Nov 10 07:43:48 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 10 Nov 2024 07:43:48 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 8 Nov 2024 20:25:23 GMT, Vladimir Ivanov wrote: > In the latest version you added new Ideal nodes (`MulIVL` and `MulUIVL`). I don't see a compelling reason to do so. IMO matcher functionality is more than enough to cover the `VPMULDQ` case. `MulIVL` is equivalent to `MulVL` + `has_int_inputs()` predicate. For `MulUIVL` you additionally do input rewiring (using `forward_masked_input`), but (1) `AndV src (Replicate 0xFFFFFFFF))` operands can be easily detected on the matcher side (with an extra AD instruction); and (2) such optimization is limited because it is valid only for the `0xFFFFFFFF` case while `has_uint_inputs() == true` for `C <= 0xFFFFFFFF`. > > So, IMO `MulIVL` and `MulUIVL` nodes just add noise in the Ideal graph without improving the situation during matching. Hi Vladimir, the problem occurs if the AndV gets shared; in that case the matcher will not be able to absorb the masking pattern. Specialized IR overcomes such limitations and shields the pattern it represents from downstream optimizations.
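Vladimir's point (2) can be illustrated with a scalar model of a single 64-bit lane. The sketch below is a hypothetical Java model, not HotSpot code: `vpmuludqLane` assumes VPMULUDQ reads only the low 32 bits of each quadword lane (zero-extended), and `mulMasked` stands in for `MulVL (AndV a M) (AndV b M)`. It suggests why forwarding the unmasked input is only safe when the mask is exactly `0xFFFFFFFF`, even though a narrower mask also makes the inputs look like uints.

```java
public class MaskedMulSketch {
    // Model of one VPMULUDQ lane: only the low 32 bits of each operand are read.
    static long vpmuludqLane(long a, long b) {
        return (a & 0xFFFFFFFFL) * (b & 0xFFFFFFFFL);
    }

    // Model of MulVL (AndV a (Replicate mask)) (AndV b (Replicate mask)) for one lane.
    static long mulMasked(long a, long b, long mask) {
        return (a & mask) * (b & mask);
    }

    public static void main(String[] args) {
        long a = 0x1234_5678_9ABC_DEF0L;
        long b = 0xFEDC_BA98_7654_3210L;
        // Mask 0xFFFFFFFF: the instruction ignores the upper half anyway, so the
        // AndV can be bypassed and the unmasked inputs forwarded.
        System.out.println(mulMasked(a, b, 0xFFFFFFFFL) == vpmuludqLane(a, b)); // true
        // Mask 0xFFFF: the masked inputs are still uints, but bypassing the AndV
        // would also lose the clearing of bits 16..31.
        System.out.println(mulMasked(a, b, 0xFFFFL) == vpmuludqLane(a, b));     // false
        // With a narrower mask the AndV has to stay as the instruction's input.
        System.out.println(mulMasked(a, b, 0xFFFFL)
                == vpmuludqLane(a & 0xFFFFL, b & 0xFFFFL));                     // true
    }
}
```

In other words, `has_uint_inputs()` alone only proves that the upper halves are zero; eliding the `AndV` additionally requires that the mask does not clear any of the low 32 bits.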
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2466624605 From jbhateja at openjdk.org Sun Nov 10 07:43:49 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 10 Nov 2024 07:43:49 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 8 Nov 2024 23:18:18 GMT, Sandhya Viswanathan wrote: >> src/hotspot/share/opto/vectornode.cpp line 2132: >> >>> 2130: // Directly forward masked inputs if >>> 2131: if (n->Opcode() == Op_AndV) { >>> 2132: return n->in(1)->Opcode() == Op_Replicate ? n->in(2) : n->in(1); >> >> This particular check should ensure that the Replicate constant is `0xFFFFFFFF`. > > Yes, this should ensure 0xFFFFFFFF. We land here after checking that the inputs are uints. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1835611481 From jbhateja at openjdk.org Sun Nov 10 08:22:12 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 10 Nov 2024 08:22:12 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v3] In-Reply-To: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: > This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ instructions for the following IR patterns.
>
> MulVL (AndV SRC1, 0xFFFFFFFF) (AndV SRC2, 0xFFFFFFFF)
> MulVL (URShiftVL SRC1, 32) (URShiftVL SRC2, 32)
> MulVL (URShiftVL SRC1, 32) (AndV SRC2, 0xFFFFFFFF)
> MulVL (AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2, 32)
> MulVL (VectorCastI2X SRC1) (VectorCastI2X SRC2)
> MulVL (RShiftVL SRC1, 32) (RShiftVL SRC2, 32)
>
> A 64x64-bit multiplication produces a 128-bit result, and can be performed by individually multiplying the upper and lower doublewords of the multiplier with the multiplicand and assembling the partial products to compute the full-width result. Targets supporting vector quadword multiplication have separate instructions to compute the upper and lower quadwords of the 128-bit result. Therefore, the existing Vector API multiplication operator expects shape conformance between source and result vectors.
>
> If the upper 32 bits of both the quadword multiplier and the multiplicand are always zero, then the result of the multiplication depends only on the partial product of their lower doublewords, and it can be computed with an unsigned 32-bit multiplication instruction that produces a quadword result. The patch matches this pattern in a target-dependent manner without introducing a new IR node.
>
> The VPMUL[U]DQ instruction performs [unsigned] multiplication between the even-numbered doubleword lanes of two long vectors and produces a 64-bit result. It has much lower latency than the full 64-bit multiplication instruction "VPMULLQ"; in addition, non-AVX512DQ targets do not support direct quadword multiplication, so we save the redundant partial products for the zeroed-out upper 32 bits. This results in throughput improvements on both P-core and E-core Xeons.
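The partial-product argument and the six patterns above can be checked with a scalar model of one 64-bit lane. This is a hypothetical Java sketch (the class and method names are illustrative and not part of the patch or the Vector API): `mulFromPartials` assembles the low 64 bits of a product from 32-bit partial products, `vpmuludqLane` models the zero-extending instruction form and `vpmuldqLane` the sign-extending one.

```java
public class VpmulSketch {
    // Low 64 bits of a 64x64-bit product, assembled from 32-bit partial products.
    // The hi*hi partial product only affects bits 64..127, so it is dropped.
    static long mulFromPartials(long a, long b) {
        long aLo = a & 0xFFFFFFFFL, aHi = a >>> 32;
        long bLo = b & 0xFFFFFFFFL, bHi = b >>> 32;
        long cross = aHi * bLo + aLo * bHi;   // only its low 32 bits survive the shift
        return aLo * bLo + (cross << 32);     // Java long arithmetic wraps mod 2^64
    }

    // Model of one VPMULUDQ lane: zero-extended low doublewords, 64-bit product.
    static long vpmuludqLane(long a, long b) {
        return (a & 0xFFFFFFFFL) * (b & 0xFFFFFFFFL);
    }

    // Model of one VPMULDQ lane: sign-extended low doublewords, 64-bit product.
    static long vpmuldqLane(long a, long b) {
        return (long) (int) a * (long) (int) b;
    }

    public static void main(String[] args) {
        long x = 0x0123_4567_89AB_CDEFL;
        long y = 0xFEDC_BA98_7654_3210L;
        System.out.println(mulFromPartials(x, y) == x * y);                              // true
        // MulVL (URShiftVL x, 32) (URShiftVL y, 32): upper halves are zero.
        System.out.println((x >>> 32) * (y >>> 32) == vpmuludqLane(x >>> 32, y >>> 32)); // true
        // MulVL (RShiftVL x, 32) (RShiftVL y, 32): operands are sign-extended ints.
        System.out.println((x >> 32) * (y >> 32) == vpmuldqLane(x >> 32, y >> 32));      // true
        // MulVL (VectorCastI2X i) (VectorCastI2X j): the casts sign-extend.
        int i = -7, j = 123_456_789;
        System.out.println((long) i * (long) j == vpmuldqLane(i, j));                    // true
    }
}
```

When both upper halves are zero, the cross term in `mulFromPartials` vanishes and only `aLo * bLo` remains, which is what a single VPMULUDQ lane computes; the mixed `URShiftVL`/`AndV` patterns fall into the same case. The `RShiftVL` and `VectorCastI2X` patterns instead produce sign-extended 32-bit values and correspond to the signed form.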
>
> Please find below the performance of the [XXH3 hashing benchmark](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html) included with the patch:
>
> Sierra Forest :-
> ============
> Baseline:-
> Benchmark                                 (SIZE)   Mode  Cnt    Score   Error   Units
> VectorXXH3HashingBenchmark.hashingKernel    1024  thrpt    2  806.228          ops/ms
> VectorXXH3HashingBenchmark.hashingKernel    2048  thrpt    2  403.044          ops/ms
> VectorXXH3HashingBenchmark.hashingKernel    4096  thrpt    2  200.641          ops/ms
> VectorXXH3HashingBenchmark.hashingKernel    8192  thrpt    2  100.664          ops/ms
>
> With Optimization:-
> Benchmark (SIZE) Mode ...

Jatin Bhateja has updated the pull request incrementally with two additional commits since the last revision: - Refining comment - Review resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21244/files - new: https://git.openjdk.org/jdk/pull/21244/files/613f491b..eba586b5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21244&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21244&range=01-02 Stats: 17 lines in 2 files changed: 8 ins; 7 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21244.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21244/head:pull/21244 PR: https://git.openjdk.org/jdk/pull/21244 From chagedorn at openjdk.org Mon Nov 11 06:21:18 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 11 Nov 2024 06:21:18 GMT Subject: RFR: 8342946: Replace predicate walking code in Loop Unrolling with a predicate visitor In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 09:22:02 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop
(integrated with https://github.com/openjdk/jdk/pull/21790) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicate (integrated with https://github.com/openjdk/jdk/pull/21918) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (this PR) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790 / https://github.com/openjdk/jdk/pull/21918) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047), which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patches, I apply similar refactorings: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with a call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator`, which walks through all predicates found at a loop and applies the visitor to each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head.
> - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates. T... Thanks Vladimir for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21944#issuecomment-2467329571 From chagedorn at openjdk.org Mon Nov 11 06:21:18 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 11 Nov 2024 06:21:18 GMT Subject: Integrated: 8342946: Replace predicate walking code in Loop Unrolling with a predicate visitor In-Reply-To: References: Message-ID: <2kqGwmXSFfj0wWGRIdXjk9WHG94L4CKAzYPKdi7AtuI=.f1d7623d-84ec-46dd-9dae-cc68ee13b8ee@github.com> On Thu, 7 Nov 2024 09:22:02 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (integrated with https://github.com/openjdk/jdk/pull/21790) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicate (integrated with https://github.com/openjdk/jdk/pull/21918) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (this PR) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790 / https://github.com/openjdk/jdk/pull/21918) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. 
This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047), which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patches, I apply similar refactorings: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with a call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator`, which walks through all predicates found at a loop and applies the visitor to each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head.
Changeset: 5f338e9a Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/5f338e9adbcf7fe7ee90abfd34a24a3a93c22211 Stats: 203 lines in 4 files changed: 161 ins; 29 del; 13 mod 8342946: Replace predicate walking code in Loop Unrolling with a predicate visitor Reviewed-by: roland, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21944 From chagedorn at openjdk.org Mon Nov 11 06:26:28 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 11 Nov 2024 06:26:28 GMT Subject: RFR: 8343745: Only update Last Value Assertion Predicates in Loop Unrolling [v2] In-Reply-To: References: Message-ID: > (Note: This is a dependent PR on https://github.com/openjdk/jdk/pull/21944 which is fully reviewed but I'd prefer to integrate it on Monday and run again some testing over the weekend) > > This is a follow-up feature to https://github.com/openjdk/jdk/pull/21944 which updated Loop Unrolling to use a new predicate visitor and enables this patch. > > In Loop Unrolling, we only update the stride and not the init value of a loop. Thus, we actually only require to update the Last Value Assertion Predicates because the Init Value Assertion Predicates do not use `OpaqueLoopStride`. So, we also would not be required to kill the old Init Value Initialized Assertion Predicates. This patch implements that improvement. > > To make this work, we need to query the associated `AssertionPredicateType` of an Assertion Predicate which is stored in the `If/RangeCheckNode` (I guess it's okay to have this additional node field in product). This was guarded by `NOT_PRODUCT` before. For this patch, I make this information available in product builds and use it to implement this feature. > > Thanks, > Christian Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21969/files - new: https://git.openjdk.org/jdk/pull/21969/files/e9161d16..e9161d16 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21969&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21969&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21969/head:pull/21969 PR: https://git.openjdk.org/jdk/pull/21969 From chagedorn at openjdk.org Mon Nov 11 06:30:18 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 11 Nov 2024 06:30:18 GMT Subject: RFR: 8343745: Only update Last Value Assertion Predicates in Loop Unrolling [v2] In-Reply-To: References: Message-ID: <_eqJKUiwXpMRDabUbh96ktkMtEaic0l13KEovL4FA40=.8813cbf6-c359-40f7-9c1f-2f2d3acb4a83@github.com> On Mon, 11 Nov 2024 06:26:28 GMT, Christian Hagedorn wrote: >> (Note: This is a dependent PR on https://github.com/openjdk/jdk/pull/21944 which is fully reviewed but I'd prefer to integrate it on Monday and run again some testing over the weekend) >> >> This is a follow-up feature to https://github.com/openjdk/jdk/pull/21944 which updated Loop Unrolling to use a new predicate visitor and enables this patch. >> >> In Loop Unrolling, we only update the stride and not the init value of a loop. Thus, we actually only require to update the Last Value Assertion Predicates because the Init Value Assertion Predicates do not use `OpaqueLoopStride`. So, we also would not be required to kill the old Init Value Initialized Assertion Predicates. This patch implements that improvement. >> >> To make this work, we need to query the associated `AssertionPredicateType` of an Assertion Predicate which is stored in the `If/RangeCheckNode` (I guess it's okay to have this additional node field in product). This was guarded by `NOT_PRODUCT` before. 
For this patch, I make this information available in product builds and use it to implement this feature. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. Thanks Vladimir for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21969#issuecomment-2467340652 From chagedorn at openjdk.org Mon Nov 11 06:54:32 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 11 Nov 2024 06:54:32 GMT Subject: RFR: 8343745: Only update Last Value Assertion Predicates in Loop Unrolling [v3] In-Reply-To: References: Message-ID: > (Note: This is a dependent PR on https://github.com/openjdk/jdk/pull/21944 which is fully reviewed but I'd prefer to integrate it on Monday and run again some testing over the weekend) > > This is a follow-up feature to https://github.com/openjdk/jdk/pull/21944 which updated Loop Unrolling to use a new predicate visitor and enables this patch. > > In Loop Unrolling, we only update the stride and not the init value of a loop. Thus, we actually only require to update the Last Value Assertion Predicates because the Init Value Assertion Predicates do not use `OpaqueLoopStride`. So, we also would not be required to kill the old Init Value Initialized Assertion Predicates. This patch implements that improvement. > > To make this work, we need to query the associated `AssertionPredicateType` of an Assertion Predicate which is stored in the `If/RangeCheckNode` (I guess it's okay to have this additional node field in product). This was guarded by `NOT_PRODUCT` before. For this patch, I make this information available in product builds and use it to implement this feature. > > Thanks, > Christian Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains four commits: - Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8343745 - 8343745: Only update Last Value Assertion Predicates in Loop Unrolling - Add const - 8342946: Replace predicate walking code in Loop Unrolling with a predicate visitor ------------- Changes: https://git.openjdk.org/jdk/pull/21969/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21969&range=02 Stats: 130 lines in 7 files changed: 47 ins; 13 del; 70 mod Patch: https://git.openjdk.org/jdk/pull/21969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21969/head:pull/21969 PR: https://git.openjdk.org/jdk/pull/21969 From chagedorn at openjdk.org Mon Nov 11 06:57:37 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 11 Nov 2024 06:57:37 GMT Subject: RFR: 8343745: Only update Last Value Assertion Predicates in Loop Unrolling [v4] In-Reply-To: References: Message-ID: > (Note: This is a dependent PR on https://github.com/openjdk/jdk/pull/21944 which is fully reviewed but I'd prefer to integrate it on Monday and run again some testing over the weekend) > > This is a follow-up feature to https://github.com/openjdk/jdk/pull/21944 which updated Loop Unrolling to use a new predicate visitor and enables this patch. > > In Loop Unrolling, we only update the stride and not the init value of a loop. Thus, we actually only require to update the Last Value Assertion Predicates because the Init Value Assertion Predicates do not use `OpaqueLoopStride`. So, we also would not be required to kill the old Init Value Initialized Assertion Predicates. This patch implements that improvement. > > To make this work, we need to query the associated `AssertionPredicateType` of an Assertion Predicate which is stored in the `If/RangeCheckNode` (I guess it's okay to have this additional node field in product). This was guarded by `NOT_PRODUCT` before. For this patch, I make this information available in product builds and use it to implement this feature. 
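The visitor-based design of this series, combined with the type query described here, can be pictured with a small toy model. Everything in it is hypothetical: Java records stand in for HotSpot's C++ nodes, `walk` plays the role of the `PredicateIterator`, and the lambda plays the role of a `PredicateVisitor`. It only illustrates the idea that, because unrolling changes the stride but not the init value, a visitor can use the stored `AssertionPredicateType` to leave Init Value predicates untouched and re-create only the Last Value predicates.

```java
import java.util.ArrayList;
import java.util.List;

public class UnrollPredicateSketch {
    enum AssertionPredicateType { INIT_VALUE, LAST_VALUE }

    // Toy predicate: 'stride' models a use of OpaqueLoopStride and is only
    // meaningful for LAST_VALUE predicates (INIT_VALUE ones use 0 here).
    record AssertionPredicate(AssertionPredicateType type, int stride) {}

    interface PredicateVisitor {
        AssertionPredicate visit(AssertionPredicate p);
    }

    // Stand-in for an iterator that walks a loop's predicate chain and applies
    // the visitor to each predicate, collecting what the visitor returns.
    static List<AssertionPredicate> walk(List<AssertionPredicate> chain, PredicateVisitor v) {
        List<AssertionPredicate> result = new ArrayList<>(chain.size());
        for (AssertionPredicate p : chain) {
            result.add(v.visit(p));
        }
        return result;
    }

    // Unrolling changes only the stride: keep Init Value predicates as they are
    // and re-create Last Value predicates with the new stride.
    static PredicateVisitor updateLastValueOnly(int newStride) {
        return p -> p.type() == AssertionPredicateType.LAST_VALUE
                ? new AssertionPredicate(p.type(), newStride)
                : p;
    }

    public static void main(String[] args) {
        AssertionPredicate init = new AssertionPredicate(AssertionPredicateType.INIT_VALUE, 0);
        AssertionPredicate last = new AssertionPredicate(AssertionPredicateType.LAST_VALUE, 1);
        List<AssertionPredicate> after = walk(List.of(init, last), updateLastValueOnly(2));
        System.out.println(after.get(0) == init);  // true: Init Value predicate kept
        System.out.println(after.get(1).stride()); // 2: Last Value predicate updated
    }
}
```

Keeping the iteration and the per-predicate decision separate means the same walker can serve peeling, unswitching and unrolling, with each transformation supplying only its own visitor.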
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Fix after merge ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21969/files - new: https://git.openjdk.org/jdk/pull/21969/files/fb9dadfd..7279d42e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21969&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21969&range=02-03 Stats: 31 lines in 1 file changed: 0 ins; 31 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21969/head:pull/21969 PR: https://git.openjdk.org/jdk/pull/21969 From rrich at openjdk.org Mon Nov 11 07:18:47 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 11 Nov 2024 07:18:47 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms In-Reply-To: References: Message-ID: On Sat, 9 Nov 2024 03:41:56 GMT, Amit Kumar wrote: >>> Thanks for taking care of that. Maybe it would be more robust to only enable the test on x86, aarch64 and riscv64. The whole test doesn't need to be excluded, actually. Only IR matching on `test2` needs to be disabled. This can be done with: >>> >>> ``` >>> applyIfPlatformOr = {"x64", "true", "aarch64", "true", "riscv64", "true"}) >>> ``` >>> >>> See `TestBoolNodeGVN.java` for instance. >> >> Ok. I've done that. > > @reinrich Sorry for creating a mess here. > > Yesterday, this test failed while testing changes for JEP 450 related to compact headers. However, I have now checked, and the head-stream testing job shows that it does not fail with `jdk-head`. > > I have verified that it fails only on s390x when I enable UseCompactObjectHeaders: `make test TEST=jtreg:$(find . -name TestCastX2NotProcessedIGVN.java) JTREG="VM_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders"`.
> > While this issue could potentially occur with other settings, `UseCompactObjectHeaders` is the only one I have observed causing this failure. Do you suggest disabling this, or is separate debugging required to investigate this behaviour? It's really up to you @offamitkumar. For PPC we have opened an internal bug (actually it should be mirrored by a JBS issue) to revise the compilation of `test2`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21975#issuecomment-2467401535 From epeter at openjdk.org Mon Nov 11 07:26:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 11 Nov 2024 07:26:29 GMT Subject: RFR: 8343629: More MergeStore benchmark In-Reply-To: References: Message-ID: <6B34f81JucswxU43rqcM1jF1UDoVhYs7ukuClJvYKNw=.6c7cc0a1-fe21-4928-9ee6-26deb1b189eb@github.com> On Wed, 6 Nov 2024 07:22:40 GMT, Emanuel Peter wrote: >> 1. Added the putBytes4 benchmark, which corresponds to StringBuilder appendNull >> 2. Optimized the putChars4/setInt/setLong series of benchmarks to reduce extra overhead and more accurately reflect performance differences. > > You can find an example of how to do that easily here: > https://github.com/openjdk/jdk/pull/19970/files#diff-9072c369f5b541ef9fca3ad8320aa59e88cc72f203c03da58100b1d111ffc324R746-R749 > @eme64 Why is there no noticeable difference in the performance of +/-MergeStores? What did you do to find out yourself? Did you use the trace flags to see if there is a difference in what is optimized / the output assembly code?
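For context on what `+/-MergeStores` is expected to change in a benchmark like `putBytes4`: the optimization targets runs of adjacent narrow stores that together form one wider store. The hypothetical Java sketch below (not taken from `MergeStoreBench.java`) writes the four bytes of `"null"` one at a time, as a hand-written `appendNull` might, and then writes the equivalent single little-endian `int` through `MethodHandles.byteArrayViewVarHandle`. Both leave the array in the same state, which is the equivalence the merged store relies on; whether C2 actually merges the stores in a given benchmark still has to be confirmed from the compiled code, as suggested above.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MergeStoreSketch {
    // View a byte[] as little-endian ints; plain get/set allow unaligned offsets.
    private static final VarHandle INT_LE =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Four adjacent byte stores, in the style of a hand-written appendNull.
    static void putBytes4(byte[] buf, int off) {
        buf[off]     = 'n';
        buf[off + 1] = 'u';
        buf[off + 2] = 'l';
        buf[off + 3] = 'l';
    }

    // The single 32-bit store that the four stores above are equivalent to.
    static void putInt4(byte[] buf, int off) {
        int packed = 'n' | ('u' << 8) | ('l' << 16) | ('l' << 24);
        INT_LE.set(buf, off, packed);
    }

    public static void main(String[] args) {
        byte[] a = new byte[8];
        byte[] b = new byte[8];
        putBytes4(a, 2);
        putInt4(b, 2);
        System.out.println(Arrays.equals(a, b));                               // true
        System.out.println(new String(a, 2, 4, StandardCharsets.ISO_8859_1));  // null
    }
}
```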
Optimized the putChars4/setInt/setLong series of benchmarks to reduce extra overhead and more accurately reflect performance differences. > > Shaojin Wen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - from @eme64 add MergeStoresDisabled > - Merge remote-tracking branch 'upstream/master' into merge_store_bench_202410 > - Merge remote-tracking branch 'upstream/master' into merge_store_bench_202410 > - Merge branch 'master' into merge_store_bench_202410 > - add putBytes4 and improved put test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java line 1153: > 1151: } > 1152: > 1153: @Fork(value = 1, jvmArgsPrepend = { Suggestion: @Fork(value = 1, jvmArgs = { Can you make this change, and run the benchmarks again? There was a recent JMH build script change, and all usages of `jvmArgsPrepend` in JMH tests were supposed to be changed to `jvmArgs`. I think in your case the flag is actually not applied. Not sure if that is true, but it looks that way to me. https://github.com/openjdk/jdk/pull/21800 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21659#discussion_r1836062193 From duke at openjdk.org Mon Nov 11 07:43:16 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 11 Nov 2024 07:43:16 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v4] In-Reply-To: References: Message-ID: <-1uIsg-ge9MgmoMQFqE7ojuoKr16S4v545Vy71uCs18=.a37c4555-9f94-4aa8-ae59-037f33ff8f05@github.com> On Fri, 8 Nov 2024 17:39:32 GMT, Vladimir Kozlov wrote: >> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: >> >> Add set_root_as_ctrl > > src/hotspot/share/opto/loopnode.cpp line 3147: > >> 3145: ConINode* zero = igvn->intcon(0); >> 3146: if (iloop != nullptr) { >> 3147: iloop->set_root_as_ctrl(zero); > > Please look on history of this code. 
This is suspicious - constant nodes should always be attached to Root. @TobiHartmann pointed out that this method is also called from code outside of loop opts, for example, `PhaseMacroExpand::expand_macro_nodes`. Since there's no PhaseIdealLoop in this case, nullptr is passed instead, and we cannot set control because we are not inside a loop opt. Maybe @rwestrel can also take a look, as he originally introduced this code in [this PR](https://github.com/openjdk/jdk/pull/7364/files#diff-d49652d43244d52415873c37bf6990269b0d6e2f2111f4f971660470b6bca738R2860). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1836075707 From duke at openjdk.org Mon Nov 11 07:48:00 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 11 Nov 2024 07:48:00 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v5] In-Reply-To: References: Message-ID:
> This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon`, which are wrappers for:
>
> ConINode* node = _igvn.intcon(i);
> set_ctrl(node, C->root());
>
> and
>
> ConLNode* node = _igvn.longcon(i);
> set_ctrl(node, C->root());
>
> Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods.
theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Improve brace style ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21836/files - new: https://git.openjdk.org/jdk/pull/21836/files/3dc3befd..b472aafe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21836.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21836/head:pull/21836 PR: https://git.openjdk.org/jdk/pull/21836 From rrich at openjdk.org Mon Nov 11 08:12:15 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 11 Nov 2024 08:12:15 GMT Subject: RFR: 8343774: Positiv list ir checks of compiler/c2/TestCastX2NotProcessedIGVN.java [v3] In-Reply-To: References: Message-ID: <05tFL9qXcLev2gBaotCrbqvPVv4Zb5pVN1tPNypCBBs=.db0999d8-64c9-41ac-90a4-019dd7ec4adf@github.com> On Fri, 8 Nov 2024 15:57:37 GMT, Richard Reingruber wrote: >> Vectorization of the loop in `test2` does not work on ppc, therefore I want to exclude it there. > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Remove riscv64 I've converted this issue [JDK-8343774](https://bugs.openjdk.org/browse/JDK-8343774) into a subtask. In the subtask, the platforms where the IR checks of `test2` succeed are positively listed. The issues of the other platforms are tracked in the parent task.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21975#issuecomment-2467486579 From duke at openjdk.org Mon Nov 11 08:30:46 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 11 Nov 2024 08:30:46 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v4] In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 17:41:33 GMT, Vladimir Kozlov wrote: >> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: >> >> Add set_root_as_ctrl > > src/hotspot/share/opto/loopopts.cpp line 195: > >> 193: set_root_as_ctrl(x); >> 194: continue; >> 195: } > > This looks like a "band-aid" - this should be an assert. Maybe investigate in a separate RFE. I opened an RFE for this: https://bugs.openjdk.org/browse/JDK-8343907 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1836126244 From jbhateja at openjdk.org Mon Nov 11 08:32:18 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 11 Nov 2024 08:32:18 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v3] In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 04:48:00 GMT, Jasmine Karthikeyan wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into phase-lowering >> - Remove platform-dependent node definitions, rework PhaseLowering implementation >> - Address some changes from code review >> - Implement PhaseLowering > > Thanks everyone for the discussion.
I've pushed a commit that restructures the pass, removing the backend-specific node definition and making the pass extend `PhaseIterGVN` so that nodes can do further idealizations during lowering without complicating the main lowering switch. I also added a shared component to lowering, to facilitate moving transforms that impact multiple backends, like `DivMod`, to it. Lowering is also now the final phase before final graph reshaping, since late inlines could also use IGVN. Some more comments: > >> It looks attractive at first, but the downside is subsequent passes may start to require platform-specific code as well (e.g., think of final graph reshaping which operates on Node opcodes). > > This makes sense to me. I agree that the extra complexity required to deal with this change in other parts of the code isn't worth it. The new commit removes this part of the changeset. > >> BTW it's not clear to me now what particular benefits IGVN brings. `DivMod` transformation doesn't use IGVN and after examining `MacroLogicV` code it can be rewritten to avoid IGVN as well. > > The main benefits are being able to reuse node hashing to de-duplicate redundant nodes and being able to use the existing IGVN types that were calculated (which #21244 uses). Some examples where GVN could be useful in final graph reshaping are the reshaping of shift nodes and of `Op_CmpUL`, where new nodes are created to approximate existing nodes on platforms without support. While I think it is unlikely that any of the created nodes would common with existing nodes except the `ConNode`s, I think it would be nice to reduce the possibility of redundant nodes in the graph before matching. This would include `DivMod` in the cases where the backend doesn't support the `DivMod` node, as multiplication and subtraction are emitted instead. I'm working on refactoring these cases in my example patch.
I think it would be nice to make lowering the place where these platform-specific optimizations occur, while final graph reshaping focuses on preparing the graph for matching. > >> I'd say that if we want the lowering pass being discussed to be truly scalable, it's better to follow the same pattern. I have some doubts that platform-specific ad-hoc IR tweaks will scale well. > > My main concern with the macro-expansion style is that with the proposed transforms unconditional expansion/lowering of nodes isn't always possible. For example, in final graph reshaping for `DivMod` it can be the case ... Hi @jaskarth, I was trying to lower LShiftVB and URShiftVB IR for the x86 backend, intending to factor out the upfront byte-vector to short-vector conversion of the input and shift vectors through GVN when both are shared across two operations, since the x86 ISA does not support direct byte vector shifts. To begin with, I simply made the following diff, expecting no change in behavior, but got the following fatal error at build time. Can you kindly check?
```diff
diff --git a/src/hotspot/cpu/x86/c2_lowering_x86.cpp b/src/hotspot/cpu/x86/c2_lowering_x86.cpp
index cf4c014ffda..bc8df186396 100644
--- a/src/hotspot/cpu/x86/c2_lowering_x86.cpp
+++ b/src/hotspot/cpu/x86/c2_lowering_x86.cpp
@@ -32,6 +32,6 @@ Node* PhaseLowering::lower_node_platform(Node* n) {
 }
 
 bool PhaseLowering::should_lower() {
-  return false;
+  return true;
 }
 #endif // COMPILER2
```

```
ERROR: Build failed for target 'images' in configuration 'linux-x86_64-server-fastdebug' (exit code 2)
=== Output from failing command(s) repeated here ===
* For target support_interim-image-jlink__jlink_interim_image_exec:
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/jatinbha/sandboxes/jdk-trunk/jdk/src/hotspot/share/opto/node.hpp:960), pid=1961256, tid=1961293
#  assert(is_MachReturn()) failed: invalid node class: Con
#
# JRE version: OpenJDK Runtime Environment (24.0) (fastdebug build 24-internal-adhoc.root.jdk)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 24-internal-adhoc.root.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x140f939]  Matcher::Fixup_Save_On_Entry()+0x279
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/jatinbha/sandboxes/jdk-trunk/jdk/make/core.1961256)
#
# An error report file with more information is saved as:
# /home/jatinbha/sandboxes/jdk-trunk/jdk/make/hs_err_pid1961256.log
... (rest of output omitted)
* All command lines available in /home/jatinbha/sandboxes/jdk-trunk/jdk/build/linux-x86_64-server-fastdebug/make-support/failure-logs.
```
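The node-hashing benefit mentioned in the lowering discussion above (reusing GVN to de-duplicate structurally identical nodes) can be illustrated with a toy hash-consing table. This is a minimal sketch under stated assumptions: `ToyNode`, `ToyGraph`, and the `kParm`/`kCon`/`kMul`/`kSub` opcodes are hypothetical and do not correspond to C2's `Node`, `PhaseGVN`, or `Op_*` constants. A toy node is identified only by its opcode, an integer payload (a constant value or a parameter index), and its input ids.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical opcodes for the toy IR (not C2's Op_* constants).
enum ToyOpcode : int { kParm, kCon, kMul, kSub };

// Toy IR node: opcode, payload (constant value or parameter index), inputs.
struct ToyNode {
  int opcode;
  long long payload;
  std::vector<int> inputs;
  bool operator==(const ToyNode& o) const {
    return opcode == o.opcode && payload == o.payload && inputs == o.inputs;
  }
};

struct ToyNodeHash {
  static void combine(std::size_t& h, std::size_t v) {
    h ^= v + 0x9e3779b9u + (h << 6) + (h >> 2);
  }
  std::size_t operator()(const ToyNode& n) const {
    std::size_t h = std::hash<int>{}(n.opcode);
    combine(h, std::hash<long long>{}(n.payload));
    for (int in : n.inputs) combine(h, std::hash<int>{}(in));
    return h;
  }
};

// Hash-consing graph: make() returns the id of an existing structurally
// identical node instead of creating a redundant copy.
class ToyGraph {
 public:
  int make(int opcode, long long payload, std::vector<int> inputs) {
    ToyNode key{opcode, payload, std::move(inputs)};
    auto it = table_.find(key);
    if (it != table_.end()) return it->second;  // "commons" with the existing node
    int id = static_cast<int>(nodes_.size());
    nodes_.push_back(key);
    table_.emplace(std::move(key), id);
    return id;
  }
  std::size_t size() const { return nodes_.size(); }

 private:
  std::vector<ToyNode> nodes_;
  std::unordered_map<ToyNode, int, ToyNodeHash> table_;
};
```

For example, two requests for `kMul` of the same parameter and the same constant yield one node id, so a transform that emits the same multiply twice does not grow the graph.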
------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2467523805 From duke at openjdk.org Mon Nov 11 08:37:26 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 11 Nov 2024 08:37:26 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v6] In-Reply-To: References: Message-ID: > This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: > > > ConINode* node = _igvn.intcon(i); > set_ctrl(node, C->root()); > > > and > > > ConLNode* node = _igvn.longcon(i); > set_ctrl(node, C->root()); > > > Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. theoweidmannoracle has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: - Merge branch 'master' into JDK-8343148 - Improve brace style - Add set_root_as_ctrl - Update src/hotspot/share/opto/loopopts.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/loopopts.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/loopopts.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/loopTransform.cpp Co-authored-by: Emanuel Peter - Add helper methods for zerocon, makecon, and integercon too - 8343148: C2: Refactor uses of "PhaseValues::intcon() + PhaseIdealLoop::set_ctrl()" into separate method ------------- Changes: https://git.openjdk.org/jdk/pull/21836/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=05 Stats: 130 lines in 7 files changed: 44 ins; 42 del; 44 mod Patch: https://git.openjdk.org/jdk/pull/21836.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21836/head:pull/21836 PR: https://git.openjdk.org/jdk/pull/21836 From amitkumar at openjdk.org Mon Nov 11 08:44:51 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 11 Nov 2024 08:44:51 GMT Subject: RFR: 8343774: Positiv list platforms 
for ir checks of compiler/c2/TestCastX2NotProcessedIGVN.java In-Reply-To: References: Message-ID: On Sat, 9 Nov 2024 03:41:56 GMT, Amit Kumar wrote: >>> Thanks for taking care of that. Maybe it would be more robust to only enable the test on x86, aarch64 and riscv64. The whole test doesn't need to be excluded actually. Only IR matching on `test2` needs to be disabled. This can be done with: >>> >>> ``` >>> applyIfPlatformOr = {"x64", "true", "aarch64", "true", "riscv64", "true"}) >>> ``` >>> >>> See `TestBoolNodeGVN.java` for instance. >> >> Ok. I've done that. > > @reinrich Sorry for creating a mess here. > > Yesterday, this test failed while testing changes for JEP 450 related to compact headers. However, I have now checked, and the head stream testing job shows that it does not fail with `jdk-head`. > > I have verified that it fails only on s390x when I enable UseCompactObjectHeaders: `make test TEST=jtreg:$(find . -name TestCastX2NotProcessedIGVN.java) JTREG="VM_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders"`. > > While this issue could potentially occur with other settings, `UseCompactObjectHeaders` is the only one I have observed causing this failure. Do you suggest disabling this, or is separate debugging required to investigate this behaviour? > It's really up to you @offamitkumar. For PPC we have opened an internal bug (actually it should be mirrored by a JBS-issue) to revise the compilation of `test2`. I did the same yesterday for s390x as well. I have added it to the todo list for the internal tracker. If possible, you can add s390x as an affected architecture for JBS issue: [JDK-8343906](https://bugs.openjdk.org/browse/JDK-8343906). I am fine now with integrating it.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21975#issuecomment-2467547961 From amitkumar at openjdk.org Mon Nov 11 08:50:56 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 11 Nov 2024 08:50:56 GMT Subject: RFR: 8343774: Positiv list platforms for ir checks of compiler/c2/TestCastX2NotProcessedIGVN.java [v3] In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 15:57:37 GMT, Richard Reingruber wrote: >> Vectorization of the loop in `test2` does not work on ppc therefore I want to exclude it there. > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Remove riscv64 just a minor title update `Positiv` -> `Positive` ------------- PR Comment: https://git.openjdk.org/jdk/pull/21975#issuecomment-2467556536 From rrich at openjdk.org Mon Nov 11 08:50:56 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 11 Nov 2024 08:50:56 GMT Subject: RFR: 8343774: Positiv list platforms for ir checks of compiler/c2/TestCastX2NotProcessedIGVN.java [v3] In-Reply-To: References: Message-ID: <3f0OD3ntQxIsA5RmuBT_hifohIFreR9H18htd77NLkA=.f5108f38-9b34-4ae3-b66a-207ac4e91d72@github.com> On Mon, 11 Nov 2024 08:44:50 GMT, Amit Kumar wrote: > just a minor title update `Positiv` -> `Positive` Thanks :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21975#issuecomment-2467560664 From rrich at openjdk.org Mon Nov 11 08:58:16 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 11 Nov 2024 08:58:16 GMT Subject: RFR: 8343774: Positiv list platforms for ir checks of compiler/c2/TestCastX2NotProcessedIGVN.java [v3] In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 08:44:50 GMT, Amit Kumar wrote: >> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove riscv64 > > just a minor title update `Positiv` -> `Positive` > > It's really up to you @offamitkumar. 
For PPC we have opened an internal bug (actually it should be mirrored by a JBS-issue) to revise the compilation of `test2`. > > I did same yesterday for s390x as well. I have added it in todo list for the internal tracker. Maybe If possible, you can add s390x as affected architecture for JBS issue: [JDK-8343906](https://bugs.openjdk.org/browse/JDK-8343906). I've added s390x. Please feel free to add details if needed. > I am fine now with integrating it. Ok, thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21975#issuecomment-2467579388 From amitkumar at openjdk.org Mon Nov 11 08:58:53 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 11 Nov 2024 08:58:53 GMT Subject: RFR: 8343810: [s390x] is_uimm* methods should take unsigned arguments In-Reply-To: References: Message-ID: <1WgN6l7bp21jxLEkNjVEzNIb6M0egdURLq2yyli0xHc=.55f2ea60-5a70-4978-a442-3c5ad724b697@github.com> On Fri, 8 Nov 2024 04:47:23 GMT, Amit Kumar wrote: > trivial patch which just updates the argument datatype of `is_uimm*` methods, from `int64_t` to `uint64_t`. @RealLucy ? PS: Should we backport it as well ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21967#issuecomment-2467577885 From tschatzl at openjdk.org Mon Nov 11 09:11:50 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 11 Nov 2024 09:11:50 GMT Subject: RFR: 8343824: Remove unused InstructionFlags in C1 In-Reply-To: References: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> Message-ID: On Fri, 8 Nov 2024 17:46:49 GMT, Vladimir Kozlov wrote: >> Hi all, >> >> please review this change that removes unused/unreferenced flags from the `Instruction::InstructionFlag` enum for the C1 compiler. >> >> Testing: gha, tier1-3 >> >> Thanks, >> Thomas > > @tschatzl do you know history of these flags and why they are not used? Thanks @vnkozlov @dean-long for your reviews. Thanks for the additional archeology information. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21973#issuecomment-2467603389 From tschatzl at openjdk.org Mon Nov 11 09:11:51 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 11 Nov 2024 09:11:51 GMT Subject: Integrated: 8343824: Remove unused InstructionFlags in C1 In-Reply-To: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> References: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> Message-ID: On Fri, 8 Nov 2024 11:09:27 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that removes unused/unreferenced flags from the `Instruction::InstructionFlag` enum for the C1 compiler. > > Testing: gha, tier1-3 > > Thanks, > Thomas This pull request has now been integrated. Changeset: ae6bb3cd Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/ae6bb3cd29bd4cdbb2df320fbfe0dabb7c0647d7 Stats: 10 lines in 1 file changed: 0 ins; 10 del; 0 mod 8343824: Remove unused InstructionFlags in C1 Reviewed-by: kvn, dlong ------------- PR: https://git.openjdk.org/jdk/pull/21973 From lucy at openjdk.org Mon Nov 11 09:31:14 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Mon, 11 Nov 2024 09:31:14 GMT Subject: RFR: 8343810: [s390x] is_uimm* methods should take unsigned arguments In-Reply-To: References: Message-ID: <3ZMlCyS_CutzkhsO4vdfx4GRpWNqUZmapdlIvkJEaAM=.a29c06ea-37c2-42fe-995d-022e82351107@github.com> On Fri, 8 Nov 2024 04:47:23 GMT, Amit Kumar wrote: > trivial patch which just updates the argument datatype of `is_uimm*` methods, from `int64_t` to `uint64_t`. LGTM. ------------- Marked as reviewed by lucy (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21967#pullrequestreview-2426510952 From lucy at openjdk.org Mon Nov 11 09:31:14 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Mon, 11 Nov 2024 09:31:14 GMT Subject: RFR: 8343810: [s390x] is_uimm* methods should take unsigned arguments In-Reply-To: <1WgN6l7bp21jxLEkNjVEzNIb6M0egdURLq2yyli0xHc=.55f2ea60-5a70-4978-a442-3c5ad724b697@github.com> References: <1WgN6l7bp21jxLEkNjVEzNIb6M0egdURLq2yyli0xHc=.55f2ea60-5a70-4978-a442-3c5ad724b697@github.com> Message-ID: <8Mv6cUvdNcSdihLLYdH7aHW47mwFMyqP-voEbKGQ1Ro=.d569dc4e-acd9-4ede-bb2d-10abf6500de4@github.com> On Mon, 11 Nov 2024 08:55:02 GMT, Amit Kumar wrote: > PS: Should we backport it as well ? I'm not so much in favor of backporting all that scanner noise. But I may be alone with my opinion. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21967#issuecomment-2467647457 From amitkumar at openjdk.org Mon Nov 11 09:35:53 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 11 Nov 2024 09:35:53 GMT Subject: RFR: 8343810: [s390x] is_uimm* methods should take unsigned arguments In-Reply-To: <8Mv6cUvdNcSdihLLYdH7aHW47mwFMyqP-voEbKGQ1Ro=.d569dc4e-acd9-4ede-bb2d-10abf6500de4@github.com> References: <1WgN6l7bp21jxLEkNjVEzNIb6M0egdURLq2yyli0xHc=.55f2ea60-5a70-4978-a442-3c5ad724b697@github.com> <8Mv6cUvdNcSdihLLYdH7aHW47mwFMyqP-voEbKGQ1Ro=.d569dc4e-acd9-4ede-bb2d-10abf6500de4@github.com> Message-ID: On Mon, 11 Nov 2024 09:27:50 GMT, Lutz Schmidt wrote: > I'm not so much in favor of backporting all that scanner noise. But I may be alone with my opinion. Sure, Let's skip it then. Thanks for the approval. 
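For the 8343810 change above (switching the `is_uimm*` parameters from `int64_t` to `uint64_t`), the following sketch shows one generic reason an unsigned parameter suits an unsigned-immediate range check. It is a hypothetical illustration, not the s390 assembler code: `fits_uimm`, `fits_uimm12`, `fits_uimm16`, and `naive_signed_uimm` are made-up names, and the thread does not say which specific pitfall, if any, motivated the change beyond the scanner findings.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check: does x fit in an unsigned immediate field of nbits bits?
// With a uint64_t parameter, a negative value passed by a caller converts to
// a huge unsigned value and is rejected by the range check.
constexpr bool fits_uimm(uint64_t x, unsigned nbits) {
  // Shifting a 64-bit value by 64 or more is undefined, so handle it first.
  return nbits >= 64 || (x >> nbits) == 0;
}

constexpr bool fits_uimm12(uint64_t x) { return fits_uimm(x, 12); }
constexpr bool fits_uimm16(uint64_t x) { return fits_uimm(x, 16); }

// A naive signed variant for comparison: it only checks the upper bound,
// so negative values are wrongly accepted. Assumes nbits < 63.
constexpr bool naive_signed_uimm(int64_t x, unsigned nbits) {
  return x < (int64_t{1} << nbits);
}

static_assert(fits_uimm12(4095) && !fits_uimm12(4096), "12-bit bounds");
```

With this shape, `fits_uimm12(-1)` is false because `-1` arrives as `0xFFFF'FFFF'FFFF'FFFF`, whereas `naive_signed_uimm(-1, 12)` is true.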
------------- PR Comment: https://git.openjdk.org/jdk/pull/21967#issuecomment-2467655641 From amitkumar at openjdk.org Mon Nov 11 09:35:53 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 11 Nov 2024 09:35:53 GMT Subject: Integrated: 8343810: [s390x] is_uimm* methods should take unsigned arguments In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 04:47:23 GMT, Amit Kumar wrote: > trivial patch which just updates the argument datatype of `is_uimm*` methods, from `int64_t` to `uint64_t`. This pull request has now been integrated. Changeset: a93bd9df Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/a93bd9dfdd7e340b10c24a15fb70a3801bfb373d Stats: 7 lines in 1 file changed: 0 ins; 1 del; 6 mod 8343810: [s390x] is_uimm* methods should take unsigned arguments Reviewed-by: lucy ------------- PR: https://git.openjdk.org/jdk/pull/21967 From rcastanedalo at openjdk.org Mon Nov 11 10:08:36 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 11 Nov 2024 10:08:36 GMT Subject: RFR: 8343067: C2: revisit constant-offset AddP chains after successful input idealizations [v2] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 20:53:03 GMT, Vladimir Kozlov wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Hoist changed offset input check > > Good. Thanks for reviewing @vnkozlov and @iwanowww!
------------- PR Comment: https://git.openjdk.org/jdk/pull/21898#issuecomment-2467732067 From rcastanedalo at openjdk.org Mon Nov 11 10:08:38 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 11 Nov 2024 10:08:38 GMT Subject: Integrated: 8343067: C2: revisit constant-offset AddP chains after successful input idealizations In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 10:02:16 GMT, Roberto Castañeda Lozano wrote: > This changeset re-adds a constant-offset AddP node (`u`) to C2's IGVN worklist when its address is given by another AddP node (`use`) whose offset has changed. This makes it possible for `AddPNode::Ideal` to flatten the address computation in cases where the offset of the latter (`use->in(AddPNode::Offset)`) is found to be constant during IGVN: > > ![idealization](https://github.com/user-attachments/assets/6b632642-c037-457f-bd19-6b30f24e6ac6) > > The end result is the generation of fewer explicit address computation instructions. > > #### Testing > > ##### Functionality > > - tier1-5 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > ##### Performance > > - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. This pull request has now been integrated.
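The 8343067 description above relies on `AddPNode::Ideal` folding a chain of constant-offset address computations once an inner offset is known to be constant. The toy model below is a hypothetical sketch of that folding rule only: `ToyAddP`, `Flattened`, and `flatten` are made-up names, and the model ignores C2's separate Base input, types, and the IGVN worklist itself. Its point is that the outer node only folds if it is examined again after the inner offset becomes constant, which is what re-adding it to the worklist achieves.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Toy address node: AddP(address, offset). A node whose address is nullptr
// stands for the base pointer itself.
struct ToyAddP {
  const ToyAddP* address;         // inner AddP, or nullptr for the base
  std::optional<int64_t> offset;  // known constant offset, or nullopt
};

struct Flattened {
  const ToyAddP* address;  // innermost node reached by folding
  int64_t offset;          // sum of the folded constant offsets
};

// Fold AddP(AddP(a, c1), c2) => AddP(a, c1 + c2) for as long as each inner
// AddP also has a constant offset. Returns nullopt if the outer offset is
// not constant (nothing to fold under this toy rule).
std::optional<Flattened> flatten(const ToyAddP& n) {
  if (!n.offset) return std::nullopt;
  int64_t total = *n.offset;
  const ToyAddP* inner = n.address;
  while (inner != nullptr && inner->address != nullptr && inner->offset) {
    total += *inner->offset;
    inner = inner->address;
  }
  return Flattened{inner, total};
}
```

In this model, `AddP(AddP(base, ?), 16)` folds nothing while the inner offset is unknown; once the inner offset is found to be 8, running `flatten` on the outer node again yields `base + 24`.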
Changeset: ec13364c Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/ec13364cdab5a52f704bc5d1575f3da17380b4f2 Stats: 73 lines in 3 files changed: 70 ins; 0 del; 3 mod 8343067: C2: revisit constant-offset AddP chains after successful input idealizations Reviewed-by: kvn, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/21898 From duke at openjdk.org Mon Nov 11 10:14:50 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 11 Nov 2024 10:14:50 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v7] In-Reply-To: References: Message-ID: > This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: > > > ConINode* node = _igvn.intcon(i); > set_ctrl(node, C->root()); > > > and > > > ConLNode* node = _igvn.longcon(i); > set_ctrl(node, C->root()); > > > Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods.
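The 8343148 refactoring quoted above folds a two-step pattern (create a constant through IGVN, then pin it with `set_ctrl(node, C->root())`) into one helper. The sketch below models that idea with hypothetical stand-in classes (`ToyNode`, `ToyIGVN`, `ToyLoopOpt`); it is not HotSpot code, and the real `PhaseIdealLoop::intcon`/`longcon` signatures may differ. The design point it illustrates is that callers can no longer create a constant and forget the `set_ctrl` step.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a graph node; only constants are modeled.
struct ToyNode {
  bool is_long;
  int64_t value;
};

// Stand-in for IGVN's constant cache: one node per (kind, value) pair.
class ToyIGVN {
 public:
  ToyNode* intcon(int32_t v) { return con(false, v); }
  ToyNode* longcon(int64_t v) { return con(true, v); }

 private:
  ToyNode* con(bool is_long, int64_t v) {
    std::unique_ptr<ToyNode>& slot = cache_[{is_long, v}];
    if (!slot) slot = std::make_unique<ToyNode>(ToyNode{is_long, v});
    return slot.get();
  }
  std::map<std::pair<bool, int64_t>, std::unique_ptr<ToyNode>> cache_;
};

// Stand-in for the loop-optimization phase, which records a control node
// for each data node it knows about.
class ToyLoopOpt {
 public:
  ToyNode* root() { return &root_; }
  void set_ctrl(ToyNode* n, ToyNode* ctrl) { ctrl_[n] = ctrl; }
  ToyNode* get_ctrl(ToyNode* n) const {
    auto it = ctrl_.find(n);
    return it == ctrl_.end() ? nullptr : it->second;
  }

  // The refactored helpers: create the constant and pin it at the root in
  // a single call.
  ToyNode* intcon(int32_t v) { return pin_at_root(igvn_.intcon(v)); }
  ToyNode* longcon(int64_t v) { return pin_at_root(igvn_.longcon(v)); }

 private:
  ToyNode* pin_at_root(ToyNode* n) {
    set_ctrl(n, root());
    return n;
  }
  ToyNode root_{false, 0};  // placeholder "root" control node
  ToyIGVN igvn_;
  std::unordered_map<ToyNode*, ToyNode*> ctrl_;
};
```

A caller simply writes `phase.intcon(1)` and gets a cached constant whose control is already the root.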
theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Cover another case ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21836/files - new: https://git.openjdk.org/jdk/pull/21836/files/aaa7cf20..8c51ec99 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=05-06 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21836.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21836/head:pull/21836 PR: https://git.openjdk.org/jdk/pull/21836 From tholenstein at openjdk.org Mon Nov 11 11:14:24 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 11 Nov 2024 11:14:24 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v11] In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: > Adds new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: simplify ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21925/files - new: https://git.openjdk.org/jdk/pull/21925/files/46024d07..ace2ebfa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=09-10 Stats: 51 lines in 1 file changed: 15 ins; 30 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/21925.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21925/head:pull/21925 PR: https://git.openjdk.org/jdk/pull/21925 From chagedorn at openjdk.org Mon Nov 11 11:56:17 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date:
Mon, 11 Nov 2024 11:56:17 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v7] In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 10:14:50 GMT, theoweidmannoracle wrote: >> This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: >> >> >> ConINode* node = _igvn.intcon(i); >> set_ctrl(node, C->root()); >> >> >> and >> >> >> ConLNode* node = _igvn.longcon(i); >> set_ctrl(node, C->root()); >> >> >> Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Cover another case src/hotspot/share/opto/loopTransform.cpp line 2054: > 2052: Node *newcle = old_new[loop_end->_idx]; > 2053: _igvn.hash_delete(newcle); > 2054: Node *one = intcon(1); While at it, you can also fix the wrong `*` placement (should be at type): Suggestion: Node* one = intcon(1); src/hotspot/share/opto/loopTransform.cpp line 2434: > 2432: } > 2433: if (p_offset != nullptr) { > 2434: Node *zero = zerocon(bt); Suggestion: Node* zero = zerocon(bt); src/hotspot/share/opto/loopTransform.cpp line 2485: > 2483: if (p_offset != nullptr) { > 2484: if (which == 1) { // must negate the extracted offset > 2485: Node *zero = integercon(0, exp_bt); Suggestion: Node* zero = integercon(0, exp_bt); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1836340828 PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1836343303 PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1836344082 From chagedorn at openjdk.org Mon Nov 11 11:56:17 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 11 Nov 2024 11:56:17 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v4] In-Reply-To: 
References: Message-ID: <7-rX03iJvNPTk_sjMgAEI4ki96PwMO3jt1YTDuddgkE=.e98d59ff-d768-407a-abb2-ddc2416a3b06@github.com> On Mon, 11 Nov 2024 08:28:02 GMT, theoweidmannoracle wrote: >> src/hotspot/share/opto/loopopts.cpp line 195: >> >>> 193: set_root_as_ctrl(x); >>> 194: continue; >>> 195: } >> >> This looks like a "band-aid" - this should be an assert. Maybe investigate in a separate RFE. > > I opened an RFE for this https://bugs.openjdk.org/browse/JDK-8343907 If you modify the following code above to use your new `makecon()` (could be done either way), could this then be turned into an assert? Looking at the code, it seems that we only fail to set ctrl in the `singleton` case, which would then be covered. https://github.com/openjdk/jdk/blob/5ca6698ba418e82ff93471fbb495759850f26f63/src/hotspot/share/opto/loopopts.cpp#L123-L125 You could also change only `makecon()` above and revisit this code later to remove the `set_root_as_ctrl()` and add an assert. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1836547506 From swen at openjdk.org Mon Nov 11 12:42:24 2024 From: swen at openjdk.org (Shaojin Wen) Date: Mon, 11 Nov 2024 12:42:24 GMT Subject: RFR: 8343925: Test HugeToString.java crashes at java.util.BitSet.toString()Ljava/lang/String Message-ID: 8343925: Following feedback on PR #21593 that test/jdk/java/util/BitSet/HugeToString.java crashes, this PR rolls the change back. ------------- Commit messages: - Revert "8342650: Move getChars to DecimalDigits" Changes: https://git.openjdk.org/jdk/pull/22012/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22012&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343925 Stats: 757 lines in 12 files changed: 352 ins; 381 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/22012.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22012/head:pull/22012 PR: https://git.openjdk.org/jdk/pull/22012 From tholenstein at openjdk.org Mon Nov 11 12:51:12 2024 From: tholenstein at openjdk.org (Tobias
Holenstein) Date: Mon, 11 Nov 2024 12:51:12 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v12] In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: > Adds new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: make it work in Linux and MacOS ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21925/files - new: https://git.openjdk.org/jdk/pull/21925/files/ace2ebfa..99e2ed7a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=10-11 Stats: 36 lines in 1 file changed: 4 ins; 18 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/21925.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21925/head:pull/21925 PR: https://git.openjdk.org/jdk/pull/21925 From chagedorn at openjdk.org Mon Nov 11 12:53:55 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 11 Nov 2024 12:53:55 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v12] In-Reply-To: References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: On Mon, 11 Nov 2024 12:51:12 GMT, Tobias Holenstein wrote: >> Adds new option to IGV to color selected nodes: >> 1) select some nodes >> 2) `Ctrl + C` or `View` -> `Color action` >> 3) pick a color and apply > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > make it work in Linux and MacOS Now, after many tries, it seems to work! :-) Thanks for investigating further. Looks good!
------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21925#pullrequestreview-2427218879 From rcastanedalo at openjdk.org Mon Nov 11 13:11:44 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 11 Nov 2024 13:11:44 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v12] In-Reply-To: References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: <7ZgYZpPUL1d2nabYrCdMkZ_m1L1i71xtDvBb-n2g1M8=.64d14226-756e-49e8-a5d8-5b5cc0d35247@github.com> On Mon, 11 Nov 2024 12:51:12 GMT, Tobias Holenstein wrote: >> Adds new option to IGV to color selected nodes: >> 1) select some nodes >> 2) `Ctrl + C` or `View` -> `Color action` >> 3) pick a color and apply > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > make it work in Linux and MacOS Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21925#pullrequestreview-2427253534 From tholenstein at openjdk.org Mon Nov 11 13:28:22 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 11 Nov 2024 13:28:22 GMT Subject: Integrated: 8343535: IGV: Colorize nodes on demand In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: On Wed, 6 Nov 2024 12:19:47 GMT, Tobias Holenstein wrote: > Adds new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply This pull request has now been integrated.
Changeset: f3ba7676 Author: Tobias Holenstein URL: https://git.openjdk.org/jdk/commit/f3ba7676043756f7cf95d5215e18bd65e9f167e6 Stats: 231 lines in 7 files changed: 209 ins; 15 del; 7 mod 8343535: IGV: Colorize nodes on demand Co-authored-by: Roberto Castañeda Lozano Reviewed-by: chagedorn, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/21925 From tholenstein at openjdk.org Mon Nov 11 13:28:21 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 11 Nov 2024 13:28:21 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v2] In-Reply-To: References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: On Fri, 8 Nov 2024 09:02:53 GMT, Roberto Castañeda Lozano wrote: >> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/DiagramScene.java >> >> Co-authored-by: Andrey Turbanov > > In my opinion, the IGV toolbar is already pretty crowded (this hurts most when opening two graphs side-by-side) and I would prefer not adding the color icon there. On the other hand, we could add the action to the pop-up menu that's opened when right-clicking into a node or set of nodes. Here's my suggestion:
Here's my suggestion: > > > diff --git a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java > index e68abd3297e..c4f2ac670e7 100644 > --- a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java > +++ b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java > @@ -100,6 +100,7 @@ public EditorTopComponent(DiagramViewModel diagramViewModel) { > }; > > Action[] actionsWithSelection = new Action[]{ > + ColorAction.get(ColorAction.class), > ExtractAction.get(ExtractAction.class), > HideAction.get(HideAction.class), > null, > @@ -168,8 +169,6 @@ public void mouseMoved(MouseEvent e) {} > toolBar.add(ReduceDiffAction.get(ReduceDiffAction.class)); > toolBar.add(ExpandDiffAction.get(ExpandDiffAction.class)); > toolBar.addSeparator(); > - toolBar.add(ColorAction.get(ColorAction.class)); > - toolBar.addSeparator(); > toolBar.add(ExtractAction.get(ExtractAction.class)); > toolBar.add(HideAction.get(HideAction.class)); > toolBar.add(ShowAllAction.get(ShowAllAction.class)); > diff --git a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ColorAction.java b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ColorAction.java > index a51934a4322..92921c81512 100644 > --- a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ColorAction.java > +++ b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ColorAction.java > @@ -43,7 +43,7 @@ > @ActionReference(path = "Shortcuts", name = "D-C") > }) > @Messages({ > - "CTL_ColorAction=Color action", > + "CTL_ColorAction=Color", > "HINT_ColorAction=Color current set of selected nodes" > }) > public final class ColorAction extends ModelAwareAction { > diff --git 
a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ExtractAction.java b/src/utils/I... thanks for the reviews @robcasloz and @chhagedorn ------------- PR Comment: https://git.openjdk.org/jdk/pull/21925#issuecomment-2468173021 From swen at openjdk.org Mon Nov 11 13:47:13 2024 From: swen at openjdk.org (Shaojin Wen) Date: Mon, 11 Nov 2024 13:47:13 GMT Subject: RFR: 8343925: [BACKOUT] JDK-8342650 Move getChars to DecimalDigits In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 12:34:44 GMT, Shaojin Wen wrote: > 8343925 Feedback PR #21593 test/jdk/java/util/BitSet/HugeToString.java crash, > > I can't reproduce the problem on a MacBook M1 Max, but I agree that more testing is needed, so let's roll it back first. It has been verified that it is caused by unsafe offset overflow. The problem has been reproduced and fixed. I submitted PR #22014. Would you consider fixing it this way? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22012#issuecomment-2468215673 From alanb at openjdk.org Mon Nov 11 13:55:11 2024 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 11 Nov 2024 13:55:11 GMT Subject: RFR: 8343925: [BACKOUT] JDK-8342650 Move getChars to DecimalDigits In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 12:34:44 GMT, Shaojin Wen wrote: > 8343925 Feedback PR #21593 test/jdk/java/util/BitSet/HugeToString.java crash, > > I can't reproduce the problem on a MacBook M1 Max, but I agree that more testing is needed, so let's roll it back first. Changes in this area need to be very carefully reviewed and tested. I think we should continue with the current plan to back out the original change and seek wider review and testing for the REDO. Chen is testing the backout now.
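The backout thread above attributes the `HugeToString.java` crash to an unsafe offset overflow. The actual fix is in PR #22014 and is not reproduced here; the sketch below is only a generic, hypothetical illustration of how a byte offset computed in 32-bit arithmetic can wrap for a very large index, while the same computation widened to 64 bits stays exact. The function names, the base offset of 16, and the 2-bytes-per-char layout are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: byte offset of element `index` in an array of 2-byte chars
// whose data starts `base` bytes into the object. `index` is assumed to be
// non-negative.

// 32-bit version that wraps the way Java `int` arithmetic does. The math is
// done in uint32_t to avoid signed-overflow UB in C++; the final conversion
// is modular in C++20 (implementation-defined before that, but modular on
// mainstream compilers).
int32_t char_offset_wrapping32(int32_t base, int32_t index) {
  uint32_t r = static_cast<uint32_t>(base) + (static_cast<uint32_t>(index) << 1);
  return static_cast<int32_t>(r);
}

// Widening to 64 bits before scaling keeps the offset exact.
int64_t char_offset64(int64_t base, int32_t index) {
  return base + (static_cast<int64_t>(index) << 1);
}
```

With `base = 16` and `index = 2^30 - 1`, the exact offset is 2147483662, which exceeds `INT32_MAX`, so the 32-bit version wraps to a negative value, the kind of out-of-range offset that would be dangerous when passed to an unchecked memory access.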
------------- PR Comment: https://git.openjdk.org/jdk/pull/22012#issuecomment-2468230634 From jpai at openjdk.org Mon Nov 11 14:10:05 2024 From: jpai at openjdk.org (Jaikiran Pai) Date: Mon, 11 Nov 2024 14:10:05 GMT Subject: RFR: 8343925: [BACKOUT] JDK-8342650 Move getChars to DecimalDigits In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 12:34:44 GMT, Shaojin Wen wrote: > 8343925 Feedback PR #21593 test/jdk/java/util/BitSet/HugeToString.java crash, > > I can't reproduce the problem on a MacBook M1 Max, but I agree that more testing is needed, so let's roll it back first. I have verified that this backout matches a `git revert` of the commit that introduced the change in https://bugs.openjdk.org/browse/JDK-8342650. So on that front, this backout looks OK to me. Alan has noted that Chen is running some tests with this backout. So please wait for that review, before integrating. ------------- Marked as reviewed by jpai (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22012#pullrequestreview-2427380530 From alanb at openjdk.org Mon Nov 11 14:34:26 2024 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 11 Nov 2024 14:34:26 GMT Subject: RFR: 8343925: [BACKOUT] JDK-8342650 Move getChars to DecimalDigits In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 12:34:44 GMT, Shaojin Wen wrote: > 8343925 Feedback PR #21593 test/jdk/java/util/BitSet/HugeToString.java crash, > > I can't reproduce the problem on a MacBook M1 Max, but I agree that more testing is needed, so let's roll it back first. Thanks for the BACKOUT, looks right. ------------- Marked as reviewed by alanb (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22012#pullrequestreview-2427435024 From liach at openjdk.org Mon Nov 11 14:56:27 2024 From: liach at openjdk.org (Chen Liang) Date: Mon, 11 Nov 2024 14:56:27 GMT Subject: RFR: 8343925: [BACKOUT] JDK-8342650 Move getChars to DecimalDigits In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 12:34:44 GMT, Shaojin Wen wrote: > 8343925 Feedback PR #21593 test/jdk/java/util/BitSet/HugeToString.java crash, > > I can't reproduce the problem on a MacBook M1 Max, but I agree that more testing is needed, so let's roll it back first. CI results look good. ------------- Marked as reviewed by liach (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22012#pullrequestreview-2427501471 From swen at openjdk.org Mon Nov 11 15:17:21 2024 From: swen at openjdk.org (Shaojin Wen) Date: Mon, 11 Nov 2024 15:17:21 GMT Subject: Integrated: 8343925: [BACKOUT] JDK-8342650 Move getChars to DecimalDigits In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 12:34:44 GMT, Shaojin Wen wrote: > 8343925 Feedback PR #21593 test/jdk/java/util/BitSet/HugeToString.java crash, > > I can't reproduce the problem on a MacBook M1 Max, but I agree that more testing is needed, so let's roll it back first. This pull request has now been integrated. 
Changeset: b0a371b0 Author: Shaojin Wen URL: https://git.openjdk.org/jdk/commit/b0a371b0850b8f467ed985ef39a6fce476b62acf Stats: 757 lines in 12 files changed: 352 ins; 381 del; 24 mod 8343925: [BACKOUT] JDK-8342650 Move getChars to DecimalDigits Reviewed-by: jpai, alanb, liach ------------- PR: https://git.openjdk.org/jdk/pull/22012 From rrich at openjdk.org Mon Nov 11 16:38:22 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 11 Nov 2024 16:38:22 GMT Subject: Integrated: 8343774: Positive list platforms for ir checks of compiler/c2/TestCastX2NotProcessedIGVN.java In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 11:44:21 GMT, Richard Reingruber wrote: > Vectorization of the loop in `test2` does not work on ppc therefore I want to exclude it there. This pull request has now been integrated. Changeset: 889f9062 Author: Richard Reingruber URL: https://git.openjdk.org/jdk/commit/889f906235e99b7207f2e30e1f6f5771188f5a56 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8343774: Positive list platforms for ir checks of compiler/c2/TestCastX2NotProcessedIGVN.java Reviewed-by: fyang, amitkumar, roland ------------- PR: https://git.openjdk.org/jdk/pull/21975 From mli at openjdk.org Mon Nov 11 21:36:52 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 11 Nov 2024 21:36:52 GMT Subject: RFR: 8343555: RISC-V: make some verified (on hardware) extension options diagnostic [v2] In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 18:42:04 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this simple patch? >> Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang , this one should ber a diagnostic option, as we don't want to expose too many options to users. >> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > turn more verified extensions as DIAGNOSTIC Thanks for your reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2469056625 From mli at openjdk.org Mon Nov 11 21:36:53 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 11 Nov 2024 21:36:53 GMT Subject: Integrated: 8343555: RISC-V: make some verified (on hardware) extension options diagnostic In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 18:33:00 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang , this one should ber a diagnostic option, as we don't want to expose too many options to users. > Thanks This pull request has now been integrated. Changeset: cbf4dd58 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/cbf4dd588bf371e13e81204b1585d34bfadddb42 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod 8343555: RISC-V: make some verified (on hardware) extension options diagnostic Reviewed-by: fyang, rehn ------------- PR: https://git.openjdk.org/jdk/pull/21885 From swen at openjdk.org Tue Nov 12 01:30:29 2024 From: swen at openjdk.org (Shaojin Wen) Date: Tue, 12 Nov 2024 01:30:29 GMT Subject: RFR: 8342650: Move getChars to DecimalDigits Message-ID: This PR is a resubmission after PR #21593 was rolled back, and the unsafe offset overflow issue has been fixed. Move getChars methods of StringLatin1 and StringUTF16 to DecimalDigits to reduce duplication HexDigits and OctalDigits also include getCharsLatin1 and getCharsUTF16 Putting these two methods into DecimalDigits can avoid the need to expose them in JavaLangAccess Eliminate duplicate code in BigDecimal This PR will improve the performance of Integer/Long.toString and StringBuilder.append(int/long) scenarios. This is because Unsafe.putByte is used to eliminate array bounds checks, and of course this elimination is safe. 
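The resubmission above attributes the HugeToString crash to an "unsafe offset overflow". The PR's actual fix is not shown here. The following is only a minimal Java sketch of one common way such an overflow can arise, assuming an `int`-typed array base offset (like `Unsafe.ARRAY_BYTE_BASE_OFFSET`) is added to an `int` index before the result is widened to the `long` address an Unsafe-style accessor takes. `BASE_OFFSET`, the class, and both method names are hypothetical.

```java
public class OffsetOverflowDemo {
    // Hypothetical stand-in for an int-typed array base offset such as
    // Unsafe.ARRAY_BYTE_BASE_OFFSET; the real value is VM-specific.
    static final int BASE_OFFSET = 16;

    // Overflow-prone: both operands are int, so the sum wraps around
    // before it is widened to long.
    static long addressIntMath(int index) {
        return BASE_OFFSET + index;
    }

    // Safe: widen the index to long first, so the sum is computed in 64 bits.
    static long addressLongMath(int index) {
        return BASE_OFFSET + (long) index;
    }

    public static void main(String[] args) {
        int index = Integer.MAX_VALUE - 8; // near the end of a huge byte[]
        System.out.println(addressIntMath(index));  // -2147483641 (wrapped)
        System.out.println(addressLongMath(index)); // 2147483655
    }
}
```

Widening before the addition keeps the address computation in 64-bit arithmetic, which only starts to matter once an index into a near-`Integer.MAX_VALUE`-sized array is involved; that would explain why ordinary tests pass and only a "huge" test trips over it.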
In previous versions, in Integer/Long.toString and StringBuilder.append(int/long) scenarios, -COMPACT_STRING performed better than +COMPACT_STRING. This is because StringUTF16.getChars uses StringUTF16.putChar, which is similar to Unsafe.putChar, and there is no bounds check. ------------- Commit messages: - fix unsafe address overflow - add benchmark - remove comments, from @liach - Merge remote-tracking branch 'upstream/master' into int_get_chars_dedup_202410 - fix Helper - fix Helper - fix Helper - unsafe putByte - remove digitPair - fix import - ... and 4 more: https://git.openjdk.org/jdk/compare/5890d943...cd9ba309 Changes: https://git.openjdk.org/jdk/pull/22023/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22023&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342650 Stats: 757 lines in 12 files changed: 381 ins; 352 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/22023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22023/head:pull/22023 PR: https://git.openjdk.org/jdk/pull/22023 From dholmes at openjdk.org Tue Nov 12 01:51:15 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 12 Nov 2024 01:51:15 GMT Subject: RFR: 8342650: Move getChars to DecimalDigits In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 01:25:16 GMT, Shaojin Wen wrote: > This PR is a resubmission after PR #21593 was rolled back, and the unsafe offset overflow issue has been fixed. > > Move getChars methods of StringLatin1 and StringUTF16 to DecimalDigits to reduce duplication > > HexDigits and OctalDigits also include getCharsLatin1 and getCharsUTF16 > > Putting these two methods into DecimalDigits can avoid the need to expose them in JavaLangAccess > Eliminate duplicate code in BigDecimal > > This PR will improve the performance of Integer/Long.toString and StringBuilder.append(int/long) scenarios. This is because Unsafe.putByte is used to eliminate array bounds checks, and of course this elimination is safe. 
> > In previous versions, in Integer/Long.toString and StringBuilder.append(int/long) scenarios, -COMPACT_STRING performed better than +COMPACT_STRING. This is because StringUTF16.getChars uses StringUTF16.putChar, which is similar to Unsafe.putChar, and there is no bounds check. @wenshao you need a new JBS issue to complete this work under. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22023#issuecomment-2469426151 From fyang at openjdk.org Tue Nov 12 03:42:08 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 12 Nov 2024 03:42:08 GMT Subject: RFR: 8343964: RISC-V: Improve PrintOptoAssembly output for loadNKlassCompactHeaders node Message-ID: Hi, please review this small change. Currently, we print a simple `lwu` for this node, which is not accurate becasue we do a `ld` and logic shift right the loaded 64-bit value for this node. This simply changed it into `load_narrow_klass_compact` like other CPU platforms. After this change, we have: 070 B2: # out( B8 B3 ) <- in( B1 ) Freq: 0.9 070 + load_narrow_klass_compact R28, [R12, #4] # compressed class ptr, #@loadNKlassCompactHeaders (Tagging: @Hamlin-Li) ------------- Commit messages: - 8343964: RISC-V: Improve PrintOptoAssembly output for loadNKlassCompactHeaders node Changes: https://git.openjdk.org/jdk/pull/22025/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22025&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343964 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22025.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22025/head:pull/22025 PR: https://git.openjdk.org/jdk/pull/22025 From fyang at openjdk.org Tue Nov 12 06:44:26 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 12 Nov 2024 06:44:26 GMT Subject: RFR: 8343805: RISC-V: JVM crashes on startup when disabling compressed instructions [v2] In-Reply-To: <0MGF-2vv_88fCBAwbnHUgug-nizGHUDJNCZHKOwGjSU=.d9b3fa18-7f5f-4231-ab5f-26504becf025@github.com> References: 
<0MGF-2vv_88fCBAwbnHUgug-nizGHUDJNCZHKOwGjSU=.d9b3fa18-7f5f-4231-ab5f-26504becf025@github.com> Message-ID: > Hello, please review this trivial change. > > The reason of the crash is that we will use more space for compiler stubs during stubRoutines generation when compressed instructions is disabled especially when the CPU is not equipped with the RISC-V B extension. So this simply increases the reserved size of compiler stubs for this CPU platform. After this change, we have (without B extension): > > > $ java -Xlog:stubs -XX:-UseRVC -version > [0.010s][info][stubs] StubRoutines (initial stubs) [0x0000003f8f3cf340, 0x0000003f8f3d1cd0] used: 604, free: 10036 > [0.117s][info][stubs] StubRoutines (continuation stubs) [0x0000003f8f3d25c0, 0x0000003f8f3d3010] used: 628, free: 2012 > [0.153s][info][stubs] StubRoutines (final stubs) [0x0000003f8f4025c0, 0x0000003f8f409d70] used: 9380, free: 21260 > [0.199s][info][stubs] StubRoutines (compiler stubs) [0x0000003f8f4d7c40, 0x0000003f8f4e3180] used: 38924, free: 7476 Fei Yang has updated the pull request incrementally with one additional commit since the last revision: Add more space for hardware platforms with vector extension ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21966/files - new: https://git.openjdk.org/jdk/pull/21966/files/be8bff6d..b24ce03d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21966&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21966&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21966.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21966/head:pull/21966 PR: https://git.openjdk.org/jdk/pull/21966 From dlunden at openjdk.org Tue Nov 12 07:03:04 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 12 Nov 2024 07:03:04 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 14:41:46 GMT,
Roberto Castañeda Lozano wrote: > This changeset dumps C2's low-level intermediate representation at the following intermediate register allocation points: > - Initial liveness: after initial liveness information is computed. > - Aggressive coalescing: after aggressively coalescing live ranges and destructing SSA. > - Initial spilling: after initial round of spilling derived from physical interference graph construction. > - Conservative coalescing: after each round of conservative (colorability-preserving) coalescing (if `OptoCoalesce` is enabled). > - Iterative spilling: after each round of spilling. > - After iterative spilling: after the main register allocation loop. > - Post-allocation copy removal: after peephole copy removal. > - Merge multiple definitions: after local merging of equivalent nodes related by the same live range. > - Fix up spills: convert load-store spills into memory operand accesses ("CISC spilling") if allowed by the target platform and `UseCISCSpill` is enabled. > > The new dumps have already proved to be useful in the investigation of [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295). > > #### Testing > > - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64). > - Tested automatically that dumping, scheduling, and viewing hundreds of the new graphs does not trigger any failure on HotSpot or IGV (by instrumenting IGV to schedule and view graphs eagerly and running `java -Xbatch -XX:PrintIdealGraphLevel=4`). Looks great, and I can confirm the new phases are very useful!
------------- PR Comment: https://git.openjdk.org/jdk/pull/22017#issuecomment-2468592185 From rcastanedalo at openjdk.org Tue Nov 12 07:03:04 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 12 Nov 2024 07:03:04 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps Message-ID: This changeset dumps C2's low-level intermediate representation at the following intermediate register allocation points: - Initial liveness: after initial liveness information is computed. - Aggressive coalescing: after aggressively coalescing live ranges and destructing SSA. - Initial spilling: after initial round of spilling derived from physical interference graph construction. - Conservative coalescing: after each round of conservative (colorability-preserving) coalescing (if `OptoCoalesce` is enabled). - Iterative spilling: after each round of spilling. - After iterative spilling: after the main register allocation loop. - Post-allocation copy removal: after peephole copy removal. - Merge multiple definitions: after local merging of equivalent nodes related by the same live range. - Fix up spills: convert load-store spills into memory operand accesses ("CISC spilling") if allowed by the target platform and `UseCISCSpill` is enabled. The new dumps have already proved to be useful in the investigation of [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295). #### Testing - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64). - Tested automatically that dumping, scheduling, and viewing hundreds of the new graphs does not trigger any failure on HotSpot or IGV (by instrumenting IGV to schedule and view graphs eagerly and running `java -Xbatch -XX:PrintIdealGraphLevel=4`).
------------- Commit messages: - Fix IR framework definitions - Dump graph at intermediate register allocation points Changes: https://git.openjdk.org/jdk/pull/22017/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22017&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343941 Stats: 46 lines in 3 files changed: 46 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22017.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22017/head:pull/22017 PR: https://git.openjdk.org/jdk/pull/22017 From rcastanedalo at openjdk.org Tue Nov 12 07:03:04 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 12 Nov 2024 07:03:04 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 16:37:04 GMT, Daniel Lundén wrote: > Looks great, and I can confirm the new phases are very useful! Thanks Daniel, feel free to review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22017#issuecomment-2469743531 From dlunden at openjdk.org Tue Nov 12 08:17:27 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 12 Nov 2024 08:17:27 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps In-Reply-To: References: Message-ID: <7erkwjsUNJJR5xrK2_DapO59QrIUTzn_wrqm-8Jo4EQ=.ebbbf82d-722f-459c-bbbc-871a4151e7f8@github.com> On Mon, 11 Nov 2024 14:41:46 GMT, Roberto Castañeda Lozano wrote: > This changeset dumps C2's low-level intermediate representation at the following intermediate register allocation points: > - Initial liveness: after initial liveness information is computed. > - Aggressive coalescing: after aggressively coalescing live ranges and destructing SSA. > - Initial spilling: after initial round of spilling derived from physical interference graph construction. > - Conservative coalescing: after each round of conservative (colorability-preserving) coalescing (if `OptoCoalesce` is enabled).
> - Iterative spilling: after each round of spilling. > - After iterative spilling: after the main register allocation loop. > - Post-allocation copy removal: after peephole copy removal. > - Merge multiple definitions: after local merging of equivalent nodes related by the same live range. > - Fix up spills: convert load-store spills into memory operand accesses ("CISC spilling") if allowed by the target platform and `UseCISCSpill` is enabled. > > The new dumps have already proved to be useful in the investigation of [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295). > > #### Testing > > - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64). > - Tested automatically that dumping, scheduling, and viewing hundreds of the new graphs does not trigger any failure on HotSpot or IGV (by instrumenting IGV to schedule and view graphs eagerly and running `java -Xbatch -XX:PrintIdealGraphLevel=4`). Marked as reviewed by dlunden (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22017#pullrequestreview-2428842886 From chagedorn at openjdk.org Tue Nov 12 08:29:54 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 12 Nov 2024 08:29:54 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 14:41:46 GMT, Roberto Castañeda Lozano wrote: > This changeset dumps C2's low-level intermediate representation at the following intermediate register allocation points: > - Initial liveness: after initial liveness information is computed. > - Aggressive coalescing: after aggressively coalescing live ranges and destructing SSA. > - Initial spilling: after initial round of spilling derived from physical interference graph construction. > - Conservative coalescing: after each round of conservative (colorability-preserving) coalescing (if `OptoCoalesce` is enabled). > - Iterative spilling: after each round of spilling.
> - After iterative spilling: after the main register allocation loop. > - Post-allocation copy removal: after peephole copy removal. > - Merge multiple definitions: after local merging of equivalent nodes related by the same live range. > - Fix up spills: convert load-store spills into memory operand accesses ("CISC spilling") if allowed by the target platform and `UseCISCSpill` is enabled. > > The new dumps have already proved to be useful in the investigation of [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295). > > #### Testing > > - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64). > - Tested automatically that dumping, scheduling, and viewing hundreds of the new graphs does not trigger any failure on HotSpot or IGV (by instrumenting IGV to schedule and view graphs eagerly and running `java -Xbatch -XX:PrintIdealGraphLevel=4`). Looks good! Just an idea, since you've provided a nice description for each phase in the PR description, should we add these in phasetype.hpp at the phases? src/hotspot/share/opto/phasetype.hpp line 104: > 102: flags(POST_ALLOCATION_COPY_REMOVAL, "Post-allocation copy removal") \ > 103: flags(MERGE_MULTIDEFS, "Merge multiple definitions") \ > 104: flags(FIXUP_SPILLS, "Fix up spills") \ Should we split at the word boundary? Suggestion: flags(FIX_UP_SPILLS, "Fix up spills") \ ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22017#pullrequestreview-2428867541 PR Review Comment: https://git.openjdk.org/jdk/pull/22017#discussion_r1837667475 From thartmann at openjdk.org Tue Nov 12 09:28:42 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 12 Nov 2024 09:28:42 GMT Subject: RFR: 8343745: Only update Last Value Assertion Predicates in Loop Unrolling [v4] In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 06:57:37 GMT, Christian Hagedorn wrote: >> (Note: This is a dependent PR on https://github.com/openjdk/jdk/pull/21944 which is fully reviewed but I'd prefer to integrate it on Monday and run again some testing over the weekend) >> >> This is a follow-up feature to https://github.com/openjdk/jdk/pull/21944 which updated Loop Unrolling to use a new predicate visitor and enables this patch. >> >> In Loop Unrolling, we only update the stride and not the init value of a loop. Thus, we actually only require to update the Last Value Assertion Predicates because the Init Value Assertion Predicates do not use `OpaqueLoopStride`. So, we also would not be required to kill the old Init Value Initialized Assertion Predicates. This patch implements that improvement. >> >> To make this work, we need to query the associated `AssertionPredicateType` of an Assertion Predicate which is stored in the `If/RangeCheckNode` (I guess it's okay to have this additional node field in product). This was guarded by `NOT_PRODUCT` before. For this patch, I make this information available in product builds and use it to implement this feature. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Fix after merge Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21969#pullrequestreview-2429031032 From mli at openjdk.org Tue Nov 12 09:30:14 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 12 Nov 2024 09:30:14 GMT Subject: RFR: 8343964: RISC-V: Improve PrintOptoAssembly output for loadNKlassCompactHeaders node In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 02:55:48 GMT, Fei Yang wrote: > Hi, please review this small change. > > Currently, we print a simple `lwu` for this node, which is not accurate becasue we do a `ld` and logic shift right the loaded 64-bit value for this node. This simply changed it into `load_narrow_klass_compact` like other CPU platforms. After this change, we have: > > > 070 B2: # out( B8 B3 ) <- in( B1 ) Freq: 0.9 > 070 + load_narrow_klass_compact R28, [R12, #4] # compressed class ptr, #@loadNKlassCompactHeaders > > > (Tagging: @Hamlin-Li) Looks good to me. Thanks! ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22025#pullrequestreview-2429039168 From mli at openjdk.org Tue Nov 12 09:35:39 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 12 Nov 2024 09:35:39 GMT Subject: RFR: 8343805: RISC-V: JVM crashes on startup when disabling compressed instructions [v2] In-Reply-To: References: <0MGF-2vv_88fCBAwbnHUgug-nizGHUDJNCZHKOwGjSU=.d9b3fa18-7f5f-4231-ab5f-26504becf025@github.com> Message-ID: On Tue, 12 Nov 2024 09:28:33 GMT, Hamlin Li wrote: >> Fei Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> Add more space for hardware platforms with vector extension > > src/hotspot/cpu/riscv/stubRoutines_riscv.hpp line 42: > >> 40: _initial_stubs_code_size = 10000, >> 41: _continuation_stubs_code_size = 2000, >> 42: _compiler_stubs_code_size = 45000, > > Hey, why do we remove the `ZGC_ONLY` here? 
Seems to me it could trigger the similar issue unexpectedly, because for now G1 is still the default one, developers could only test default one before push their code? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21966#discussion_r1837766886 From mli at openjdk.org Tue Nov 12 09:35:38 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 12 Nov 2024 09:35:38 GMT Subject: RFR: 8343805: RISC-V: JVM crashes on startup when disabling compressed instructions [v2] In-Reply-To: References: <0MGF-2vv_88fCBAwbnHUgug-nizGHUDJNCZHKOwGjSU=.d9b3fa18-7f5f-4231-ab5f-26504becf025@github.com> Message-ID: On Tue, 12 Nov 2024 06:44:26 GMT, Fei Yang wrote: >> Hello, please review this trivial change. >> >> The reason of the crash is that we will use more space for compiler stubs during stubRoutines generation when compressed instructions is disabled expecially when the CPU is not equipped with the RISC-V B extension. So this simply increases the reserved size of compiler stubs for this CPU platform. After this change, we have (without B extension): >> >> >> $ java -Xlog:stubs -XX:-UseRVC -version >> [0.010s][info][stubs] StubRoutines (initial stubs) [0x0000003f8f3cf340, 0x0000003f8f3d1cd0] used: 604, free: 10036 >> [0.117s][info][stubs] StubRoutines (continuation stubs) [0x0000003f8f3d25c0, 0x0000003f8f3d3010] used: 628, free: 2012 >> [0.153s][info][stubs] StubRoutines (final stubs) [0x0000003f8f4025c0, 0x0000003f8f409d70] used: 9380, free: 21260 >> [0.199s][info][stubs] StubRoutines (compiler stubs) [0x0000003f8f4d7c40, 0x0000003f8f4e3180] used: 38924, free: 7476 >> >> >> (PS: Same issue also triggers when building without ZGC (`--disable-jvm-feature-zgc`)) > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Add more space for hardware platforms with vector extension Thanks for catching and fix. Just one minor comment. 
src/hotspot/cpu/riscv/stubRoutines_riscv.hpp line 42: > 40: _initial_stubs_code_size = 10000, > 41: _continuation_stubs_code_size = 2000, > 42: _compiler_stubs_code_size = 45000, Hey, why do we remove the `ZGC_ONLY` here? ------------- PR Review: https://git.openjdk.org/jdk/pull/21966#pullrequestreview-2429041244 PR Review Comment: https://git.openjdk.org/jdk/pull/21966#discussion_r1837761273 From fyang at openjdk.org Tue Nov 12 09:47:22 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 12 Nov 2024 09:47:22 GMT Subject: RFR: 8343805: RISC-V: JVM crashes on startup when disabling compressed instructions [v2] In-Reply-To: References: <0MGF-2vv_88fCBAwbnHUgug-nizGHUDJNCZHKOwGjSU=.d9b3fa18-7f5f-4231-ab5f-26504becf025@github.com> Message-ID: On Tue, 12 Nov 2024 09:31:47 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/stubRoutines_riscv.hpp line 42: >> >>> 40: _initial_stubs_code_size = 10000, >>> 41: _continuation_stubs_code_size = 2000, >>> 42: _compiler_stubs_code_size = 45000, >> >> Hey, why do we remove the `ZGC_ONLY` here? > > Seems to me it could trigger the similar issue unexpectedly, because for now G1 is still the default one, developers could only test default one before push their code? Yeah, I removed the `ZGC_ONLY` check as I think it doesn't seem necessary here. I simply did two jdk builds with and without the ZGC feature configured and compared the used compiler stubs from the log output. I witnessed no difference. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21966#discussion_r1837785528 From chagedorn at openjdk.org Tue Nov 12 10:12:04 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 12 Nov 2024 10:12:04 GMT Subject: RFR: 8343944: C2: MinLNode::add_ring() computes _widen wrongly leading to an endless widening/compilation Message-ID: <0RLXKFCTyWLfTDOcGzEd8PdDIzXMhD2VvFOSFNWUOM4=.70b9d267-127e-44bf-9382-9c4856ee5edf@github.com> In `MinLNode::add_ring()`, we wrongly take the minimum of the `_widen`of both input types instead of the maximum which leads to an endless widening in CCP without reaching a fixed point with the test case. We eventually hit the memlimit because we keep creating new types endlessly. The fix is straight forward to use `MAX2()` instead of `MIN2()` as we are already doing for `MinINode::add_ring()`: https://github.com/openjdk/jdk/blob/b53ee053f7f7ffcf02ff47e1895ce7be4bc32486/src/hotspot/share/opto/addnode.cpp#L1437-L1443 Details about how this endless widening is happening are provided as comments in the test case. 
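To illustrate why taking `MIN2()` of `_widen` can keep CCP from converging, here is a deliberately simplified Java model. The `LongRange` record is hypothetical and is not HotSpot's `TypeLong`. It assumes, as in C2, that the widen counter records roughly how far a type has already been widened, so the meet of two inputs should keep the larger count. That is what `MinINode::add_ring()` does with `MAX2()`.

```java
public class MinRingDemo {
    // Hypothetical simplified model of a C2 long type: a [lo, hi] range plus a
    // widen counter. Just enough to show the bug, not HotSpot's TypeLong.
    record LongRange(long lo, long hi, int widen) {}

    // Buggy meet for Min: keeps the *smaller* widen of the two inputs, so the
    // result can "forget" how far the range has already been widened.
    static LongRange minRingBuggy(LongRange a, LongRange b) {
        return new LongRange(Math.min(a.lo(), b.lo()), Math.min(a.hi(), b.hi()),
                             Math.min(a.widen(), b.widen()));
    }

    // Fixed meet, mirroring MinINode::add_ring(): keep the *larger* widen.
    static LongRange minRingFixed(LongRange a, LongRange b) {
        return new LongRange(Math.min(a.lo(), b.lo()), Math.min(a.hi(), b.hi()),
                             Math.max(a.widen(), b.widen()));
    }

    public static void main(String[] args) {
        LongRange widened = new LongRange(0, 1000, 3); // already widened a lot
        LongRange constant = new LongRange(5, 5, 0);   // fresh constant input
        System.out.println(minRingBuggy(widened, constant)); // widen=0: progress lost
        System.out.println(minRingFixed(widened, constant)); // widen=3: progress kept
    }
}
```

With the buggy meet, every iteration that combines a widened input with a fresh one resets the counter, so the saturation point that is supposed to force a fixed point may never be reached.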
Thanks, Christian ------------- Commit messages: - 8343944: C2: MinLNode::add_ring() computes _widen wrongly leading to an endless compilation Changes: https://git.openjdk.org/jdk/pull/22033/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22033&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343944 Stats: 81 lines in 2 files changed: 80 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22033.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22033/head:pull/22033 PR: https://git.openjdk.org/jdk/pull/22033 From mli at openjdk.org Tue Nov 12 10:14:07 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 12 Nov 2024 10:14:07 GMT Subject: RFR: 8343805: RISC-V: JVM crashes on startup when disabling compressed instructions [v2] In-Reply-To: References: <0MGF-2vv_88fCBAwbnHUgug-nizGHUDJNCZHKOwGjSU=.d9b3fa18-7f5f-4231-ab5f-26504becf025@github.com> Message-ID: On Tue, 12 Nov 2024 06:44:26 GMT, Fei Yang wrote: >> Hello, please review this trivial change. >> >> The reason of the crash is that we will use more space for compiler stubs during stubRoutines generation when compressed instructions is disabled expecially when the CPU is not equipped with the RISC-V B extension. So this simply increases the reserved size of compiler stubs for this CPU platform. 
After this change, we have (without B extension): >> >> >> $ java -Xlog:stubs -XX:-UseRVC -version >> [0.010s][info][stubs] StubRoutines (initial stubs) [0x0000003f8f3cf340, 0x0000003f8f3d1cd0] used: 604, free: 10036 >> [0.117s][info][stubs] StubRoutines (continuation stubs) [0x0000003f8f3d25c0, 0x0000003f8f3d3010] used: 628, free: 2012 >> [0.153s][info][stubs] StubRoutines (final stubs) [0x0000003f8f4025c0, 0x0000003f8f409d70] used: 9380, free: 21260 >> [0.199s][info][stubs] StubRoutines (compiler stubs) [0x0000003f8f4d7c40, 0x0000003f8f4e3180] used: 38924, free: 7476 >> >> >> (PS: Same issue also triggers when building without ZGC (`--disable-jvm-feature-zgc`)) > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Add more space for hardware platforms with vector extension Looks good, Thanks! ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21966#pullrequestreview-2429149280 From mli at openjdk.org Tue Nov 12 10:14:07 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 12 Nov 2024 10:14:07 GMT Subject: RFR: 8343805: RISC-V: JVM crashes on startup when disabling compressed instructions [v2] In-Reply-To: References: <0MGF-2vv_88fCBAwbnHUgug-nizGHUDJNCZHKOwGjSU=.d9b3fa18-7f5f-4231-ab5f-26504becf025@github.com> Message-ID: On Tue, 12 Nov 2024 09:43:34 GMT, Fei Yang wrote: >> Seems to me it could trigger the similar issue unexpectedly, because for now G1 is still the default one, developers could only test default one before push their code? > > Yeah, I removed the `ZGC_ONLY` check as I think it doesn't seem necessary here. I simply did two jdk builds with and without the ZGC feature configured and compared the used compiler stubs from the log output. I witnessed no difference. 
OK, I think for now it's safe, I only found below code in the stub generator related to UseZGC, and it's for final stubs:

    // The size of copy32_loop body increases significantly with ZGC GC barriers.
    // Need conditional far branches to reach a point beyond the loop in this case.
    bool is_far = UseZGC;

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21966#discussion_r1837827471 From chagedorn at openjdk.org Tue Nov 12 10:14:11 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 12 Nov 2024 10:14:11 GMT Subject: RFR: 8343745: Only update Last Value Assertion Predicates in Loop Unrolling [v4] In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 06:57:37 GMT, Christian Hagedorn wrote: >> (Note: This is a dependent PR on https://github.com/openjdk/jdk/pull/21944 which is fully reviewed but I'd prefer to integrate it on Monday and run again some testing over the weekend) >> >> This is a follow-up feature to https://github.com/openjdk/jdk/pull/21944 which updated Loop Unrolling to use a new predicate visitor and enables this patch. >> >> In Loop Unrolling, we only update the stride and not the init value of a loop. Thus, we actually only require to update the Last Value Assertion Predicates because the Init Value Assertion Predicates do not use `OpaqueLoopStride`. So, we also would not be required to kill the old Init Value Initialized Assertion Predicates. This patch implements that improvement. >> >> To make this work, we need to query the associated `AssertionPredicateType` of an Assertion Predicate which is stored in the `If/RangeCheckNode` (I guess it's okay to have this additional node field in product). This was guarded by `NOT_PRODUCT` before. For this patch, I make this information available in product builds and use it to implement this feature.
>> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Fix after merge Thanks Tobias for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21969#issuecomment-2470119152 From chagedorn at openjdk.org Tue Nov 12 10:14:11 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 12 Nov 2024 10:14:11 GMT Subject: Integrated: 8343745: Only update Last Value Assertion Predicates in Loop Unrolling In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 07:12:12 GMT, Christian Hagedorn wrote: > (Note: This is a dependent PR on https://github.com/openjdk/jdk/pull/21944 which is fully reviewed but I'd prefer to integrate it on Monday and run again some testing over the weekend) > > This is a follow-up feature to https://github.com/openjdk/jdk/pull/21944 which updated Loop Unrolling to use a new predicate visitor and enables this patch. > > In Loop Unrolling, we only update the stride and not the init value of a loop. Thus, we actually only require to update the Last Value Assertion Predicates because the Init Value Assertion Predicates do not use `OpaqueLoopStride`. So, we also would not be required to kill the old Init Value Initialized Assertion Predicates. This patch implements that improvement. > > To make this work, we need to query the associated `AssertionPredicateType` of an Assertion Predicate which is stored in the `If/RangeCheckNode` (I guess it's okay to have this additional node field in product). This was guarded by `NOT_PRODUCT` before. For this patch, I make this information available in product builds and use it to implement this feature. > > Thanks, > Christian This pull request has now been integrated. 
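To illustrate the idea in the 8343745 description (unrolling changes only the stride, so only the stride-dependent Last Value Assertion Predicates need updating), here is a toy C++ model. `ToyPredicateType` and `ToyAssertionPredicate` are hypothetical; in C2 the kind comes from the `AssertionPredicateType` stored in the `If/RangeCheckNode`, and the real update operates on predicate nodes rather than mutating a field.

```cpp
#include <cassert>
#include <vector>

// Hypothetical mirror of the two Assertion Predicate kinds discussed in the PR.
enum class ToyPredicateType { InitValue, LastValue };

struct ToyAssertionPredicate {
  ToyPredicateType type;
  int stride;  // only meaningful for LastValue predicates, which depend on the stride
};

// Unrolling changes only the stride, so only LastValue predicates are rewritten;
// InitValue predicates (which do not depend on the stride) are left untouched.
// Returns how many predicates were updated.
int update_after_unrolling(std::vector<ToyAssertionPredicate>& predicates, int new_stride) {
  int updated = 0;
  for (ToyAssertionPredicate& p : predicates) {
    if (p.type == ToyPredicateType::LastValue) {
      p.stride = new_stride;
      ++updated;
    }
  }
  return updated;
}
```

The saving shows up as the InitValue entries surviving unchanged, which corresponds to no longer killing and recreating the Init Value predicates.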
Changeset: 3727f404 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/3727f4046188bb623f9efec6fa149f767a9ffa30 Stats: 101 lines in 7 files changed: 16 ins; 13 del; 72 mod 8343745: Only update Last Value Assertion Predicates in Loop Unrolling Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21969 From thartmann at openjdk.org Tue Nov 12 10:26:56 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 12 Nov 2024 10:26:56 GMT Subject: RFR: 8344018: Remove unlimited memory setting from TestScalarReplacementMaxLiveNodes Message-ID: <6fpouEC9uo00iQIEwYKDvcftT6IIrwGQZBssvZYe95w=.2ed5adc7-74e1-4c96-96ae-31b3c710e31c@github.com> [JDK-8342612](https://bugs.openjdk.org/browse/JDK-8342612) increased the memory limit as a workaround for [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295) which was found to be a separate issue ([JDK-8343038](https://bugs.openjdk.org/browse/JDK-8343038)). Let's remove the setting now that [JDK-8343038](https://bugs.openjdk.org/browse/JDK-8343038) got resolved by [JDK-8340824](https://bugs.openjdk.org/browse/JDK-8340824). 
Thanks, Tobias ------------- Commit messages: - 8344018: Remove unlimited memory setting from TestScalarReplacementMaxLiveNodes Changes: https://git.openjdk.org/jdk/pull/22034/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22034&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344018 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22034.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22034/head:pull/22034 PR: https://git.openjdk.org/jdk/pull/22034 From thartmann at openjdk.org Tue Nov 12 10:34:00 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 12 Nov 2024 10:34:00 GMT Subject: RFR: 8343944: C2: MinLNode::add_ring() computes _widen wrongly leading to an endless widening/compilation In-Reply-To: <0RLXKFCTyWLfTDOcGzEd8PdDIzXMhD2VvFOSFNWUOM4=.70b9d267-127e-44bf-9382-9c4856ee5edf@github.com> References: <0RLXKFCTyWLfTDOcGzEd8PdDIzXMhD2VvFOSFNWUOM4=.70b9d267-127e-44bf-9382-9c4856ee5edf@github.com> Message-ID: On Tue, 12 Nov 2024 10:07:24 GMT, Christian Hagedorn wrote: > In `MinLNode::add_ring()`, we wrongly take the minimum of the `_widen`of both input types instead of the maximum which leads to an endless widening in CCP without reaching a fixed point with the test case. We eventually hit the memlimit because we keep creating new types endlessly. > > The fix is straight forward to use `MAX2()` instead of `MIN2()` as we are already doing for `MinINode::add_ring()`: > https://github.com/openjdk/jdk/blob/b53ee053f7f7ffcf02ff47e1895ce7be4bc32486/src/hotspot/share/opto/addnode.cpp#L1437-L1443 > > Details about how this endless widening is happening are provided as comments in the test case. > > Thanks, > Christian Good catch! ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22033#pullrequestreview-2429198930 From roland at openjdk.org Tue Nov 12 10:34:30 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 12 Nov 2024 10:34:30 GMT Subject: RFR: 8344018: Remove unlimited memory setting from TestScalarReplacementMaxLiveNodes In-Reply-To: <6fpouEC9uo00iQIEwYKDvcftT6IIrwGQZBssvZYe95w=.2ed5adc7-74e1-4c96-96ae-31b3c710e31c@github.com> References: <6fpouEC9uo00iQIEwYKDvcftT6IIrwGQZBssvZYe95w=.2ed5adc7-74e1-4c96-96ae-31b3c710e31c@github.com> Message-ID: On Tue, 12 Nov 2024 10:22:16 GMT, Tobias Hartmann wrote: > [JDK-8342612](https://bugs.openjdk.org/browse/JDK-8342612) increased the memory limit as a workaround for [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295) which was found to be a separate issue ([JDK-8343038](https://bugs.openjdk.org/browse/JDK-8343038)). Let's remove the setting now that [JDK-8343038](https://bugs.openjdk.org/browse/JDK-8343038) got resolved by [JDK-8340824](https://bugs.openjdk.org/browse/JDK-8340824). > > Thanks, > Tobias Looks good and trivial to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22034#pullrequestreview-2429203557 From thartmann at openjdk.org Tue Nov 12 10:48:20 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 12 Nov 2024 10:48:20 GMT Subject: RFR: 8344018: Remove unlimited memory setting from TestScalarReplacementMaxLiveNodes In-Reply-To: <6fpouEC9uo00iQIEwYKDvcftT6IIrwGQZBssvZYe95w=.2ed5adc7-74e1-4c96-96ae-31b3c710e31c@github.com> References: <6fpouEC9uo00iQIEwYKDvcftT6IIrwGQZBssvZYe95w=.2ed5adc7-74e1-4c96-96ae-31b3c710e31c@github.com> Message-ID: On Tue, 12 Nov 2024 10:22:16 GMT, Tobias Hartmann wrote: > [JDK-8342612](https://bugs.openjdk.org/browse/JDK-8342612) increased the memory limit as a workaround for [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295) which was found to be a separate issue ([JDK-8343038](https://bugs.openjdk.org/browse/JDK-8343038)). 
Let's remove the setting now that [JDK-8343038](https://bugs.openjdk.org/browse/JDK-8343038) got resolved by [JDK-8340824](https://bugs.openjdk.org/browse/JDK-8340824). > > Thanks, > Tobias Thanks Roland. I'll integrate this when testing finished. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22034#issuecomment-2470194485 From chagedorn at openjdk.org Tue Nov 12 10:52:55 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 12 Nov 2024 10:52:55 GMT Subject: RFR: 8343944: C2: MinLNode::add_ring() computes _widen wrongly leading to an endless widening/compilation In-Reply-To: <0RLXKFCTyWLfTDOcGzEd8PdDIzXMhD2VvFOSFNWUOM4=.70b9d267-127e-44bf-9382-9c4856ee5edf@github.com> References: <0RLXKFCTyWLfTDOcGzEd8PdDIzXMhD2VvFOSFNWUOM4=.70b9d267-127e-44bf-9382-9c4856ee5edf@github.com> Message-ID: On Tue, 12 Nov 2024 10:07:24 GMT, Christian Hagedorn wrote: > In `MinLNode::add_ring()`, we wrongly take the minimum of the `_widen`of both input types instead of the maximum which leads to an endless widening in CCP without reaching a fixed point with the test case. We eventually hit the memlimit because we keep creating new types endlessly. > > The fix is straight forward to use `MAX2()` instead of `MIN2()` as we are already doing for `MinINode::add_ring()`: > https://github.com/openjdk/jdk/blob/b53ee053f7f7ffcf02ff47e1895ce7be4bc32486/src/hotspot/share/opto/addnode.cpp#L1437-L1443 > > Details about how this endless widening is happening are provided as comments in the test case. > > Thanks, > Christian Thanks Tobias! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22033#issuecomment-2470204011 From chagedorn at openjdk.org Tue Nov 12 11:02:30 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 12 Nov 2024 11:02:30 GMT Subject: RFR: 8344018: Remove unlimited memory setting from TestScalarReplacementMaxLiveNodes In-Reply-To: <6fpouEC9uo00iQIEwYKDvcftT6IIrwGQZBssvZYe95w=.2ed5adc7-74e1-4c96-96ae-31b3c710e31c@github.com> References: <6fpouEC9uo00iQIEwYKDvcftT6IIrwGQZBssvZYe95w=.2ed5adc7-74e1-4c96-96ae-31b3c710e31c@github.com> Message-ID: <8GUS7Y3V5IZdeV1qK3y6C25fgIl4gBpUobiGw1KPo34=.948bd499-1b45-46ed-a04d-47184d6928ca@github.com> On Tue, 12 Nov 2024 10:22:16 GMT, Tobias Hartmann wrote: > [JDK-8342612](https://bugs.openjdk.org/browse/JDK-8342612) increased the memory limit as a workaround for [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295) which was found to be a separate issue ([JDK-8343038](https://bugs.openjdk.org/browse/JDK-8343038)). Let's remove the setting now that [JDK-8343038](https://bugs.openjdk.org/browse/JDK-8343038) got resolved by [JDK-8340824](https://bugs.openjdk.org/browse/JDK-8340824). > > Thanks, > Tobias Looks good. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22034#pullrequestreview-2429268446 From rcastanedalo at openjdk.org Tue Nov 12 11:55:09 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 12 Nov 2024 11:55:09 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v2] In-Reply-To: References: Message-ID: <61U1yfefGGXjAMffpUIz5nGvsmWUxoAGoOkY8hN1mHs=.a7ae0994-4c0d-42c1-afb5-f719630ddb61@github.com> > This changeset dumps C2's low-level intermediate representation at the following intermediate register allocation points: > - Initial liveness: after initial liveness information is computed. > - Aggressive coalescing: after aggressively coalescing live ranges and destructing SSA.
> - Initial spilling: after initial round of spilling derived from physical interference graph construction. > - Conservative coalescing: after each round of conservative (colorability-preserving) coalescing (if `OptoCoalesce` is enabled). > - Iterative spilling: after each round of spilling. > - After iterative spilling: after the main register allocation loop. > - Post-allocation copy removal: after peephole copy removal. > - Merge multiple definitions: after local merging of equivalent nodes related by the same live range. > - Fix up spills: convert load-store spills into memory operand accesses ("CISC spilling") if allowed by the target platform and `UseCISCSpill` is enabled. > > The new dumps have already proved to be useful in the investigation of [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295). > > #### Testing > > - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64). > - Tested automatically that dumping, scheduling, and viewing hundreds of the new graphs does not trigger any failure on HotSpot or IGV (by instrumenting IGV to schedule and view graphs eagerly and running `java -Xbatch -XX:PrintIdealGraphLevel=4`). 
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Split FIXUP_SPILLS ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22017/files - new: https://git.openjdk.org/jdk/pull/22017/files/90f9a24e..e44fa796 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22017&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22017&range=00-01 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22017.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22017/head:pull/22017 PR: https://git.openjdk.org/jdk/pull/22017 From rcastanedalo at openjdk.org Tue Nov 12 11:55:09 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 12 Nov 2024 11:55:09 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v2] In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 08:26:26 GMT, Christian Hagedorn wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Split FIXUP_SPILLS > > src/hotspot/share/opto/phasetype.hpp line 104: > >> 102: flags(POST_ALLOCATION_COPY_REMOVAL, "Post-allocation copy removal") \ >> 103: flags(MERGE_MULTIDEFS, "Merge multiple definitions") \ >> 104: flags(FIXUP_SPILLS, "Fix up spills") \ > > Should we split at the word boundary? > Suggestion: > > flags(FIX_UP_SPILLS, "Fix up spills") \ Thanks, done in commit e44fa796.
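The `flags(NAME, "Description")` entries quoted from phasetype.hpp follow the X-macro pattern: one list is expanded several times, here into enumerators and into a parallel table of display names. The standalone sketch below reduces that to the three entries quoted above; the `TOY_*` macro and enum names are hypothetical, and the real list in HotSpot is much longer.

```cpp
#include <cassert>
#include <cstring>

// Reduced, hypothetical phase list in the style of the flags(...) entries:
// an enum suffix paired with the name shown in IGV.
#define TOY_RA_PHASES(flags) \
  flags(POST_ALLOCATION_COPY_REMOVAL, "Post-allocation copy removal") \
  flags(MERGE_MULTIDEFS, "Merge multiple definitions") \
  flags(FIX_UP_SPILLS, "Fix up spills")

// Expand the list once into enumerators...
#define TOY_PHASE_ENUM(name, description) TOY_PHASE_##name,
enum ToyRaPhase { TOY_RA_PHASES(TOY_PHASE_ENUM) TOY_PHASE_NUM };
#undef TOY_PHASE_ENUM

// ...and once into a parallel table of descriptions, so both stay in sync.
#define TOY_PHASE_DESCRIPTION(name, description) description,
static const char* const toy_phase_descriptions[] = {TOY_RA_PHASES(TOY_PHASE_DESCRIPTION)};
#undef TOY_PHASE_DESCRIPTION

inline const char* toy_phase_description(ToyRaPhase phase) {
  return toy_phase_descriptions[phase];
}
```

Because every entry sits on a backslash-continued line, and line splicing happens before comments are stripped, a `//` comment at the end of an entry would swallow the next entry. Only `/* ... */` comments placed before the backslash work, which is consistent with the later remark in this thread that per-phase comments look convoluted.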
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22017#discussion_r1837972013 From rcastanedalo at openjdk.org Tue Nov 12 12:02:42 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 12 Nov 2024 12:02:42 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v2] In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 08:27:45 GMT, Christian Hagedorn wrote: > Just an idea, since you've provided a nice description for each phase in the PR description, should we add these in phasetype.hpp at the phases? I tried this out but could not find a good way to interleave code comments and `flags` entries (only using multi-line comments with additional backslashes, which looks too convoluted in my opinion). ------------- PR Comment: https://git.openjdk.org/jdk/pull/22017#issuecomment-2470345440 From rcastanedalo at openjdk.org Tue Nov 12 12:13:22 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 12 Nov 2024 12:13:22 GMT Subject: RFR: 8343944: C2: MinLNode::add_ring() computes _widen wrongly leading to an endless widening/compilation In-Reply-To: <0RLXKFCTyWLfTDOcGzEd8PdDIzXMhD2VvFOSFNWUOM4=.70b9d267-127e-44bf-9382-9c4856ee5edf@github.com> References: <0RLXKFCTyWLfTDOcGzEd8PdDIzXMhD2VvFOSFNWUOM4=.70b9d267-127e-44bf-9382-9c4856ee5edf@github.com> Message-ID: On Tue, 12 Nov 2024 10:07:24 GMT, Christian Hagedorn wrote: > In `MinLNode::add_ring()`, we wrongly take the minimum of the `_widen`of both input types instead of the maximum which leads to an endless widening in CCP without reaching a fixed point with the test case. We eventually hit the memlimit because we keep creating new types endlessly.
> > The fix is straight forward to use `MAX2()` instead of `MIN2()` as we are already doing for `MinINode::add_ring()`: > https://github.com/openjdk/jdk/blob/b53ee053f7f7ffcf02ff47e1895ce7be4bc32486/src/hotspot/share/opto/addnode.cpp#L1437-L1443 > > Details about how this endless widening is happening are provided as comments in the test case. > > Thanks, > Christian Looks good! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22033#pullrequestreview-2429423589 From chagedorn at openjdk.org Tue Nov 12 12:18:39 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 12 Nov 2024 12:18:39 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v2] In-Reply-To: <61U1yfefGGXjAMffpUIz5nGvsmWUxoAGoOkY8hN1mHs=.a7ae0994-4c0d-42c1-afb5-f719630ddb61@github.com> References: <61U1yfefGGXjAMffpUIz5nGvsmWUxoAGoOkY8hN1mHs=.a7ae0994-4c0d-42c1-afb5-f719630ddb61@github.com> Message-ID: On Tue, 12 Nov 2024 11:55:09 GMT, Roberto Castañeda Lozano wrote: >> This changeset dumps C2's low-level intermediate representation at the following intermediate register allocation points: >> - Initial liveness: after initial liveness information is computed. >> - Aggressive coalescing: after aggressively coalescing live ranges and destructing SSA. >> - Initial spilling: after initial round of spilling derived from physical interference graph construction. >> - Conservative coalescing: after each round of conservative (colorability-preserving) coalescing (if `OptoCoalesce` is enabled). >> - Iterative spilling: after each round of spilling. >> - After iterative spilling: after the main register allocation loop. >> - Post-allocation copy removal: after peephole copy removal. >> - Merge multiple definitions: after local merging of equivalent nodes related by the same live range.
>> - Fix up spills: convert load-store spills into memory operand accesses ("CISC spilling") if allowed by the target platform and `UseCISCSpill` is enabled. >> >> The new dumps have already proved to be useful in the investigation of [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295). >> >> #### Testing >> >> - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64). >> - Tested automatically that dumping, scheduling, and viewing hundreds of the new graphs does not trigger any failure on HotSpot or IGV (by instrumenting IGV to schedule and view graphs eagerly and running `java -Xbatch -XX:PrintIdealGraphLevel=4`). > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Split FIXUP_SPILLS Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22017#pullrequestreview-2429432836 From chagedorn at openjdk.org Tue Nov 12 12:18:40 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 12 Nov 2024 12:18:40 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v2] In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 11:59:28 GMT, Roberto Castañeda Lozano wrote: > > Just an idea, since you've provided a nice description for each phase in the PR description, should we add these in phasetype.hpp at the phases? > > I tried this out but could not find a good way to interleave code comments and `flags` entries (only using multi-line comments with additional backslashes, which looks too convoluted in my opinion). I see, that does not seem to be straight forward. I guess then it's okay to omit these descriptions.
------------- PR Comment: https://git.openjdk.org/jdk/pull/22017#issuecomment-2470376497 From chagedorn at openjdk.org Tue Nov 12 12:26:42 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 12 Nov 2024 12:26:42 GMT Subject: RFR: 8343944: C2: MinLNode::add_ring() computes _widen wrongly leading to an endless widening/compilation In-Reply-To: <0RLXKFCTyWLfTDOcGzEd8PdDIzXMhD2VvFOSFNWUOM4=.70b9d267-127e-44bf-9382-9c4856ee5edf@github.com> References: <0RLXKFCTyWLfTDOcGzEd8PdDIzXMhD2VvFOSFNWUOM4=.70b9d267-127e-44bf-9382-9c4856ee5edf@github.com> Message-ID: On Tue, 12 Nov 2024 10:07:24 GMT, Christian Hagedorn wrote: > In `MinLNode::add_ring()`, we wrongly take the minimum of the `_widen`of both input types instead of the maximum which leads to an endless widening in CCP without reaching a fixed point with the test case. We eventually hit the memlimit because we keep creating new types endlessly. > > The fix is straight forward to use `MAX2()` instead of `MIN2()` as we are already doing for `MinINode::add_ring()`: > https://github.com/openjdk/jdk/blob/b53ee053f7f7ffcf02ff47e1895ce7be4bc32486/src/hotspot/share/opto/addnode.cpp#L1437-L1443 > > Details about how this endless widening is happening are provided as comments in the test case. > > Thanks, > Christian Thanks Roberto for your review! 
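A toy C++ model of the 8343944 fix: `ToyTypeLong` and `min_add_ring()` are hypothetical stand-ins for C2's `TypeLong` and `MinLNode::add_ring()`, not the real classes. Both bounds take the minimum of the inputs, while the widen counter takes the maximum, which is what `MinINode::add_ring()` already does. Roughly speaking, taking the minimum lets the smaller counter of one input discard the widening progress recorded in the other, so CCP can keep producing new types without reaching a fixed point.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for C2's TypeLong: a value range plus the
// widen counter that limits how often the type may be widened.
struct ToyTypeLong {
  int64_t lo;
  int64_t hi;
  int widen;
};

// Shape of the fixed MinLNode::add_ring(): both bounds are the minimum of the
// inputs, while widen keeps the larger counter so widening keeps progressing.
ToyTypeLong min_add_ring(const ToyTypeLong& r0, const ToyTypeLong& r1) {
  return ToyTypeLong{std::min(r0.lo, r1.lo),
                     std::min(r0.hi, r1.hi),
                     std::max(r0.widen, r1.widen)};  // the bug used the minimum here
}
```

For example, combining `[0, 10]` (widen 3) with `[5, 20]` (widen 1) yields `[0, 10]` with widen 3, independent of argument order.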
------------- PR Comment: https://git.openjdk.org/jdk/pull/22033#issuecomment-2470392404 From rcastanedalo at openjdk.org Tue Nov 12 12:30:20 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 12 Nov 2024 12:30:20 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v2] In-Reply-To: <61U1yfefGGXjAMffpUIz5nGvsmWUxoAGoOkY8hN1mHs=.a7ae0994-4c0d-42c1-afb5-f719630ddb61@github.com> References: <61U1yfefGGXjAMffpUIz5nGvsmWUxoAGoOkY8hN1mHs=.a7ae0994-4c0d-42c1-afb5-f719630ddb61@github.com> Message-ID: On Tue, 12 Nov 2024 11:55:09 GMT, Roberto Castañeda Lozano wrote: >> This changeset dumps C2's low-level intermediate representation at the following intermediate register allocation points: >> - Initial liveness: after initial liveness information is computed. >> - Aggressive coalescing: after aggressively coalescing live ranges and destructing SSA. >> - Initial spilling: after initial round of spilling derived from physical interference graph construction. >> - Conservative coalescing: after each round of conservative (colorability-preserving) coalescing (if `OptoCoalesce` is enabled). >> - Iterative spilling: after each round of spilling. >> - After iterative spilling: after the main register allocation loop. >> - Post-allocation copy removal: after peephole copy removal. >> - Merge multiple definitions: after local merging of equivalent nodes related by the same live range. >> - Fix up spills: convert load-store spills into memory operand accesses ("CISC spilling") if allowed by the target platform and `UseCISCSpill` is enabled. >> >> The new dumps have already proved to be useful in the investigation of [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295). >> >> #### Testing >> >> - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64).
>> - Tested automatically that dumping, scheduling, and viewing hundreds of the new graphs does not trigger any failure on HotSpot or IGV (by instrumenting IGV to schedule and view graphs eagerly and running `java -Xbatch -XX:PrintIdealGraphLevel=4`). > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Split FIXUP_SPILLS Thanks Daniel and Christian for reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22017#issuecomment-2470401879 From galder at openjdk.org Tue Nov 12 12:36:39 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 12 Nov 2024 12:36:39 GMT Subject: RFR: 8326369: Bimorphic inlining not applied at a call site that was initially monomorphic [v2] In-Reply-To: References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> <14vSVV6FCG5GxZjE2heaPLpQZGTCB9xWB075R_bz_SA=.e3f8bcdd-5060-4a3c-9918-cd515e23feea@github.com> Message-ID: On Tue, 12 Nov 2024 12:31:52 GMT, Galder Zamarreño wrote: >> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: >> >> Added copyright and @bug identifiers > > macos-aarch64 CI failed with, is this transitory or something needs fixing? > > > xcode-select: error: invalid developer directory '/Applications/Xcode_14.3.1.app/Contents/Developer' > @galderz, I'd appreciate it if you can add `Copyright (c) 2024 JetBrains s.r.o.. All rights reserved.` to the header. Thanks! Just pushed a commit to add that.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21920#issuecomment-2470412553 From galder at openjdk.org Tue Nov 12 12:36:38 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 12 Nov 2024 12:36:38 GMT Subject: RFR: 8326369: Bimorphic inlining not applied at a call site that was initially monomorphic [v3] In-Reply-To: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> Message-ID: > This issue was fixed by JDK-8339299 indirectly (see [PR](https://github.com/openjdk/jdk/pull/20786)), but the PR for 8339299 didn't include any new tests, so I'm sending this PR with the test that Filipp added in 8326369 so that such issue doesn't come back inadvertently. I've added him as co-author and applied some small formatting changes. Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: Added Jetbrains copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21920/files - new: https://git.openjdk.org/jdk/pull/21920/files/1bf6992c..9d9909f8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21920&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21920&range=01-02 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21920.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21920/head:pull/21920 PR: https://git.openjdk.org/jdk/pull/21920 From galder at openjdk.org Tue Nov 12 12:36:39 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 12 Nov 2024 12:36:39 GMT Subject: RFR: 8326369: Bimorphic inlining not applied at a call site that was initially monomorphic [v2] In-Reply-To: <14vSVV6FCG5GxZjE2heaPLpQZGTCB9xWB075R_bz_SA=.e3f8bcdd-5060-4a3c-9918-cd515e23feea@github.com> References:
<2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> <14vSVV6FCG5GxZjE2heaPLpQZGTCB9xWB075R_bz_SA=.e3f8bcdd-5060-4a3c-9918-cd515e23feea@github.com> Message-ID: On Thu, 7 Nov 2024 10:50:19 GMT, Galder Zamarreño wrote: >> This issue was fixed by JDK-8339299 indirectly (see [PR](https://github.com/openjdk/jdk/pull/20786)), but the PR for 8339299 didn't include any new tests, so I'm sending this PR with the test that Filipp added in 8326369 so that such issue doesn't come back inadvertently. I've added him as co-author and applied some small formatting changes. > > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Added copyright and @bug identifiers macos-aarch64 CI failed with, is this transitory or something needs fixing? xcode-select: error: invalid developer directory '/Applications/Xcode_14.3.1.app/Contents/Developer' ------------- PR Comment: https://git.openjdk.org/jdk/pull/21920#issuecomment-2470411444 From thartmann at openjdk.org Tue Nov 12 12:45:16 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 12 Nov 2024 12:45:16 GMT Subject: RFR: 8344018: Remove unlimited memory setting from TestScalarReplacementMaxLiveNodes In-Reply-To: <6fpouEC9uo00iQIEwYKDvcftT6IIrwGQZBssvZYe95w=.2ed5adc7-74e1-4c96-96ae-31b3c710e31c@github.com> References: <6fpouEC9uo00iQIEwYKDvcftT6IIrwGQZBssvZYe95w=.2ed5adc7-74e1-4c96-96ae-31b3c710e31c@github.com> Message-ID: On Tue, 12 Nov 2024 10:22:16 GMT, Tobias Hartmann wrote: > [JDK-8342612](https://bugs.openjdk.org/browse/JDK-8342612) increased the memory limit as a workaround for [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295) which was found to be a separate issue ([JDK-8343038](https://bugs.openjdk.org/browse/JDK-8343038)). Let's remove the setting now that [JDK-8343038](https://bugs.openjdk.org/browse/JDK-8343038) got resolved by [JDK-8340824](https://bugs.openjdk.org/browse/JDK-8340824).
> > Thanks, > Tobias Thanks for the review Christian. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22034#issuecomment-2470431373 From thartmann at openjdk.org Tue Nov 12 12:45:16 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 12 Nov 2024 12:45:16 GMT Subject: Integrated: 8344018: Remove unlimited memory setting from TestScalarReplacementMaxLiveNodes In-Reply-To: <6fpouEC9uo00iQIEwYKDvcftT6IIrwGQZBssvZYe95w=.2ed5adc7-74e1-4c96-96ae-31b3c710e31c@github.com> References: <6fpouEC9uo00iQIEwYKDvcftT6IIrwGQZBssvZYe95w=.2ed5adc7-74e1-4c96-96ae-31b3c710e31c@github.com> Message-ID: On Tue, 12 Nov 2024 10:22:16 GMT, Tobias Hartmann wrote: > [JDK-8342612](https://bugs.openjdk.org/browse/JDK-8342612) increased the memory limit as a workaround for [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295) which was found to be a separate issue ([JDK-8343038](https://bugs.openjdk.org/browse/JDK-8343038)). Let's remove the setting now that [JDK-8343038](https://bugs.openjdk.org/browse/JDK-8343038) got resolved by [JDK-8340824](https://bugs.openjdk.org/browse/JDK-8340824). > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: 67d1ef14 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/67d1ef14798be5dbd083ba23b9e3ae8e80f72728 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod 8344018: Remove unlimited memory setting from TestScalarReplacementMaxLiveNodes Reviewed-by: roland, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/22034 From rcastanedalo at openjdk.org Tue Nov 12 13:37:16 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 12 Nov 2024 13:37:16 GMT Subject: RFR: 8337660: C2: basic blocks with only BoxLock nodes are wrongly treated as empty Message-ID: This changeset takes into account the presence of `BoxLock` nodes in a basic block when determining whether the block is empty and [can be removed](https://github.com/openjdk/jdk/blob/5729227651969f542f040e5d0bfbf9b0b99b5698/src/hotspot/share/opto/compile.cpp#L2997). Special treatment of `BoxLock` nodes is required because these are not Mach nodes, yet they [are preserved in C2's back-end](https://github.com/openjdk/jdk/blob/f0b251d76078e8d5b47e967b0449c4cbdcb5a005/src/hotspot/share/opto/matcher.cpp#L2278) and result in [actual machine code being generated](https://github.com/openjdk/jdk/blob/f0b251d76078e8d5b47e967b0449c4cbdcb5a005/src/hotspot/cpu/x86/x86_64.ad#L1544). The proposed change avoids wrongly removing basic blocks consisting only of `BoxLock` and other non-Mach nodes, and crashing when the register that should have been defined by the wrongly removed `BoxLock` node is used (see complete failure analysis in the [JBS description](https://bugs.openjdk.org/browse/JDK-8337660)).
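The rule in 8337660 ("a block is only removable if nothing in it produces code, and `BoxLock` nodes produce code even though they are not Mach nodes") can be sketched as a toy check. This is a greatly simplified, hypothetical model (`ToyNodeKind` and the helper functions are invented for illustration); C2's actual emptiness test considers more than node kinds and is not reproduced here.

```cpp
#include <cassert>
#include <vector>

// Simplified, hypothetical node categories for a scheduled basic block.
enum class ToyNodeKind { Mach, BoxLock, Phi, Other };

// Mach nodes emit machine code; BoxLock nodes are not Mach nodes but are kept in
// the back-end and still produce code, so they must count as well.
bool emits_code(ToyNodeKind kind) {
  return kind == ToyNodeKind::Mach || kind == ToyNodeKind::BoxLock;
}

// A block may be treated as empty (and removed) only if nothing in it emits code.
bool is_removable_empty_block(const std::vector<ToyNodeKind>& nodes) {
  for (ToyNodeKind kind : nodes) {
    if (emits_code(kind)) {
      return false;
    }
  }
  return true;
}
```

Under the old behavior described in the PR, a block holding only a `BoxLock` and other non-Mach nodes would have passed a Mach-only check and been removed.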
#### Testing - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode) ------------- Commit messages: - Take into account BoxLock nodes when determining if a block is empty Changes: https://git.openjdk.org/jdk/pull/22038/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22038&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337660 Stats: 93 lines in 2 files changed: 88 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/22038.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22038/head:pull/22038 PR: https://git.openjdk.org/jdk/pull/22038 From dfenacci at openjdk.org Tue Nov 12 13:45:25 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 12 Nov 2024 13:45:25 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v2] In-Reply-To: <61U1yfefGGXjAMffpUIz5nGvsmWUxoAGoOkY8hN1mHs=.a7ae0994-4c0d-42c1-afb5-f719630ddb61@github.com> References: <61U1yfefGGXjAMffpUIz5nGvsmWUxoAGoOkY8hN1mHs=.a7ae0994-4c0d-42c1-afb5-f719630ddb61@github.com> Message-ID: <6hPXv28ApdHBkkBnRwvT-qs1d6a0Jadm7iip5anKPU0=.e669a111-42b9-43e8-b470-fcff12bc2ce8@github.com> On Tue, 12 Nov 2024 11:55:09 GMT, Roberto Castañeda Lozano wrote: >> This changeset dumps C2's low-level intermediate representation at the following intermediate register allocation points: >> - Initial liveness: after initial liveness information is computed. >> - Aggressive coalescing: after aggressively coalescing live ranges and destructing SSA. >> - Initial spilling: after initial round of spilling derived from physical interference graph construction. >> - Conservative coalescing: after each round of conservative (colorability-preserving) coalescing (if `OptoCoalesce` is enabled). >> - Iterative spilling: after each round of spilling. >> - After iterative spilling: after the main register allocation loop. >> - Post-allocation copy removal: after peephole copy removal.
>> - Merge multiple definitions: after local merging of equivalent nodes related by the same live range. >> - Fix up spills: convert load-store spills into memory operand accesses ("CISC spilling") if allowed by the target platform and `UseCISCSpill` is enabled. >> >> The new dumps have already proved to be useful in the investigation of [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295). >> >> #### Testing >> >> - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64). >> - Tested automatically that dumping, scheduling, and viewing hundreds of the new graphs does not trigger any failure on HotSpot or IGV (by instrumenting IGV to schedule and view graphs eagerly and running `java -Xbatch -XX:PrintIdealGraphLevel=4`). > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Split FIXUP_SPILLS Very very cool! @robcasloz do you think it could make sense to add a few IR tests just to make sure that the new steps are actually dumped? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22017#issuecomment-2470555444 From dfenacci at openjdk.org Tue Nov 12 13:45:25 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 12 Nov 2024 13:45:25 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v2] In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 11:51:31 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/share/opto/phasetype.hpp line 104: >> >>> 102: flags(POST_ALLOCATION_COPY_REMOVAL, "Post-allocation copy removal") \ >>> 103: flags(MERGE_MULTIDEFS, "Merge multiple definitions") \ >>> 104: flags(FIXUP_SPILLS, "Fix up spills") \ >> >> Should we split at the word boundary? >> Suggestion: >> >> flags(FIX_UP_SPILLS, "Fix up spills") \ > > Thanks, done in commit e44fa796.
To be consistent I guess the same could be done for `MERGE_MULTIDEFS` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22017#discussion_r1838122312 From stuefe at openjdk.org Tue Nov 12 13:50:15 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 12 Nov 2024 13:50:15 GMT Subject: RFR: 8344014: Simplify TracePhase constructor Message-ID: As a prerequisite for [JDK-8344009](https://bugs.openjdk.org/browse/JDK-8344009), the `TracePhase` constructor needs to know the PhaseTraceId. And while we are at it, it can be simplified: the trace strings can be kept alongside the IDs in an X-macro, and it is sufficient to pass in the IDs; there is no need to pass the pointer to the counters, since we use the same counters anyway. Since this is a somewhat invasive but purely mechanical change, I separate this work from [JDK-8344009](https://bugs.openjdk.org/browse/JDK-8344009). There are no functional changes. Trace texts have been faithfully taken over, even in the case where the original TracePhase constructor invocation got fed an empty string (`_t_vector` and `_t_renumberLive`) - whether this was intentional or not, this patch does not change it. The patch preserves the possibility to override the phase name with an explicit argument to the constructor. This is used in one existing case ("computeLive (sbplr)"), again, to faithfully preserve the log format. Test: I checked manually with +CITimeVerbose with and without the patch and compared the output; the output format is preserved.
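The X-macro approach in this description (each trace string kept next to its phase ID, so that a constructor needs only the ID) can be sketched roughly as follows. This is a hypothetical, self-contained illustration, not the HotSpot definitions: `TOY_PHASES`, `ToyPhaseTraceId`, `ToyTracePhase`, the phase names, and the plain `long` counters (standing in for HotSpot's timers) are all made up for the example. The `flags(...)` entries quoted from `phasetype.hpp` in the IGV review appear to follow the same pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical phase list: each entry pairs an ID with its trace string.
// The empty string mirrors the "_t_renumberLive got fed an empty string" case.
#define TOY_PHASES(f)                 \
  f(t_parser,       "parse")          \
  f(t_optimizer,    "optimizer")      \
  f(t_renumberLive, "")               \
  f(t_computeLive,  "computeLive")

// Expansion 1: an enum of phase IDs, plus a trailing count.
#define TOY_ENUM(id, text) id,
enum ToyPhaseTraceId { TOY_PHASES(TOY_ENUM) toy_phase_count };
#undef TOY_ENUM

// Expansion 2: a parallel table of trace strings indexed by the same IDs.
#define TOY_TEXT(id, text) text,
static const char* const toy_phase_text[] = { TOY_PHASES(TOY_TEXT) };
#undef TOY_TEXT

static_assert(sizeof(toy_phase_text) / sizeof(toy_phase_text[0]) ==
                  static_cast<std::size_t>(toy_phase_count),
              "enum and string table are generated from one list");

// One shared counter per phase, so callers need not pass a counter pointer.
static long toy_phase_counters[toy_phase_count] = {};

// Constructor takes only the ID; an optional name overrides the default
// text (as in the "computeLive (sbplr)" case mentioned above).
struct ToyTracePhase {
  ToyPhaseTraceId id;
  const char* name;
  explicit ToyTracePhase(ToyPhaseTraceId i, const char* override_name = nullptr)
      : id(i),
        name(override_name != nullptr ? override_name : toy_phase_text[i]) {
    toy_phase_counters[i]++;
  }
};
```

Because the enum and the string table are expanded from a single list, adding a phase cannot leave the two out of sync; the `static_assert` makes that invariant explicit.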
------------- Commit messages: - fixes - Rework TracePhase construction Changes: https://git.openjdk.org/jdk/pull/22029/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22029&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344014 Stats: 199 lines in 18 files changed: 75 ins; 53 del; 71 mod Patch: https://git.openjdk.org/jdk/pull/22029.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22029/head:pull/22029 PR: https://git.openjdk.org/jdk/pull/22029 From stuefe at openjdk.org Tue Nov 12 13:50:15 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 12 Nov 2024 13:50:15 GMT Subject: RFR: 8344014: Simplify TracePhase constructor In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 07:56:24 GMT, Thomas Stuefe wrote: > As a prerequisite for [JDK-8344009](https://bugs.openjdk.org/browse/JDK-8344009), the `TracePhase` constructor needs to know the PhaseTraceId. And while we are at it, it can be simplified: the trace strings can be kept alongside the IDs in an X-macro, and it is sufficient to pass in the IDs; there is no need to pass the pointer to the counters, since we use the same counters anyway. > > Since this is a somewhat invasive but purely mechanical change, I separate this work from [JDK-8344009](https://bugs.openjdk.org/browse/JDK-8344009). > > There are no functional changes. Trace texts have been faithfully taken over, even in the case where the original TracePhase constructor invocation got fed an empty string (`_t_vector` and `_t_renumberLive`) - whether this was intentional or not, this patch does not change it. > > The patch preserves the possibility to override the phase name with an explicit argument to the constructor. This is used in one existing case ("computeLive (sbplr)"), again, to faithfully preserve the log format. > > Test: I checked manually with +CITimeVerbose with and without the patch and compared the output; the output format is preserved. Mac OS error unrelated.
------------- PR Comment: https://git.openjdk.org/jdk/pull/22029#issuecomment-2470568530 From qamai at openjdk.org Tue Nov 12 14:01:32 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 12 Nov 2024 14:01:32 GMT Subject: RFR: 8337660: C2: basic blocks with only BoxLock nodes are wrongly treated as empty In-Reply-To: References: Message-ID: <6Fw6s8C3ovd8wuJEqp0CmvcjyUg_Ar-avXL_uVTyog4=.3aadfc7e-b92c-44f9-9ecf-cc3572ecf185@github.com> On Tue, 12 Nov 2024 12:37:21 GMT, Roberto Castañeda Lozano wrote: > This changeset takes into account the presence of `BoxLock` nodes in a basic block when determining whether the block is empty and [can be removed](https://github.com/openjdk/jdk/blob/5729227651969f542f040e5d0bfbf9b0b99b5698/src/hotspot/share/opto/compile.cpp#L2997). Special treatment of `BoxLock` nodes is required because these are not Mach nodes, yet they [are preserved in C2's back-end](https://github.com/openjdk/jdk/blob/f0b251d76078e8d5b47e967b0449c4cbdcb5a005/src/hotspot/share/opto/matcher.cpp#L2278) and result in [actual machine code being generated](https://github.com/openjdk/jdk/blob/f0b251d76078e8d5b47e967b0449c4cbdcb5a005/src/hotspot/cpu/x86/x86_64.ad#L1544). The proposed change avoids wrongly removing basic blocks consisting only of `BoxLock` and other non-Mach nodes, and crashing when the register that should have been defined by the wrongly removed `BoxLock` node is used (see complete failure analysis in the [JBS description](https://bugs.openjdk.org/browse/JDK-8337660)). > > #### Testing > > - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode) May I ask what's wrong with making `BoxLock` a subclass of `MachNode`?
------------- PR Comment: https://git.openjdk.org/jdk/pull/22038#issuecomment-2470602453 From fyang at openjdk.org Tue Nov 12 15:31:11 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 12 Nov 2024 15:31:11 GMT Subject: RFR: 8343805: RISC-V: JVM crashes on startup when disabling compressed instructions [v2] In-Reply-To: References: <0MGF-2vv_88fCBAwbnHUgug-nizGHUDJNCZHKOwGjSU=.d9b3fa18-7f5f-4231-ab5f-26504becf025@github.com> Message-ID: <1yr1z8KFY3b6KRAjF3cwNUUaT368zoxCWi-oU63_pYY=.18d15c0c-fe71-466a-991c-281c8ac1418e@github.com> On Tue, 12 Nov 2024 10:11:09 GMT, Hamlin Li wrote: >> Fei Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> Add more space for hardware platforms with vector extension > > Looks good, Thanks! @Hamlin-Li : Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21966#issuecomment-2470831692 From fyang at openjdk.org Tue Nov 12 15:31:11 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 12 Nov 2024 15:31:11 GMT Subject: Integrated: 8343805: RISC-V: JVM crashes on startup when disabling compressed instructions In-Reply-To: <0MGF-2vv_88fCBAwbnHUgug-nizGHUDJNCZHKOwGjSU=.d9b3fa18-7f5f-4231-ab5f-26504becf025@github.com> References: <0MGF-2vv_88fCBAwbnHUgug-nizGHUDJNCZHKOwGjSU=.d9b3fa18-7f5f-4231-ab5f-26504becf025@github.com> Message-ID: On Fri, 8 Nov 2024 01:54:55 GMT, Fei Yang wrote: > Hello, please review this trivial change. > > The reason for the crash is that we use more space for compiler stubs during stubRoutines generation when compressed instructions are disabled, especially when the CPU is not equipped with the RISC-V B extension. So this simply increases the reserved size of compiler stubs for this CPU platform.
After this change, we have (without the B extension): > > > $ java -Xlog:stubs -XX:-UseRVC -version > [0.010s][info][stubs] StubRoutines (initial stubs) [0x0000003f8f3cf340, 0x0000003f8f3d1cd0] used: 604, free: 10036 > [0.117s][info][stubs] StubRoutines (continuation stubs) [0x0000003f8f3d25c0, 0x0000003f8f3d3010] used: 628, free: 2012 > [0.153s][info][stubs] StubRoutines (final stubs) [0x0000003f8f4025c0, 0x0000003f8f409d70] used: 9380, free: 21260 > [0.199s][info][stubs] StubRoutines (compiler stubs) [0x0000003f8f4d7c40, 0x0000003f8f4e3180] used: 38924, free: 7476 > > > (PS: The same issue is also triggered when building without ZGC (`--disable-jvm-feature-zgc`)) This pull request has now been integrated. Changeset: 2989d873 Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/2989d8734c70e1db87d2a708719fd2d966903a93 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8343805: RISC-V: JVM crashes on startup when disabling compressed instructions Reviewed-by: mli ------------- PR: https://git.openjdk.org/jdk/pull/21966 From chagedorn at openjdk.org Tue Nov 12 15:33:53 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 12 Nov 2024 15:33:53 GMT Subject: RFR: 8342047: Create Template Assertion Predicates with Halt nodes only instead of uncommon traps Message-ID: <3YvrlrUxn0ytaz6AzcgpYP5LqmGuOucwR_1tEpyrX8A=.2af5bcd3-1870-48d9-a336-d06783035d93@github.com> This patch changes the creation of Template Assertion Predicates to use Halt nodes instead of uncommon traps. ### Goal of Assertion Predicates #### Initialized Assertion Predicates These predicates ensure that control is properly folded when data is dying. They are **always true by design** and thus can never fail at runtime. We therefore put a halt node on the failing path. #### Template Assertion Predicates They only serve as templates from which Initialized Assertion Predicates are created. They are never executed and are always removed after loop opts are over.
Conceptually, it does not matter whether the failing path uses a UCT or a halt node (or something else completely - I plan to have a separate "no-op" `TemplateAssertionPredicateNode` at some point which only falls through to the next node and does not have a failing path at all). ### Why Did we Use UCTs for Template Assertion Predicates? When the concept of Assertion Predicates was first introduced, it only covered a few edge cases. It was quite straightforward to reuse the existing Loop Predication code, which creates new predicates from a Parse Predicate by copying it and merging the UCTs on the failing paths with a region node. This was done with `PhaseIdealLoop::create_new_if_for_predicate()`. ### Why Do we Need to Use Halt Nodes for Template Assertion Predicates? #### Missing UCTs for Predicates above Loops Over time, we found more cases where we need to create Initialized Assertion Predicates from templates - including locations where we do not have Parse Predicates (and thus no safepoints). For example, when peeling one iteration off a loop with Parse Predicates, they will be kept at the peeled iteration and the remaining loop does not have any Parse Predicates anymore. #### Missing UCTs to Create Template Assertion Predicates Whenever we split a loop with Template Assertion Predicates, we also need to ensure that they are copied to all split loop versions. Since they rely on using UCTs, we also need to make sure that a UCT/safepoint is available to be used. However, this is not always the case (for example, after peeling an iteration off as described in the last section). As a result, we cannot easily establish new Template Assertion Predicates anywhere. One could think about faking a UCT or adding other special logic. But this seems rather fragile and could introduce considerable complexity - especially since we conceptually don't even need to use UCTs at all for Template Assertion Predicates.
There is already some special logic for a main loop, where we create Template Assertion Predicates with a Halt node because there is no UCT available for the main loop. But this logic and its implementation are not easily reusable, and we would need to keep supporting both formats, with UCTs and with halt nodes. ### Solution: Assertion Predicates with Halt Nodes only As a simple solution to the problems described above, I propose to get rid of UCTs completely. This not only enables us to fix the remaining unresolved bugs where Assertion Predicates are missing but also simplifies the logic and the IR itself. I've added some comments in the PR to better explain the refactoring steps. Thanks, Christian ------------- Commit messages: - 8342047: Create Template Assertion Predicates with Halt nodes only instead of uncommon traps Changes: https://git.openjdk.org/jdk/pull/22040/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22040&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342047 Stats: 205 lines in 6 files changed: 30 ins; 77 del; 98 mod Patch: https://git.openjdk.org/jdk/pull/22040.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22040/head:pull/22040 PR: https://git.openjdk.org/jdk/pull/22040 From duke at openjdk.org Tue Nov 12 15:36:49 2024 From: duke at openjdk.org (Abdelhak Zaaim) Date: Tue, 12 Nov 2024 15:36:49 GMT Subject: RFR: 8343944: C2: MinLNode::add_ring() computes _widen wrongly leading to an endless widening/compilation In-Reply-To: <0RLXKFCTyWLfTDOcGzEd8PdDIzXMhD2VvFOSFNWUOM4=.70b9d267-127e-44bf-9382-9c4856ee5edf@github.com> References: <0RLXKFCTyWLfTDOcGzEd8PdDIzXMhD2VvFOSFNWUOM4=.70b9d267-127e-44bf-9382-9c4856ee5edf@github.com> Message-ID: On Tue, 12 Nov 2024 10:07:24 GMT, Christian Hagedorn wrote: > In `MinLNode::add_ring()`, we wrongly take the minimum of the `_widen` of both input types instead of the maximum, which leads to an endless widening in CCP without reaching a fixed point with the test case.
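A toy version of the range computation described here may help. It assumes a hypothetical, simplified range type `ToyTypeLong` with `lo`, `hi`, and a `widen` counter; HotSpot's real `TypeLong` and its widening logic are more involved. The point is only the last field. The result of a Min over two ranges should carry the larger of the two widening counters. Otherwise, progress recorded in one input is discarded and, roughly speaking, widening can restart on every CCP iteration instead of converging.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a long range type with a widening counter.
struct ToyTypeLong {
  int64_t lo;
  int64_t hi;
  int widen;  // how many times this range has already been widened
};

// Min(r0, r1) on ranges: both bounds take the minimum, but the widen
// counter takes the MAXIMUM so it never decreases from one iteration
// to the next (the MinINode::add_ring() pattern referenced in the PR).
ToyTypeLong min_add_ring(const ToyTypeLong& r0, const ToyTypeLong& r1) {
  return ToyTypeLong{std::min(r0.lo, r1.lo), std::min(r0.hi, r1.hi),
                     std::max(r0.widen, r1.widen)};
}

// The buggy variant described above: the widen counter takes the minimum,
// so a widened input combined with a fresh one loses its widening history.
ToyTypeLong min_add_ring_buggy(const ToyTypeLong& r0, const ToyTypeLong& r1) {
  return ToyTypeLong{std::min(r0.lo, r1.lo), std::min(r0.hi, r1.hi),
                     std::min(r0.widen, r1.widen)};
}
```

For example, combining an input already widened three times with a constant input (widen 0) keeps a counter of 3 in the fixed version but resets it to 0 in the buggy one.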
We eventually hit the memlimit because we keep creating new types endlessly. > > The fix is straightforward: use `MAX2()` instead of `MIN2()`, as we are already doing for `MinINode::add_ring()`: > https://github.com/openjdk/jdk/blob/b53ee053f7f7ffcf02ff47e1895ce7be4bc32486/src/hotspot/share/opto/addnode.cpp#L1437-L1443 > > Details about how this endless widening happens are provided as comments in the test case. > > Thanks, > Christian Marked as reviewed by abdelhak-zaaim at github.com (no known OpenJDK username). ------------- PR Review: https://git.openjdk.org/jdk/pull/22033#pullrequestreview-2429948361 From duke at openjdk.org Tue Nov 12 15:37:50 2024 From: duke at openjdk.org (Abdelhak Zaaim) Date: Tue, 12 Nov 2024 15:37:50 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v2] In-Reply-To: <61U1yfefGGXjAMffpUIz5nGvsmWUxoAGoOkY8hN1mHs=.a7ae0994-4c0d-42c1-afb5-f719630ddb61@github.com> References: <61U1yfefGGXjAMffpUIz5nGvsmWUxoAGoOkY8hN1mHs=.a7ae0994-4c0d-42c1-afb5-f719630ddb61@github.com> Message-ID: On Tue, 12 Nov 2024 11:55:09 GMT, Roberto Castañeda Lozano wrote: >> This changeset dumps C2's low-level intermediate representation at the following intermediate register allocation points: >> - Initial liveness: after initial liveness information is computed. >> - Aggressive coalescing: after aggressively coalescing live ranges and destructing SSA. >> - Initial spilling: after initial round of spilling derived from physical interference graph construction. >> - Conservative coalescing: after each round of conservative (colorability-preserving) coalescing (if `OptoCoalesce` is enabled). >> - Iterative spilling: after each round of spilling. >> - After iterative spilling: after the main register allocation loop. >> - Post-allocation copy removal: after peephole copy removal. >> - Merge multiple definitions: after local merging of equivalent nodes related by the same live range.
>> - Fix up spills: convert load-store spills into memory operand accesses ("CISC spilling") if allowed by the target platform and `UseCISCSpill` is enabled. >> >> The new dumps have already proved to be useful in the investigation of [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295). >> >> #### Testing >> >> - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64). >> - Tested automatically that dumping, scheduling, and viewing hundreds of the new graphs does not trigger any failure on HotSpot or IGV (by instrumenting IGV to schedule and view graphs eagerly and running `java -Xbatch -XX:PrintIdealGraphLevel=4`). > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Split FIXUP_SPILLS Marked as reviewed by abdelhak-zaaim at github.com (no known OpenJDK username). ------------- PR Review: https://git.openjdk.org/jdk/pull/22017#pullrequestreview-2429956837 From dlunden at openjdk.org Tue Nov 12 15:46:30 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 12 Nov 2024 15:46:30 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v2] In-Reply-To: <61U1yfefGGXjAMffpUIz5nGvsmWUxoAGoOkY8hN1mHs=.a7ae0994-4c0d-42c1-afb5-f719630ddb61@github.com> References: <61U1yfefGGXjAMffpUIz5nGvsmWUxoAGoOkY8hN1mHs=.a7ae0994-4c0d-42c1-afb5-f719630ddb61@github.com> Message-ID: On Tue, 12 Nov 2024 11:55:09 GMT, Roberto Castañeda Lozano wrote: >> This changeset dumps C2's low-level intermediate representation at the following intermediate register allocation points: >> - Initial liveness: after initial liveness information is computed. >> - Aggressive coalescing: after aggressively coalescing live ranges and destructing SSA. >> - Initial spilling: after initial round of spilling derived from physical interference graph construction.
>> - Conservative coalescing: after each round of conservative (colorability-preserving) coalescing (if `OptoCoalesce` is enabled). >> - Iterative spilling: after each round of spilling. >> - After iterative spilling: after the main register allocation loop. >> - Post-allocation copy removal: after peephole copy removal. >> - Merge multiple definitions: after local merging of equivalent nodes related by the same live range. >> - Fix up spills: convert load-store spills into memory operand accesses ("CISC spilling") if allowed by the target platform and `UseCISCSpill` is enabled. >> >> The new dumps have already proved to be useful in the investigation of [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295). >> >> #### Testing >> >> - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64). >> - Tested automatically that dumping, scheduling, and viewing hundreds of the new graphs does not trigger any failure on HotSpot or IGV (by instrumenting IGV to schedule and view graphs eagerly and running `java -Xbatch -XX:PrintIdealGraphLevel=4`). > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Split FIXUP_SPILLS Marked as reviewed by dlunden (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22017#pullrequestreview-2429976564 From swen at openjdk.org Tue Nov 12 16:49:02 2024 From: swen at openjdk.org (Shaojin Wen) Date: Tue, 12 Nov 2024 16:49:02 GMT Subject: RFR: 8343629: More MergeStore benchmark [v3] In-Reply-To: References: Message-ID: > 1. Added the putBytes4 benchmark, which corresponds to StringBuilder appendNull > 2. Optimized the putChars4/setInt/setLong series of benchmarks to reduce extra overhead and more accurately reflect performance differences.
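As a rough illustration of what a `putBytes4`-style benchmark exercises, the sketch below contrasts four adjacent byte stores of `"null"` with a single 32-bit store of the same bytes. Assumption: the benchmark targets C2's MergeStores optimization, which can combine such adjacent constant stores into one wider store. The function names are hypothetical, and the C++ only models the memory effect, not the code the JIT emits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Four separate byte stores, as an appendNull-style routine would write them.
void put_null_bytes(uint8_t* buf, std::size_t off) {
  buf[off + 0] = 'n';
  buf[off + 1] = 'u';
  buf[off + 2] = 'l';
  buf[off + 3] = 'l';
}

// Detects host byte order by inspecting the first byte of a known value.
bool host_is_little_endian() {
  const uint16_t probe = 1;
  uint8_t first = 0;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}

// The shape a store-merging optimization aims for: one 4-byte write.
// The packed constant uses little-endian byte order, so on other hosts
// this sketch simply falls back to the byte-by-byte version.
void put_null_merged(uint8_t* buf, std::size_t off) {
  if (!host_is_little_endian()) {
    put_null_bytes(buf, off);
    return;
  }
  const uint32_t packed = uint32_t('n') | (uint32_t('u') << 8) |
                          (uint32_t('l') << 16) | (uint32_t('l') << 24);
  std::memcpy(buf + off, &packed, sizeof packed);  // single 32-bit store
}
```

Both variants leave the same bytes in memory. A compiler performing this merge would choose the byte order for its target platform at compile time rather than checking it at runtime.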
Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision: Update test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java Co-authored-by: Emanuel Peter ---------